red-candle 1.2.1 → 1.2.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 15a070c7424707802e4e82d00ef5532691d76d7627d28ab0a4b5f0ac7522471f
-   data.tar.gz: 112d038f09eda1a6b1751935057bfefbc2355d7b4962c85461de192e3c667980
+   metadata.gz: 2d5cfca9bb05ab3dd9e8b1f1d92db4d94190b9acf473a41c7eb8a35c51a1a94a
+   data.tar.gz: aa2028e08fde8be4d9fd55b9719beab8ee8afabdc240900f176d88f7888bae6f
  SHA512:
-   metadata.gz: cf43744df320c1d69773dad4713bb41e6fa9bd0f75359d3651aa53e429ac48615705682a61e793487d4a09a1f0d5a4aa28df18b3375de63916d9d1b91b2c98b2
-   data.tar.gz: 81fbfe62ba6135b22b34cfeb8ea99b39b12b3dbb2e4c56ac76a758903554ff9f1911132f0be86e8ca130077d6e44997cfcf61a96d48452c8e13995acf76c7e88
+   metadata.gz: 16426f08f0cf7fd5ec1353adb139702c662df714a9d6fa7c3ed4d8feac81ba0313dcb58874c13b76c47ef6d8ca6acf6e22c8818888bf1affbcbc3d3b3a6fde96
+   data.tar.gz: 82f161767e7214ed97f36127d979da634507584f3c8110aa4672876d78edd6c9af0494f473e1a22f83311f7ebdf7e7ab8c41ec6451e5cf433b6a5acdc2ea603f
data/Cargo.lock CHANGED
@@ -2999,9 +2999,9 @@ checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64"
 
  [[package]]
  name = "slab"
- version = "0.4.10"
+ version = "0.4.11"
  source = "registry+https://github.com/rust-lang/crates.io-index"
- checksum = "04dc19736151f35336d325007ac991178d504a119863a2fcb3758cdb5e52c50d"
+ checksum = "7a2ae44ef20feb57a68b23d846850f861394c2e02dc425a50098ae8c90267589"
 
  [[package]]
  name = "smallvec"
data/LICENSE CHANGED
@@ -1,7 +1,7 @@
  MIT License
 
  Copyright (c) 2023 kojix2
- Copyright (c) 2024 Christopher Petersen
+ Copyright (c) 2024, 2025 Christopher Petersen
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,6 +1,6 @@
  <img src="/docs/assets/logo-title.png" alt="red-candle" height="80px">
 
- [![build](https://github.com/assaydepot/red-candle/actions/workflows/build.yml/badge.svg)](https://github.com/assaydepot/red-candle/actions/workflows/build.yml)
+ [![build](https://github.com/scientist-labs/red-candle/actions/workflows/build.yml/badge.svg)](https://github.com/scientist-labs/red-candle/actions/workflows/build.yml)
  [![Gem Version](https://badge.fury.io/rb/red-candle.svg)](https://badge.fury.io/rb/red-candle)
 
  Run state-of-the-art **language models directly from Ruby**. No Python, no APIs, no external services - just Ruby with blazing-fast Rust under the hood. Hardware accelerated with **Metal (Mac)** and **CUDA (NVIDIA).** Red candle leverages the Rust ecosystem, notably [Candle](https://github.com/huggingface/candle) and [Magnus](https://github.com/matsadler/magnus), to provide a fast and efficient way to run LLMs in Ruby. See [Dependencies](#dependencies) for more.
@@ -363,6 +363,12 @@ require 'candle'
  # Initialize the reranker with a cross-encoder model
  reranker = Candle::Reranker.from_pretrained("cross-encoder/ms-marco-MiniLM-L-12-v2")
 
+ # Or with custom max_length for truncation (default is 512)
+ reranker = Candle::Reranker.from_pretrained(
+   "cross-encoder/ms-marco-MiniLM-L-12-v2",
+   max_length: 256 # Faster processing with less context
+ )
+
  # Define your query and candidate documents
  query = "How many people live in London?"
  documents = [
@@ -469,6 +475,75 @@ The reranker uses a BERT-based architecture that:
 
  This joint processing allows cross-encoders to capture subtle semantic relationships between queries and documents, making them more accurate for reranking tasks, though at the cost of higher computational requirements.
 
+ ### Performance Considerations
+
+ **Important**: The Reranker automatically truncates documents to ensure stable performance. The default maximum is 512 tokens, but this is configurable.
+
+ #### Configurable Truncation
+
+ You can adjust the `max_length` parameter to balance performance and context:
+
+ ```ruby
+ # Default: 512 tokens (maximum context, ~300ms per doc on CPU)
+ reranker = Candle::Reranker.from_pretrained(model_id)
+
+ # Faster: 256 tokens (~60% faster, ~120ms per doc on CPU)
+ reranker = Candle::Reranker.from_pretrained(model_id, max_length: 256)
+
+ # Fastest: 128 tokens (~80% faster, ~60ms per doc on CPU)
+ reranker = Candle::Reranker.from_pretrained(model_id, max_length: 128)
+ ```
+
+ Choose based on your needs:
+ - **512 tokens**: Maximum context for complex queries (default)
+ - **256 tokens**: Good balance of speed and context
+ - **128 tokens**: Fast processing for simple matching
+
+ #### Performance Guidelines
+
+ 1. **Document Length**: Documents longer than ~400 words will be truncated
+    - The first 512 tokens (roughly 300-400 words) are used
+    - Consider splitting very long documents into chunks if full coverage is needed
+
+ 2. **Batch Size**: Process multiple documents in one call for efficiency
+    ```ruby
+    # Good: Single call with multiple documents
+    results = reranker.rerank(query, documents)
+
+    # Less efficient: Multiple calls
+    documents.map { |doc| reranker.rerank(query, [doc]) }
+    ```
+
+ 3. **Expected Performance**:
+    - **CPU**: ~0.3-0.5s per query-document pair
+    - **GPU (Metal/CUDA)**: ~0.05-0.1s per query-document pair
+    - Performance is consistent regardless of document length due to truncation
+
+ 4. **Chunking Strategy** for long documents:
+    ```ruby
+    def rerank_long_document(query, long_text, chunk_size: 300)
+      # Split into overlapping chunks
+      words = long_text.split
+      chunks = []
+
+      (0...words.length).step(chunk_size - 50) do |i|
+        chunk = words[i...(i + chunk_size)].join(" ")
+        chunks << chunk
+      end
+
+      # Rerank chunks
+      results = reranker.rerank(query, chunks)
+
+      # Return best chunk
+      results.max_by { |r| r[:score] }
+    end
+    ```
+
+ 5. **Memory Usage**:
+    - Model size: ~125MB
+    - Each batch processes all documents simultaneously
+    - Consider batching if you have many documents
+
  ## Tokenizer
 
  Red-Candle provides direct access to tokenizers for text preprocessing and analysis. This is useful for understanding how models process text, debugging issues, and building custom NLP pipelines.
@@ -874,7 +949,7 @@ Failed to load GGUF model: cannot find llama.attention.head_count in metadata (R
  FORK IT!
 
  ```
- git clone https://github.com/assaydepot/red-candle
+ git clone https://github.com/scientist-labs/red-candle
  cd red-candle
  bundle
  bundle exec rake compile
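Pulling the README additions above together: a minimal end-to-end sketch of the new `max_length` option (the model is downloaded from HuggingFace on first use; the `:score` result key follows the chunking example above, and the example strings are illustrative only):

```ruby
require 'candle'

# New in this release: cap the tokenized length of each query-document pair
reranker = Candle::Reranker.from_pretrained(
  "cross-encoder/ms-marco-MiniLM-L-12-v2",
  max_length: 256
)

query = "How many people live in London?"
documents = [
  "London is known for its expensive housing market.",
  "Around 9 million people live in Greater London."
]

# Results come back ranked by relevance; each result carries a :score
results = reranker.rerank(query, documents)
top = results.max_by { |r| r[:score] }
puts top.inspect
```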
data/ext/candle/src/ruby/reranker.rs CHANGED
@@ -18,46 +18,48 @@ pub struct Reranker {
  }
 
  impl Reranker {
-     pub fn new(model_id: String, device: Option<Device>) -> Result<Self> {
+     pub fn new(model_id: String, device: Option<Device>, max_length: Option<usize>) -> Result<Self> {
          let device = device.unwrap_or(Device::best()).as_device()?;
-         Self::new_with_core_device(model_id, device)
+         let max_length = max_length.unwrap_or(512); // Default to 512
+         Self::new_with_core_device(model_id, device, max_length)
      }
-
-     fn new_with_core_device(model_id: String, device: CoreDevice) -> std::result::Result<Self, Error> {
+
+     fn new_with_core_device(model_id: String, device: CoreDevice, max_length: usize) -> std::result::Result<Self, Error> {
          let result = (|| -> std::result::Result<(BertModel, TokenizerWrapper, Linear, Linear), Box<dyn std::error::Error + Send + Sync>> {
              let api = Api::new()?;
              let repo = api.repo(Repo::new(model_id.clone(), RepoType::Model));
-
+
              // Download model files
              let config_filename = repo.get("config.json")?;
              let tokenizer_filename = repo.get("tokenizer.json")?;
              let weights_filename = repo.get("model.safetensors")?;
-
+
              // Load config
              let config = std::fs::read_to_string(config_filename)?;
              let config: Config = serde_json::from_str(&config)?;
 
-             // Setup tokenizer with padding
+             // Setup tokenizer with padding AND truncation
              let tokenizer = Tokenizer::from_file(tokenizer_filename)?;
              let tokenizer = TokenizerLoader::with_padding(tokenizer, None);
-
+             let tokenizer = TokenizerLoader::with_truncation(tokenizer, max_length);
+
              // Load model weights
              let vb = unsafe {
                  VarBuilder::from_mmaped_safetensors(&[weights_filename], DType::F32, &device)?
              };
-
+
              // Load BERT model
              let model = BertModel::load(vb.pp("bert"), &config)?;
-
+
              // Load pooler layer (dense + tanh activation)
              let pooler = candle_nn::linear(config.hidden_size, config.hidden_size, vb.pp("bert.pooler.dense"))?;
-
+
              // Load classifier layer for cross-encoder (single output score)
              let classifier = candle_nn::linear(config.hidden_size, 1, vb.pp("classifier"))?;
-
+
              Ok((model, TokenizerWrapper::new(tokenizer), pooler, classifier))
          })();
-
+
          match result {
              Ok((model, tokenizer, pooler, classifier)) => {
                  Ok(Self { model, tokenizer, pooler, classifier, device, model_id })
@@ -65,18 +67,18 @@ impl Reranker {
              Err(e) => Err(Error::new(magnus::exception::runtime_error(), format!("Failed to load model: {}", e))),
          }
      }
-
+
      /// Extract CLS embeddings from the model output, handling Metal device workarounds
      fn extract_cls_embeddings(&self, embeddings: &Tensor) -> std::result::Result<Tensor, Error> {
          let cls_embeddings = if self.device.is_metal() {
              // Metal has issues with tensor indexing, use a different approach
              let (batch_size, seq_len, hidden_size) = embeddings.dims3()
                  .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to get dims: {}", e)))?;
-
+
              // Reshape to [batch * seq_len, hidden] then take first hidden vectors for each batch
              let reshaped = embeddings.reshape((batch_size * seq_len, hidden_size))
                  .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to reshape: {}", e)))?;
-
+
              // Extract CLS tokens (first token of each sequence)
              let mut cls_vecs = Vec::new();
              for i in 0..batch_size {
@@ -85,7 +87,7 @@ impl Reranker {
                      .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to extract CLS: {}", e)))?;
                  cls_vecs.push(cls_vec);
              }
-
+
              // Stack the CLS vectors
              Tensor::cat(&cls_vecs, 0)
                  .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to cat CLS tokens: {}", e)))?
@@ -93,39 +95,39 @@ impl Reranker {
              embeddings.i((.., 0))
                  .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to extract CLS token: {}", e)))?
          };
-
+
          // Ensure tensor is contiguous for downstream operations
          cls_embeddings.contiguous()
              .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to make CLS embeddings contiguous: {}", e)))
      }
-
+
      pub fn debug_tokenization(&self, query: String, document: String) -> std::result::Result<magnus::RHash, Error> {
          // Create query-document pair for cross-encoder
          let query_doc_pair: EncodeInput = (query.clone(), document.clone()).into();
-
+
          // Tokenize using the inner tokenizer for detailed info
          let encoding = self.tokenizer.inner().encode(query_doc_pair, true)
              .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Tokenization failed: {}", e)))?;
-
+
          // Get token information
          let token_ids = encoding.get_ids().to_vec();
          let token_type_ids = encoding.get_type_ids().to_vec();
          let attention_mask = encoding.get_attention_mask().to_vec();
          let tokens = encoding.get_tokens().iter().map(|t| t.to_string()).collect::<Vec<_>>();
-
+
          // Create result hash
          let result = magnus::RHash::new();
          result.aset("token_ids", RArray::from_vec(token_ids.iter().map(|&id| id as i64).collect::<Vec<_>>()))?;
          result.aset("token_type_ids", RArray::from_vec(token_type_ids.iter().map(|&id| id as i64).collect::<Vec<_>>()))?;
          result.aset("attention_mask", RArray::from_vec(attention_mask.iter().map(|&mask| mask as i64).collect::<Vec<_>>()))?;
          result.aset("tokens", RArray::from_vec(tokens))?;
-
+
          Ok(result)
      }
-
+
      pub fn rerank_with_options(&self, query: String, documents: RArray, pooling_method: String, apply_sigmoid: bool) -> std::result::Result<RArray, Error> {
          let documents: Vec<String> = documents.to_vec()?;
-
+
          // Create query-document pairs for cross-encoder
          let query_and_docs: Vec<EncodeInput> = documents
              .iter()
@@ -135,13 +137,13 @@ impl Reranker {
          // Tokenize batch using inner tokenizer for access to token type IDs
          let encodings = self.tokenizer.inner().encode_batch(query_and_docs, true)
              .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Tokenization failed: {}", e)))?;
-
+
          // Convert to tensors
          let token_ids = encodings
              .iter()
              .map(|e| e.get_ids().to_vec())
              .collect::<Vec<_>>();
-
+
          let token_type_ids = encodings
              .iter()
              .map(|e| e.get_type_ids().to_vec())
@@ -153,11 +155,11 @@ impl Reranker {
              .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to create token type ids tensor: {}", e)))?;
          let attention_mask = token_ids.ne(0u32)
              .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to create attention mask: {}", e)))?;
-
+
          // Forward pass through BERT
          let embeddings = self.model.forward(&token_ids, &token_type_ids, Some(&attention_mask))
              .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Model forward pass failed: {}", e)))?;
-
+
          // Apply pooling based on the specified method
          let pooled_embeddings = match pooling_method.as_str() {
              "pooler" => {
@@ -181,10 +183,10 @@ impl Reranker {
                  (sum / (seq_len as f64))
                      .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to compute mean: {}", e)))?
              },
-             _ => return Err(Error::new(magnus::exception::runtime_error(),
+             _ => return Err(Error::new(magnus::exception::runtime_error(),
                  format!("Unknown pooling method: {}. Use 'pooler', 'cls', or 'mean'", pooling_method)))
          };
-
+
          // Apply classifier to get relevance scores (raw logits)
          // Ensure tensor is contiguous before linear layer
          let pooled_embeddings = pooled_embeddings.contiguous()
@@ -193,7 +195,7 @@ impl Reranker {
              .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Classifier forward failed: {}", e)))?;
          let scores = logits.squeeze(1)
              .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to squeeze tensor: {}", e)))?;
-
+
          // Optionally apply sigmoid activation
          let scores = if apply_sigmoid {
              sigmoid(&scores)
@@ -201,7 +203,7 @@ impl Reranker {
          } else {
              scores
          };
-
+
          let scores_vec: Vec<f32> = scores.to_vec1()
              .map_err(|e| Error::new(magnus::exception::runtime_error(), format!("Failed to convert scores to vec: {}", e)))?;
 
@@ -212,7 +214,7 @@ impl Reranker {
              .enumerate()
              .map(|(idx, (doc, score))| (doc, score, idx))
              .collect();
-
+
          // Sort documents by relevance score (descending)
          ranked_docs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
 
@@ -232,17 +234,17 @@ impl Reranker {
      pub fn tokenizer(&self) -> std::result::Result<crate::ruby::tokenizer::Tokenizer, Error> {
          Ok(crate::ruby::tokenizer::Tokenizer(self.tokenizer.clone()))
      }
-
+
      /// Get the model_id
      pub fn model_id(&self) -> String {
          self.model_id.clone()
      }
-
+
      /// Get the device
      pub fn device(&self) -> Device {
          Device::from_device(&self.device)
      }
-
+
      /// Get all options as a hash
      pub fn options(&self) -> std::result::Result<magnus::RHash, Error> {
          let hash = magnus::RHash::new();
@@ -254,7 +256,7 @@ impl Reranker {
 
  pub fn init(rb_candle: RModule) -> std::result::Result<(), Error> {
      let c_reranker = rb_candle.define_class("Reranker", class::object())?;
-     c_reranker.define_singleton_method("_create", function!(Reranker::new, 2))?;
+     c_reranker.define_singleton_method("_create", function!(Reranker::new, 3))?;
      c_reranker.define_method("rerank_with_options", method!(Reranker::rerank_with_options, 4))?;
      c_reranker.define_method("debug_tokenization", method!(Reranker::debug_tokenization, 2))?;
      c_reranker.define_method("tokenizer", method!(Reranker::tokenizer, 0))?;
data/lib/candle/build_info.rb CHANGED
@@ -3,8 +3,7 @@ module Candle
    def self.display_cuda_info
      info = Candle.build_info
 
-     # Only display CUDA info if running in development or if CANDLE_VERBOSE is set
-     return unless ENV['CANDLE_VERBOSE'] || ENV['CANDLE_DEBUG'] || $DEBUG
+     # CUDA info is now controlled by logger level
 
      if info["cuda_available"] == false
        # :nocov:
@@ -13,11 +12,11 @@ module Candle
                                      File.exist?('/usr/local/cuda') || File.exist?('/opt/cuda')
 
        if cuda_potentially_available
-         warn "=" * 80
-         warn "Red Candle: CUDA detected on system but not enabled in build."
-         warn "This may be due to CANDLE_DISABLE_CUDA being set during installation."
-         warn "To enable CUDA support, reinstall without CANDLE_DISABLE_CUDA set."
-         warn "=" * 80
+         Candle.logger.warn "=" * 80
+         Candle.logger.warn "Red Candle: CUDA detected on system but not enabled in build."
+         Candle.logger.warn "This may be due to CANDLE_DISABLE_CUDA being set during installation."
+         Candle.logger.warn "To enable CUDA support, reinstall without CANDLE_DISABLE_CUDA set."
+         Candle.logger.warn "=" * 80
        end
        # :nocov:
      end
data/lib/candle/device_utils.rb CHANGED
@@ -3,7 +3,7 @@ module Candle
    # @deprecated Use {Candle::Device.best} instead
    # Get the best available device (Metal > CUDA > CPU)
    def self.best_device
-     warn "[DEPRECATION] `DeviceUtils.best_device` is deprecated. Please use `Device.best` instead."
+     Candle.logger.warn "[DEPRECATION] `DeviceUtils.best_device` is deprecated. Please use `Device.best` instead."
      Device.best
    end
  end
data/lib/candle/llm.rb CHANGED
@@ -78,7 +78,7 @@ module Candle
        JSON.parse(json_content)
      rescue JSON::ParserError => e
        # Return the raw string if parsing fails
-       warn "Warning: Generated output is not valid JSON: #{e.message}" if options[:warn_on_parse_error]
+       Candle.logger.warn "Generated output is not valid JSON: #{e.message}" if options[:warn_on_parse_error]
        result
      end
    end
@@ -261,7 +261,7 @@ module Candle
      if e.message.include?("No tokenizer found")
        # Auto-detect tokenizer
        detected_tokenizer = guess_tokenizer(model_id)
-       warn "No tokenizer found in GGUF repo. Using tokenizer from: #{detected_tokenizer}"
+       Candle.logger.info "No tokenizer found in GGUF repo. Using tokenizer from: #{detected_tokenizer}"
        model_str = "#{model_str}@@#{detected_tokenizer}"
        _from_pretrained(model_str, device)
      else
data/lib/candle/logger.rb ADDED
@@ -0,0 +1,149 @@
+ require 'logger'
+
+ module Candle
+   # Logging functionality for the Red Candle gem
+   class << self
+     # Get the current logger instance
+     # @return [Logger] The logger instance
+     def logger
+       @logger ||= create_default_logger
+     end
+
+     # Set a custom logger instance
+     # @param custom_logger [Logger] A custom logger instance
+     def logger=(custom_logger)
+       @logger = custom_logger
+     end
+
+     # Configure logging with a block
+     # @yield [config] Configuration object
+     def configure_logging
+       config = LoggerConfig.new
+       yield config if block_given?
+       @logger = config.build_logger
+     end
+
+     private
+
+     # Create the default logger with CLI-friendly settings
+     # @return [Logger] Configured logger instance
+     def create_default_logger
+       logger = Logger.new($stderr)
+       logger.level = default_log_level
+       logger.formatter = cli_friendly_formatter
+       logger
+     end
+
+     # Determine default log level based on environment variables
+     # @return [Integer] Logger level constant
+     def default_log_level
+       # Support legacy CANDLE_VERBOSE for backward compatibility, but prefer explicit configuration
+       return Logger::DEBUG if ENV['CANDLE_VERBOSE']
+       Logger::WARN # CLI-friendly: only show warnings/errors by default
+     end
+
+     # CLI-friendly formatter that outputs just the message
+     # @return [Proc] Formatter proc
+     def cli_friendly_formatter
+       proc { |severity, datetime, progname, msg| "#{msg}\n" }
+     end
+   end
+
+   # Configuration helper for logger setup
+   class LoggerConfig
+     attr_accessor :level, :output, :formatter
+
+     def initialize
+       @level = :warn
+       @output = $stderr
+       @formatter = :simple
+     end
+
+     # Build a logger from the configuration
+     # @return [Logger] Configured logger
+     def build_logger
+       logger = Logger.new(@output)
+       logger.level = normalize_level(@level)
+       logger.formatter = build_formatter(@formatter)
+       logger
+     end
+
+     # Set log level to debug (verbose output)
+     def verbose!
+       @level = :debug
+     end
+
+     # Set log level to info
+     def info!
+       @level = :info
+     end
+
+     # Set log level to warn (default)
+     def quiet!
+       @level = :warn
+     end
+
+     # Set log level to error (minimal output)
+     def silent!
+       @level = :error
+     end
+
+     # Log to stdout instead of stderr
+     def log_to_stdout!
+       @output = $stdout
+     end
+
+     # Log to a file
+     # @param file_path [String] Path to log file
+     def log_to_file!(file_path)
+       @output = file_path
+     end
+
+     # Disable logging completely
+     def disable!
+       @output = File::NULL
+     end
+
+     private
+
+     # Convert symbol/string level to Logger constant
+     # @param level [Symbol, String, Integer] Log level
+     # @return [Integer] Logger level constant
+     def normalize_level(level)
+       case level.to_s.downcase
+       when 'debug' then Logger::DEBUG
+       when 'info' then Logger::INFO
+       when 'warn', 'warning' then Logger::WARN
+       when 'error' then Logger::ERROR
+       when 'fatal' then Logger::FATAL
+       else Logger::WARN
+       end
+     end
+
+     # Build formatter based on type
+     # @param formatter_type [Symbol] Type of formatter
+     # @return [Proc] Formatter proc
+     def build_formatter(formatter_type)
+       case formatter_type
+       when :simple, :cli
+         proc { |severity, datetime, progname, msg| "#{msg}\n" }
+       when :detailed
+         proc do |severity, datetime, progname, msg|
+           "[#{datetime.strftime('%Y-%m-%d %H:%M:%S')}] #{severity}: #{msg}\n"
+         end
+       when :json
+         require 'json'
+         proc do |severity, datetime, progname, msg|
+           JSON.generate({
+             timestamp: datetime.iso8601,
+             level: severity,
+             message: msg,
+             program: progname
+           }) + "\n"
+         end
+       else
+         proc { |severity, datetime, progname, msg| "#{msg}\n" }
+       end
+     end
+   end
+ end
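The new `Candle.logger` is what replaces the bare `Kernel#warn` calls migrated in the hunks above and below. A short usage sketch based on the file as added:

```ruby
require 'candle'

# Default: warnings and errors only, printed bare to $stderr
Candle.logger.warn "something worth surfacing"

# Tune level, destination, and format in one place
Candle.configure_logging do |config|
  config.verbose!                  # debug-level output (like CANDLE_VERBOSE=1)
  config.log_to_file!("candle.log")
  config.formatter = :detailed     # timestamped "[...] WARN: msg" lines
end

# Or supply any stdlib-compatible logger outright
Candle.logger = Logger.new($stdout)
```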
data/lib/candle/ner.rb CHANGED
@@ -196,7 +196,7 @@ module Candle
      # This is especially important for Ruby < 3.2
      max_length = 1_000_000 # 1MB of text
      if text.length > max_length
-       warn "PatternEntityRecognizer: Text truncated from #{text.length} to #{max_length} chars for safety"
+       Candle.logger.warn "PatternEntityRecognizer: Text truncated from #{text.length} to #{max_length} chars for safety"
        text = text[0...max_length]
      end
 
data/lib/candle/reranker.rb CHANGED
@@ -6,18 +6,20 @@ module Candle
    # Load a pre-trained reranker model from HuggingFace
    # @param model_id [String] HuggingFace model ID (defaults to cross-encoder/ms-marco-MiniLM-L-12-v2)
    # @param device [Candle::Device] The device to use for computation (defaults to best available)
+   # @param max_length [Integer] Maximum sequence length for truncation (defaults to 512)
    # @return [Reranker] A new Reranker instance
-   def self.from_pretrained(model_id = DEFAULT_MODEL_PATH, device: Candle::Device.best)
-     _create(model_id, device)
+   def self.from_pretrained(model_id = DEFAULT_MODEL_PATH, device: Candle::Device.best, max_length: 512)
+     _create(model_id, device, max_length)
    end
 
    # Constructor for creating a new Reranker with optional parameters
    # @deprecated Use {.from_pretrained} instead
    # @param model_path [String, nil] The path to the model on Hugging Face
    # @param device [Candle::Device, Candle::Device.cpu] The device to use for computation
-   def self.new(model_path: DEFAULT_MODEL_PATH, device: Candle::Device.best)
+   # @param max_length [Integer] Maximum sequence length for truncation (defaults to 512)
+   def self.new(model_path: DEFAULT_MODEL_PATH, device: Candle::Device.best, max_length: 512)
      $stderr.puts "[DEPRECATION] `Reranker.new` is deprecated. Please use `Reranker.from_pretrained` instead."
-     _create(model_path, device)
+     _create(model_path, device, max_length)
    end
 
    # Returns documents ranked by relevance using the specified pooling method.
data/lib/candle/version.rb CHANGED
@@ -1,5 +1,5 @@
  # :nocov:
  module Candle
-   VERSION = "1.2.1"
+   VERSION = "1.2.3"
  end
  # :nocov:
data/lib/candle.rb CHANGED
@@ -1,3 +1,4 @@
+ require_relative "candle/logger"
  require_relative "candle/candle"
  require_relative "candle/tensor"
  require_relative "candle/device_utils"
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: red-candle
  version: !ruby/object:Gem::Version
-   version: 1.2.1
+   version: 1.2.3
  platform: ruby
  authors:
  - Christopher Petersen
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2025-08-11 00:00:00.000000000 Z
+ date: 2025-09-07 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rb_sys
@@ -218,13 +218,14 @@ files:
  - lib/candle/embedding_model.rb
  - lib/candle/embedding_model_type.rb
  - lib/candle/llm.rb
+ - lib/candle/logger.rb
  - lib/candle/ner.rb
  - lib/candle/reranker.rb
  - lib/candle/tensor.rb
  - lib/candle/tokenizer.rb
  - lib/candle/version.rb
  - lib/red-candle.rb
- homepage: https://github.com/assaydepot/red-candle
+ homepage: https://github.com/scientist-labs/red-candle
  licenses:
  - MIT
  metadata: {}