red-candle 1.5.0 → 1.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a15236dd78a04cfcc6d7978ca088934987b84a599371c1508383d147172abcc3
-  data.tar.gz: f2661ad1220f566dd8a7d9eab1d74bd30007ed022c9381daf2697d12d88fac96
+  metadata.gz: 0e67b3106c236e7f579cf3bdfdf56f0b5ae99e8b33bb9bf35cfe361c19bb0514
+  data.tar.gz: 573ae31fda25ebd4f22e7338a6a94fd48ce9d869959957acbc8d77e1ed83052c
 SHA512:
-  metadata.gz: d74f26e2c0527f25424245442b5b74cf1575505a42a2c1c823be31606433a6c9e4736afc6d8062bd54642cbc8373deaae197bfc7163cbaa392d457a844e67628
-  data.tar.gz: 1ffc6d400ba4ffe9c23603fb42adae088142446f1d0bc03d0365a2bfcde3c1d1beab51eec01481d1a6e89f686c235e2ec58f0703daed73fabcf5cfd7f7e22d95
+  metadata.gz: 382e0ab840efe730184a7baa9f7438f43b10898d94fbea2648d92fbcf30a39ae9dcf48626dbf5a63e64263c2300d88555815c95820c4a6060eaa7b5f370deb9d
+  data.tar.gz: 00b15a30260ce226ef97c4e47a193ed557147377835472df552fde8de5f60c319ac4b8d7a1ef4f6a0408232c46776503b6f4b9a30bb46209df750ce3f47dd7e7
data/README.md CHANGED
@@ -273,6 +273,85 @@ See [STRUCTURED_GENERATION.md](docs/STRUCTURED_GENERATION.md) for detailed docum
 
 **Note on Reliability**: Structured generation constrains the model's output tokens, but success rates vary by model size and schema complexity. Smaller models (< 7B parameters) may occasionally produce incomplete or invalid JSON, especially with complex schemas. Consider implementing retry logic or fallback strategies in production applications. Larger models generally perform much better with structured generation.
 
+## Tool Calling
+
+Red-candle supports tool/function calling, enabling models to invoke external functions during generation. This works best with models fine-tuned for tool calling, such as Qwen3.
+
+### Defining Tools
+
+```ruby
+get_weather = Candle::Tool.new(
+  name: "get_weather",
+  description: "Get the current weather for a city",
+  parameters: {
+    type: "object",
+    properties: { city: { type: "string", description: "City name" } },
+    required: ["city"]
+  }
+) { |args| { city: args["city"], temperature: 72, condition: "sunny" } }
+```
+
+### Extracting Tool Calls
+
+`chat_with_tools` injects tool definitions into the system prompt, generates a response, and parses any `<tool_call>` tags from the output. It does **not** feed results back to the model — it just tells you what the model wants to call. You decide what to do with it:
+
+```ruby
+llm = Candle::LLM.from_pretrained("Qwen/Qwen3-0.6B")
+
+messages = [{ role: "user", content: "What's the weather in San Francisco?" }]
+result = llm.chat_with_tools(messages, tools: [get_weather],
+                             config: Candle::GenerationConfig.deterministic(max_length: 500))
+
+if result.has_tool_calls?
+  result.tool_calls.each do |tc|
+    puts "#{tc.name}(#{tc.arguments})"
+    output = get_weather.call(tc.arguments)
+    puts "=> #{output}"
+  end
+else
+  puts result.text_response
+end
+```
+
+Pass `execute: true` to automatically run the tools (but still no round-trip back to the model):
+
+```ruby
+result = llm.chat_with_tools(messages, tools: [get_weather], execute: true,
+                             config: Candle::GenerationConfig.deterministic(max_length: 500))
+
+result.tool_results.each do |tr|
+  puts "#{tr[:tool_call].name} => #{tr[:result]}"
+end
+```
+
+### Agent (Multi-Turn Tool Loop)
+
+`Candle::Agent` completes the round-trip: generate → parse tool calls → execute → feed results back to the model → repeat until the model produces a final text answer or hits `max_iterations`. This is a convenience wrapper for quick prototyping — for production use, frameworks like [RubyLLM](https://github.com/crmne/ruby_llm) manage this loop for you via the [ruby_llm-red_candle](https://github.com/scientist-labs/ruby_llm-red_candle) plugin:
+
+```ruby
+agent = Candle::Agent.new(llm, tools: [get_weather, lookup_price], max_iterations: 5)
+result = agent.run("What's the weather in Paris, and how much does a widget cost?",
+                   config: Candle::GenerationConfig.deterministic(max_length: 1000))
+
+puts result.response        # Final text answer from the model
+puts result.iterations      # Number of generate cycles
+puts result.tool_calls_made # Number of tools invoked
+```
+
+### Model Recommendations
+
+Tool calling quality depends heavily on model size:
+
+| Model | Tool Calling Quality |
+|-------|---------------------|
+| **Qwen3-8B GGUF** (~5 GB) | Calls correct tools, self-corrects errors, but may hallucinate values from tool results |
+| **Qwen3-4B GGUF** (~2.5 GB) | Calls correct tools, occasional reasoning errors |
+| **Qwen3-0.6B** (~1.2 GB) | Single-turn works, needs `max_length: 500+` for thinking |
+| SmolLM2-360M | Does not work |
+| TinyLlama-1.1B | Does not work (not fine-tuned for tool calling) |
+
+**Tip:** Qwen3 models use a `<think>` reasoning block before producing tool calls. Set `max_length` high enough (500+ for 0.6B, 1000+ for larger models) to allow room for both thinking and the tool call.
+
 ## ⚠️ Model Format Requirements
 
 ### EmbeddingModels and Rerankers: Safetensors Only
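The README's reliability note recommends retry logic when a small model produces unusable output. A minimal, library-agnostic sketch of that strategy; the `with_retries` helper and the stubbed `responses` array are hypothetical illustrations, not part of red-candle:

```ruby
# Hypothetical retry helper, in the spirit of the reliability note above.
# Re-runs the block until it yields a usable (truthy) result or attempts run out.
def with_retries(max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    result = yield
    raise "no usable result" unless result
    result
  rescue RuntimeError
    attempts < max_attempts ? retry : nil
  end
end

# Stub standing in for an unreliable generation call: fails twice, then succeeds.
responses = [nil, nil, { "name" => "get_weather" }]
result = with_retries { responses.shift }
```

In a real application the block would call `chat_with_tools` and the validity check would inspect the parsed result instead of truthiness.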
@@ -319,9 +319,9 @@ impl Llama {
                 if i == 1 || (i == 0 && system_message.is_empty()) {
                     // First user message
                     if !system_message.is_empty() {
-                        prompt.push_str(&format!("<s>[INST] <<SYS>>\n{}\n<</SYS>>\n\n{} [/INST]", system_message, content));
+                        prompt.push_str(&format!("[INST] <<SYS>>\n{}\n<</SYS>>\n\n{} [/INST]", system_message, content));
                     } else {
-                        prompt.push_str(&format!("<s>[INST] {} [/INST]", content));
+                        prompt.push_str(&format!("[INST] {} [/INST]", content));
                     }
                 } else {
                     prompt.push_str(&format!(" [INST] {} [/INST]", content));
@@ -340,7 +340,7 @@ impl Llama {
     fn apply_llama3_template(&self, messages: &[serde_json::Value]) -> CandleResult<String> {
         let mut prompt = String::new();
 
-        prompt.push_str("<|begin_of_text|>");
+        // BOS token is added by the tokenizer's encode(prompt, add_special_tokens=true)
 
         for message in messages {
             let role = message["role"].as_str().unwrap_or("");
@@ -386,9 +386,9 @@ impl QuantizedGGUF {
             "user" => {
                 if i == 1 || (i == 0 && system_message.is_empty()) {
                     if !system_message.is_empty() {
-                        prompt.push_str(&format!("<s>[INST] <<SYS>>\n{}\n<</SYS>>\n\n{} [/INST]", system_message, content));
+                        prompt.push_str(&format!("[INST] <<SYS>>\n{}\n<</SYS>>\n\n{} [/INST]", system_message, content));
                     } else {
-                        prompt.push_str(&format!("<s>[INST] {} [/INST]", content));
+                        prompt.push_str(&format!("[INST] {} [/INST]", content));
                     }
                 } else {
                     prompt.push_str(&format!(" [INST] {} [/INST]", content));
@@ -406,7 +406,7 @@ impl QuantizedGGUF {
 
     fn apply_llama3_template(&self, messages: &[serde_json::Value]) -> CandleResult<String> {
         let mut prompt = String::new();
-        prompt.push_str("<|begin_of_text|>");
+        // BOS token is added by the tokenizer's encode(prompt, add_special_tokens=true)
 
        for message in messages {
            let role = message["role"].as_str().unwrap_or("");
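The hunks above all remove a hard-coded BOS marker (`<s>` or `<|begin_of_text|>`) from the prompt templates, because the tokenizer already prepends BOS when encoding with special tokens enabled, so the old templates made the model see it twice. A toy Ruby sketch of the double-BOS effect; the `encode` function is a hypothetical stand-in, not the real tokenizer API:

```ruby
# Toy stand-in for a tokenizer that prepends BOS ("<s>") when
# add_special_tokens is true. NOT the real red-candle tokenizer.
def encode(prompt, add_special_tokens: true)
  tokens = prompt.split
  add_special_tokens ? ["<s>"] + tokens : tokens
end

# Old template baked BOS into the prompt string: the model saw it twice.
doubled = encode("<s> [INST] hi [/INST]")
# New template leaves BOS to the tokenizer: exactly one.
single = encode("[INST] hi [/INST]")
```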
@@ -481,15 +481,18 @@ impl QuantizedGGUF {
                 "assistant" => {
                     prompt.push_str(&format!("<|im_start|>assistant\n{}<|im_end|>\n", content));
                 }
+                "tool" => {
+                    prompt.push_str(&format!("<|im_start|>tool\n{}<|im_end|>\n", content));
+                }
                 _ => {}
             }
         }
-
+
         // Add generation prompt
         prompt.push_str("<|im_start|>assistant\n");
         Ok(prompt)
     }
-
+
     fn apply_phi_template(&self, messages: &[serde_json::Value]) -> CandleResult<String> {
         let mut prompt = String::new();
 
@@ -128,6 +128,9 @@ impl Qwen {
                 "assistant" => {
                     prompt.push_str(&format!("<|im_start|>assistant\n{}<|im_end|>\n", content));
                 }
+                "tool" => {
+                    prompt.push_str(&format!("<|im_start|>tool\n{}<|im_end|>\n", content));
+                }
                 _ => {}
             }
         }
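The ChatML hunks above add a `tool` role so that executed tool results can be rendered back into the conversation. A sketch of the framing those templates emit, re-implemented in Ruby purely for illustration (the Rust templates above are the real implementation):

```ruby
# Sketch of the ChatML framing produced by the templates above,
# including the new "tool" role and the trailing generation prompt.
def chatml(messages)
  body = messages.map do |m|
    "<|im_start|>#{m[:role]}\n#{m[:content]}<|im_end|>\n"
  end.join
  body + "<|im_start|>assistant\n"
end

prompt = chatml([
  { role: "user", content: "Weather in Paris?" },
  { role: "assistant", content: "<tool_call>...</tool_call>" },
  { role: "tool", content: "[get_weather] {\"temp\":72}" }
])
```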
@@ -0,0 +1,68 @@
+# frozen_string_literal: true
+
+require "json"
+
+module Candle
+  class Agent
+    MAX_ITERATIONS = 10
+
+    attr_reader :llm, :tools, :system_prompt, :max_iterations
+
+    def initialize(llm, tools:, system_prompt: nil, max_iterations: MAX_ITERATIONS)
+      @llm = llm
+      @tools = tools
+      @system_prompt = system_prompt
+      @max_iterations = max_iterations
+    end
+
+    def run(user_message, **options)
+      messages = []
+      messages << { role: "system", content: @system_prompt } if @system_prompt
+      messages << { role: "user", content: user_message }
+
+      iterations = 0
+      loop do
+        iterations += 1
+        if iterations > @max_iterations
+          raise AgentMaxIterationsError,
+                "Agent exceeded maximum iterations (#{@max_iterations})"
+        end
+
+        result = @llm.chat_with_tools(messages, tools: @tools, execute: true, **options)
+
+        if result.has_tool_calls?
+          # If the model produced a substantial text answer alongside tool calls,
+          # treat it as a final response (model is done, trailing tool calls are noise).
+          # Strip <think> blocks so they don't count toward the length check.
+          text_without_thinking = result.text_response&.gsub(/<think>.*?<\/think>/m, "")&.strip
+          if text_without_thinking && text_without_thinking.length > 50
+            return AgentResult.new(
+              response: result.text_response,
+              messages: messages,
+              iterations: iterations,
+              tool_calls_made: messages.count { |m| m[:role] == "tool" }
+            )
+          end
+
+          messages << { role: "assistant", content: result.raw_response }
+
+          result.tool_results.each do |tr|
+            tool_name = tr[:tool_call]&.name || "unknown"
+            tool_output = tr[:error] ? "Error: #{tr[:error]}" : JSON.generate(tr[:result])
+            messages << { role: "tool", content: "[#{tool_name}] #{tool_output}" }
+          end
+        else
+          return AgentResult.new(
+            response: result.text_response || result.raw_response,
+            messages: messages,
+            iterations: iterations,
+            tool_calls_made: messages.count { |m| m[:role] == "tool" }
+          )
+        end
+      end
+    end
+  end
+
+  AgentResult = Struct.new(:response, :messages, :iterations, :tool_calls_made, keyword_init: true)
+  AgentMaxIterationsError = Class.new(StandardError)
+end
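The message flow of the `Agent#run` loop above can be exercised against a stubbed model. Everything below (`FakeResult`, `FakeCall`, `FakeLLM`) is a hypothetical stand-in written for this annotation, not red-candle's API; the loop mirrors the diff's logic minus the early-exit heuristic:

```ruby
require "json"

# Hypothetical stand-ins for illustration only.
FakeResult = Struct.new(:tool_calls, :tool_results, :text_response, :raw_response, keyword_init: true) do
  def has_tool_calls?
    !tool_calls.empty?
  end
end
FakeCall = Struct.new(:name, :arguments, keyword_init: true)

class FakeLLM
  # First turn: request a tool call; once a tool result is present: final answer.
  def chat_with_tools(messages, **_options)
    if messages.none? { |m| m[:role] == "tool" }
      call = FakeCall.new(name: "get_weather", arguments: { "city" => "Paris" })
      FakeResult.new(tool_calls: [call],
                     tool_results: [{ tool_call: call, result: { temp: 72 }, error: nil }],
                     text_response: nil, raw_response: "<tool_call>...</tool_call>")
    else
      FakeResult.new(tool_calls: [], tool_results: [],
                     text_response: "It's 72F in Paris.", raw_response: "It's 72F in Paris.")
    end
  end
end

# Minimal version of the Agent#run loop from the diff above.
llm = FakeLLM.new
messages = [{ role: "user", content: "Weather in Paris?" }]
iterations = 0
final = loop do
  iterations += 1
  result = llm.chat_with_tools(messages)
  break result.text_response unless result.has_tool_calls?
  messages << { role: "assistant", content: result.raw_response }
  result.tool_results.each do |tr|
    messages << { role: "tool", content: "[#{tr[:tool_call].name}] #{JSON.generate(tr[:result])}" }
  end
end
```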
data/lib/candle/llm.rb CHANGED
@@ -252,7 +252,7 @@ module Candle
       base_model
     end
 
-    # Simple chat interface for instruction models
+    # Chat interface always returns a String
     def chat(messages, **options)
       prompt = apply_chat_template(messages)
       generate(prompt, **options)
@@ -263,7 +263,48 @@ module Candle
       prompt = apply_chat_template(messages)
       generate_stream(prompt, **options, &block)
     end
-
+
+    # Chat with tool calling — always returns a ToolCallResult
+    # Set execute: true to automatically run the tools (default: false)
+    def chat_with_tools(messages, tools:, execute: false, **options)
+      tool_prompt = build_tool_system_prompt(tools)
+      augmented = inject_tool_instructions(messages, tool_prompt)
+
+      raw_response = chat(augmented, **options)
+
+      result = ToolCallParser.parse(raw_response, available_tools: tools)
+
+      if result.has_tool_calls? && execute
+        tool_results = result.tool_calls.map do |tool_call|
+          tool = tools.find { |t| t.name == tool_call.name }
+          unless tool
+            next { tool_call: tool_call, result: nil, error: "Unknown tool: #{tool_call.name}" }
+          end
+
+          begin
+            output = tool.call(tool_call.arguments)
+            { tool_call: tool_call, result: output, error: nil }
+          rescue Exception => e
+            { tool_call: tool_call, result: nil, error: e.message }
+          end
+        end
+
+        ToolCallResult.new(
+          tool_calls: result.tool_calls,
+          tool_results: tool_results,
+          text_response: result.text_response,
+          raw_response: raw_response
+        )
+      else
+        ToolCallResult.new(
+          tool_calls: result.tool_calls,
+          tool_results: [],
+          text_response: result.has_tool_calls? ? result.text_response : raw_response,
+          raw_response: raw_response
+        )
+      end
+    end
+
     # Inspect method for debugging and exploration
     def inspect
       opts = options rescue {}
@@ -354,6 +395,27 @@ module Candle
 
     private
 
+    def build_tool_system_prompt(tools)
+      tool_defs = tools.map { |t| JSON.generate(t.to_tool_definition) }.join("\n\n")
+      "You are a helpful assistant with access to the following tools:\n\n" \
+        "#{tool_defs}\n\n" \
+        "When you need to use a tool, respond with a tool call in the following format:\n" \
+        "<tool_call>\n" \
+        "{\"name\": \"tool_name\", \"arguments\": {\"arg1\": \"value1\"}}\n" \
+        "</tool_call>\n\n" \
+        "If you don't need to use a tool, respond normally with text."
+    end
+
+    def inject_tool_instructions(messages, tool_prompt)
+      msgs = messages.map { |m| m.dup }
+      if msgs.first && msgs.first[:role] == "system"
+        msgs.first[:content] = "#{tool_prompt}\n\n#{msgs.first[:content]}"
+      else
+        msgs.unshift({ role: "system", content: tool_prompt })
+      end
+      msgs
+    end
+
     # Extract JSON content from generated text, handling stop tokens and extra content
     def extract_json_content(text)
       # Remove any content after common stop tokens
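The `inject_tool_instructions` helper added above either prepends the tool prompt to an existing system message or inserts a new one. The same logic, reproduced standalone to show both branches (copied from the diff for illustration):

```ruby
# Same logic as the private helper added in the diff above, standalone.
def inject_tool_instructions(messages, tool_prompt)
  msgs = messages.map { |m| m.dup }
  if msgs.first && msgs.first[:role] == "system"
    msgs.first[:content] = "#{tool_prompt}\n\n#{msgs.first[:content]}"
  else
    msgs.unshift({ role: "system", content: tool_prompt })
  end
  msgs
end

# Branch 1: existing system message gets the tool prompt prepended.
with_system = inject_tool_instructions(
  [{ role: "system", content: "Be terse." }, { role: "user", content: "hi" }], "TOOLS"
)
# Branch 2: no system message, so one is inserted at the front.
without_system = inject_tool_instructions([{ role: "user", content: "hi" }], "TOOLS")
```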
@@ -0,0 +1,47 @@
+# frozen_string_literal: true
+
+module Candle
+  class Tool
+    attr_reader :name, :description, :parameters
+
+    def initialize(name:, description:, parameters: {}, &block)
+      @name = name
+      @description = description
+      @parameters = parameters
+      @callable = block
+    end
+
+    def call(arguments)
+      @callable.call(arguments)
+    end
+
+    def to_tool_definition
+      {
+        "type" => "function",
+        "function" => {
+          "name" => @name,
+          "description" => @description,
+          "parameters" => @parameters
+        }
+      }
+    end
+  end
+
+  ToolCall = Struct.new(:name, :arguments, keyword_init: true)
+
+  ToolCallResult = Struct.new(
+    :tool_calls,
+    :tool_results,
+    :text_response,
+    :raw_response,
+    keyword_init: true
+  ) do
+    def has_tool_calls?
+      tool_calls && !tool_calls.empty?
+    end
+
+    def success?
+      tool_results.all? { |r| r[:error].nil? }
+    end
+  end
+end
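`Candle::Tool` above bundles a name, description, JSON-Schema style parameters, and an execution block, and serializes them into the definition sent to the model. A minimal standalone re-implementation (the `SketchTool` class is an illustration, not the library class):

```ruby
require "json"

# Minimal re-implementation of the Tool class from the diff above.
class SketchTool
  attr_reader :name

  def initialize(name:, description:, parameters: {}, &block)
    @name = name
    @description = description
    @parameters = parameters
    @callable = block
  end

  def call(arguments)
    @callable.call(arguments)
  end

  def to_tool_definition
    { "type" => "function",
      "function" => { "name" => @name, "description" => @description, "parameters" => @parameters } }
  end
end

get_weather = SketchTool.new(name: "get_weather", description: "Weather for a city",
                             parameters: { type: "object", properties: { city: { type: "string" } } }) do |args|
  { city: args["city"], temperature: 72 }
end

definition = JSON.generate(get_weather.to_tool_definition)  # what the model sees
result = get_weather.call({ "city" => "Paris" })            # what your code runs
```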
@@ -0,0 +1,57 @@
+# frozen_string_literal: true
+
+require "json"
+
+module Candle
+  class ToolCallParser
+    DEFAULT_PATTERN = /<tool_call>\s*(.*?)\s*<\/tool_call>/m
+
+    attr_reader :pattern
+
+    def initialize(pattern: DEFAULT_PATTERN)
+      @pattern = pattern
+    end
+
+    ParseResult = Struct.new(:text_response, :tool_calls, keyword_init: true) do
+      def has_tool_calls?
+        tool_calls && !tool_calls.empty?
+      end
+    end
+
+    def parse(text, available_tools: [])
+      tool_calls = []
+
+      text.scan(@pattern) do |match|
+        json_str = match[0].strip
+        begin
+          parsed = JSON.parse(json_str)
+          name = parsed["name"]
+          arguments = parsed["arguments"] || parsed["parameters"] || {}
+
+          next unless name
+          if available_tools.empty? || available_tools.any? { |t| t.name == name }
+            tool_calls << ToolCall.new(name: name, arguments: arguments)
+          end
+        rescue JSON::ParserError
+          # Skip malformed tool calls
+        end
+      end
+
+      # Deduplicate identical tool calls (models sometimes repeat the same call)
+      tool_calls.uniq! { |tc| [tc.name, tc.arguments] }
+
+      remaining_text = text.gsub(@pattern, "").strip
+      remaining_text = nil if remaining_text.empty?
+
+      ParseResult.new(
+        text_response: remaining_text,
+        tool_calls: tool_calls
+      )
+    end
+
+    # Convenience class method using the default pattern
+    def self.parse(text, available_tools: [])
+      new.parse(text, available_tools: available_tools)
+    end
+  end
+end
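The parser above extracts `<tool_call>` JSON blocks, skips malformed ones, deduplicates repeats, and returns whatever text is left over. The core behavior, demonstrated standalone with the same regex:

```ruby
require "json"

# Same regex as ToolCallParser::DEFAULT_PATTERN in the diff above.
PATTERN = /<tool_call>\s*(.*?)\s*<\/tool_call>/m

raw = <<~TEXT
  Let me check.
  <tool_call>
  {"name": "get_weather", "arguments": {"city": "Paris"}}
  </tool_call>
  <tool_call>
  {"name": "get_weather", "arguments": {"city": "Paris"}}
  </tool_call>
  <tool_call>
  not json
  </tool_call>
TEXT

calls = []
raw.scan(PATTERN) do |(json_str)|
  calls << JSON.parse(json_str)
rescue JSON::ParserError
  # skip malformed blocks, exactly as the parser above does
end
calls.uniq!                       # models sometimes repeat the same call
text = raw.gsub(PATTERN, "").strip # leftover prose becomes text_response
```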
@@ -1,5 +1,5 @@
 # :nocov:
 module Candle
-  VERSION = "1.5.0"
+  VERSION = "1.6.0"
 end
 # :nocov:
data/lib/candle.rb CHANGED
@@ -5,7 +5,10 @@ require_relative "candle/device_utils"
 require_relative "candle/embedding_model_type"
 require_relative "candle/embedding_model"
 require_relative "candle/reranker"
+require_relative "candle/tool"
+require_relative "candle/tool_call_parser"
 require_relative "candle/llm"
+require_relative "candle/agent"
 require_relative "candle/tokenizer"
 require_relative "candle/ner"
 require_relative "candle/build_info"
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: red-candle
 version: !ruby/object:Gem::Version
-  version: 1.5.0
+  version: 1.6.0
 platform: ruby
 authors:
 - Christopher Petersen
@@ -254,6 +254,7 @@ files:
 - ext/candle/tests/device_tests.rs
 - ext/candle/tests/tensor_tests.rs
 - lib/candle.rb
+- lib/candle/agent.rb
 - lib/candle/build_info.rb
 - lib/candle/device_utils.rb
 - lib/candle/embedding_model.rb
@@ -264,6 +265,8 @@ files:
 - lib/candle/reranker.rb
 - lib/candle/tensor.rb
 - lib/candle/tokenizer.rb
+- lib/candle/tool.rb
+- lib/candle/tool_call_parser.rb
 - lib/candle/version.rb
 - lib/red-candle.rb
 homepage: https://github.com/scientist-labs/red-candle