red-candle 1.5.0 → 1.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a15236dd78a04cfcc6d7978ca088934987b84a599371c1508383d147172abcc3
-  data.tar.gz: f2661ad1220f566dd8a7d9eab1d74bd30007ed022c9381daf2697d12d88fac96
+  metadata.gz: 0e67b3106c236e7f579cf3bdfdf56f0b5ae99e8b33bb9bf35cfe361c19bb0514
+  data.tar.gz: 573ae31fda25ebd4f22e7338a6a94fd48ce9d869959957acbc8d77e1ed83052c
 SHA512:
-  metadata.gz: d74f26e2c0527f25424245442b5b74cf1575505a42a2c1c823be31606433a6c9e4736afc6d8062bd54642cbc8373deaae197bfc7163cbaa392d457a844e67628
-  data.tar.gz: 1ffc6d400ba4ffe9c23603fb42adae088142446f1d0bc03d0365a2bfcde3c1d1beab51eec01481d1a6e89f686c235e2ec58f0703daed73fabcf5cfd7f7e22d95
+  metadata.gz: 382e0ab840efe730184a7baa9f7438f43b10898d94fbea2648d92fbcf30a39ae9dcf48626dbf5a63e64263c2300d88555815c95820c4a6060eaa7b5f370deb9d
+  data.tar.gz: 00b15a30260ce226ef97c4e47a193ed557147377835472df552fde8de5f60c319ac4b8d7a1ef4f6a0408232c46776503b6f4b9a30bb46209df750ce3f47dd7e7
data/README.md CHANGED
@@ -273,6 +273,85 @@ See [STRUCTURED_GENERATION.md](docs/STRUCTURED_GENERATION.md) for detailed docum
 
 **Note on Reliability**: Structured generation constrains the model's output tokens, but success rates vary by model size and schema complexity. Smaller models (< 7B parameters) may occasionally produce incomplete or invalid JSON, especially with complex schemas. Consider implementing retry logic or fallback strategies in production applications. Larger models generally perform much better with structured generation.
 
+## Tool Calling
+
+Red-candle supports tool/function calling, enabling models to invoke external functions during generation. This works best with models fine-tuned for tool calling, such as Qwen3.
+
+### Defining Tools
+
+```ruby
+get_weather = Candle::Tool.new(
+  name: "get_weather",
+  description: "Get the current weather for a city",
+  parameters: {
+    type: "object",
+    properties: { city: { type: "string", description: "City name" } },
+    required: ["city"]
+  }
+) { |args| { city: args["city"], temperature: 72, condition: "sunny" } }
+```
+
+### Extracting Tool Calls
+
+`chat_with_tools` injects tool definitions into the system prompt, generates a response, and parses any `<tool_call>` tags from the output. It does **not** feed results back to the model — it just tells you what the model wants to call. You decide what to do with it:
+
+```ruby
+llm = Candle::LLM.from_pretrained("Qwen/Qwen3-0.6B")
+
+messages = [{ role: "user", content: "What's the weather in San Francisco?" }]
+result = llm.chat_with_tools(messages, tools: [get_weather],
+                             config: Candle::GenerationConfig.deterministic(max_length: 500))
+
+if result.has_tool_calls?
+  result.tool_calls.each do |tc|
+    puts "#{tc.name}(#{tc.arguments})"
+    output = get_weather.call(tc.arguments)
+    puts "=> #{output}"
+  end
+else
+  puts result.text_response
+end
+```
+
+Pass `execute: true` to automatically run the tools (but still no round-trip back to the model):
+
+```ruby
+result = llm.chat_with_tools(messages, tools: [get_weather], execute: true,
+                             config: Candle::GenerationConfig.deterministic(max_length: 500))
+
+result.tool_results.each do |tr|
+  puts "#{tr[:tool_call].name} => #{tr[:result]}"
+end
+```
+
+### Agent (Multi-Turn Tool Loop)
+
+`Candle::Agent` completes the round-trip: generate → parse tool calls → execute → feed results back to the model → repeat until the model produces a final text answer or hits `max_iterations`. This is a convenience wrapper for quick prototyping — for production use, frameworks like [RubyLLM](https://github.com/crmne/ruby_llm) manage this loop for you via the [ruby_llm-red_candle](https://github.com/scientist-labs/ruby_llm-red_candle) plugin:
+
+```ruby
+agent = Candle::Agent.new(llm, tools: [get_weather, lookup_price], max_iterations: 5)
+result = agent.run("What's the weather in Paris, and how much does a widget cost?",
+                   config: Candle::GenerationConfig.deterministic(max_length: 1000))
+
+puts result.response        # Final text answer from the model
+puts result.iterations      # Number of generate cycles
+puts result.tool_calls_made # Number of tools invoked
+```
+
+### Model Recommendations
+
+Tool calling quality depends heavily on model size:
+
+| Model | Tool Calling Quality |
+|-------|---------------------|
+| **Qwen3-8B GGUF** (~5 GB) | Calls correct tools, self-corrects errors, but may hallucinate values from tool results |
+| **Qwen3-4B GGUF** (~2.5 GB) | Calls correct tools, occasional reasoning errors |
+| **Qwen3-0.6B** (~1.2 GB) | Single-turn works, needs `max_length: 500+` for thinking |
+| SmolLM2-360M | Does not work |
+| TinyLlama-1.1B | Does not work (not fine-tuned for tool calling) |
+
+**Tip:** Qwen3 models use a `<think>` reasoning block before producing tool calls. Set `max_length` high enough (500+ for 0.6B, 1000+ for larger models) to allow room for both thinking and the tool call.
+
 ## ⚠️ Model Format Requirements
 
 ### EmbeddingModels and Rerankers: Safetensors Only
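The README's reliability note recommends retry logic when a small model produces unusable output. A minimal, library-agnostic sketch of that strategy; the `with_retries` helper and the stubbed `responses` array are hypothetical illustrations, not part of red-candle:

```ruby
# Hypothetical retry helper, in the spirit of the reliability note above.
# Re-runs the block until it yields a usable (truthy) result or attempts run out.
def with_retries(max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    result = yield
    raise "no usable result" unless result
    result
  rescue RuntimeError
    attempts < max_attempts ? retry : nil
  end
end

# Stub standing in for an unreliable generation call: fails twice, then succeeds.
responses = [nil, nil, { "name" => "get_weather" }]
result = with_retries { responses.shift }
```

In a real application the block would call `chat_with_tools` and the validity check would inspect the parsed result instead of truthiness.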
@@ -319,9 +319,9 @@ impl Llama {
                 if i == 1 || (i == 0 && system_message.is_empty()) {
                     // First user message
                     if !system_message.is_empty() {
-                        prompt.push_str(&format!("<s>[INST] <<SYS>>\n{}\n<</SYS>>\n\n{} [/INST]", system_message, content));
+                        prompt.push_str(&format!("[INST] <<SYS>>\n{}\n<</SYS>>\n\n{} [/INST]", system_message, content));
                     } else {
-                        prompt.push_str(&format!("<s>[INST] {} [/INST]", content));
+                        prompt.push_str(&format!("[INST] {} [/INST]", content));
                     }
                 } else {
                     prompt.push_str(&format!(" [INST] {} [/INST]", content));
@@ -340,7 +340,7 @@ impl Llama {
     fn apply_llama3_template(&self, messages: &[serde_json::Value]) -> CandleResult<String> {
         let mut prompt = String::new();
 
-        prompt.push_str("<|begin_of_text|>");
+        // BOS token is added by the tokenizer's encode(prompt, add_special_tokens=true)
 
         for message in messages {
             let role = message["role"].as_str().unwrap_or("");
@@ -386,9 +386,9 @@ impl QuantizedGGUF {
             "user" => {
                 if i == 1 || (i == 0 && system_message.is_empty()) {
                     if !system_message.is_empty() {
-                        prompt.push_str(&format!("<s>[INST] <<SYS>>\n{}\n<</SYS>>\n\n{} [/INST]", system_message, content));
+                        prompt.push_str(&format!("[INST] <<SYS>>\n{}\n<</SYS>>\n\n{} [/INST]", system_message, content));
                     } else {
-                        prompt.push_str(&format!("<s>[INST] {} [/INST]", content));
+                        prompt.push_str(&format!("[INST] {} [/INST]", content));
                     }
                 } else {
                     prompt.push_str(&format!(" [INST] {} [/INST]", content));
@@ -406,7 +406,7 @@ impl QuantizedGGUF {
 
     fn apply_llama3_template(&self, messages: &[serde_json::Value]) -> CandleResult<String> {
         let mut prompt = String::new();
-        prompt.push_str("<|begin_of_text|>");
+        // BOS token is added by the tokenizer's encode(prompt, add_special_tokens=true)
 
        for message in messages {
            let role = message["role"].as_str().unwrap_or("");
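The hunks above all remove a hard-coded BOS marker (`<s>` or `<|begin_of_text|>`) from the prompt templates, because the tokenizer already prepends BOS when encoding with special tokens enabled, so the old templates made the model see it twice. A toy Ruby sketch of the double-BOS effect; the `encode` function is a hypothetical stand-in, not the real tokenizer API:

```ruby
# Toy stand-in for a tokenizer that prepends BOS ("<s>") when
# add_special_tokens is true. NOT the real red-candle tokenizer.
def encode(prompt, add_special_tokens: true)
  tokens = prompt.split
  add_special_tokens ? ["<s>"] + tokens : tokens
end

# Old template baked BOS into the prompt string: the model saw it twice.
doubled = encode("<s> [INST] hi [/INST]")
# New template leaves BOS to the tokenizer: exactly one.
single = encode("[INST] hi [/INST]")
```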
@@ -481,15 +481,18 @@ impl QuantizedGGUF {
                 "assistant" => {
                     prompt.push_str(&format!("<|im_start|>assistant\n{}<|im_end|>\n", content));
                 }
+                "tool" => {
+                    prompt.push_str(&format!("<|im_start|>tool\n{}<|im_end|>\n", content));
+                }
                 _ => {}
             }
         }
-
+
         // Add generation prompt
         prompt.push_str("<|im_start|>assistant\n");
         Ok(prompt)
     }
-
+
     fn apply_phi_template(&self, messages: &[serde_json::Value]) -> CandleResult<String> {
         let mut prompt = String::new();
 
@@ -128,6 +128,9 @@ impl Qwen {
                 "assistant" => {
                     prompt.push_str(&format!("<|im_start|>assistant\n{}<|im_end|>\n", content));
                 }
+                "tool" => {
+                    prompt.push_str(&format!("<|im_start|>tool\n{}<|im_end|>\n", content));
+                }
                 _ => {}
             }
         }
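The ChatML hunks above add a `tool` role so that executed tool results can be rendered back into the conversation. A sketch of the framing those templates emit, re-implemented in Ruby purely for illustration (the Rust templates above are the real implementation):

```ruby
# Sketch of the ChatML framing produced by the templates above,
# including the new "tool" role and the trailing generation prompt.
def chatml(messages)
  body = messages.map do |m|
    "<|im_start|>#{m[:role]}\n#{m[:content]}<|im_end|>\n"
  end.join
  body + "<|im_start|>assistant\n"
end

prompt = chatml([
  { role: "user", content: "Weather in Paris?" },
  { role: "assistant", content: "<tool_call>...</tool_call>" },
  { role: "tool", content: "[get_weather] {\"temp\":72}" }
])
```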
@@ -0,0 +1,68 @@
+# frozen_string_literal: true
+
+require "json"
+
+module Candle
+  class Agent
+    MAX_ITERATIONS = 10
+
+    attr_reader :llm, :tools, :system_prompt, :max_iterations
+
+    def initialize(llm, tools:, system_prompt: nil, max_iterations: MAX_ITERATIONS)
+      @llm = llm
+      @tools = tools
+      @system_prompt = system_prompt
+      @max_iterations = max_iterations
+    end
+
+    def run(user_message, **options)
+      messages = []
+      messages << { role: "system", content: @system_prompt } if @system_prompt
+      messages << { role: "user", content: user_message }
+
+      iterations = 0
+      loop do
+        iterations += 1
+        if iterations > @max_iterations
+          raise AgentMaxIterationsError,
+                "Agent exceeded maximum iterations (#{@max_iterations})"
+        end
+
+        result = @llm.chat_with_tools(messages, tools: @tools, execute: true, **options)
+
+        if result.has_tool_calls?
+          # If the model produced a substantial text answer alongside tool calls,
+          # treat it as a final response (model is done, trailing tool calls are noise).
+          # Strip <think> blocks so they don't count toward the length check.
+          text_without_thinking = result.text_response&.gsub(/<think>.*?<\/think>/m, "")&.strip
+          if text_without_thinking && text_without_thinking.length > 50
+            return AgentResult.new(
+              response: result.text_response,
+              messages: messages,
+              iterations: iterations,
+              tool_calls_made: messages.count { |m| m[:role] == "tool" }
+            )
+          end
+
+          messages << { role: "assistant", content: result.raw_response }
+
+          result.tool_results.each do |tr|
+            tool_name = tr[:tool_call]&.name || "unknown"
+            tool_output = tr[:error] ? "Error: #{tr[:error]}" : JSON.generate(tr[:result])
+            messages << { role: "tool", content: "[#{tool_name}] #{tool_output}" }
+          end
+        else
+          return AgentResult.new(
+            response: result.text_response || result.raw_response,
+            messages: messages,
+            iterations: iterations,
+            tool_calls_made: messages.count { |m| m[:role] == "tool" }
+          )
+        end
+      end
+    end
+  end
+
+  AgentResult = Struct.new(:response, :messages, :iterations, :tool_calls_made, keyword_init: true)
+  AgentMaxIterationsError = Class.new(StandardError)
+end
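The message flow of the `Agent#run` loop above can be exercised against a stubbed model. Everything below (`FakeResult`, `FakeCall`, `FakeLLM`) is a hypothetical stand-in written for this annotation, not red-candle's API; the loop mirrors the diff's logic minus the early-exit heuristic:

```ruby
require "json"

# Hypothetical stand-ins for illustration only.
FakeResult = Struct.new(:tool_calls, :tool_results, :text_response, :raw_response, keyword_init: true) do
  def has_tool_calls?
    !tool_calls.empty?
  end
end
FakeCall = Struct.new(:name, :arguments, keyword_init: true)

class FakeLLM
  # First turn: request a tool call; once a tool result is present: final answer.
  def chat_with_tools(messages, **_options)
    if messages.none? { |m| m[:role] == "tool" }
      call = FakeCall.new(name: "get_weather", arguments: { "city" => "Paris" })
      FakeResult.new(tool_calls: [call],
                     tool_results: [{ tool_call: call, result: { temp: 72 }, error: nil }],
                     text_response: nil, raw_response: "<tool_call>...</tool_call>")
    else
      FakeResult.new(tool_calls: [], tool_results: [],
                     text_response: "It's 72F in Paris.", raw_response: "It's 72F in Paris.")
    end
  end
end

# Minimal version of the Agent#run loop from the diff above.
llm = FakeLLM.new
messages = [{ role: "user", content: "Weather in Paris?" }]
iterations = 0
final = loop do
  iterations += 1
  result = llm.chat_with_tools(messages)
  break result.text_response unless result.has_tool_calls?
  messages << { role: "assistant", content: result.raw_response }
  result.tool_results.each do |tr|
    messages << { role: "tool", content: "[#{tr[:tool_call].name}] #{JSON.generate(tr[:result])}" }
  end
end
```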
data/lib/candle/llm.rb CHANGED
@@ -252,7 +252,7 @@ module Candle
       base_model
     end
 
-    # Simple chat interface for instruction models
+    # Chat interface always returns a String
     def chat(messages, **options)
       prompt = apply_chat_template(messages)
       generate(prompt, **options)
@@ -263,7 +263,48 @@ module Candle
       prompt = apply_chat_template(messages)
       generate_stream(prompt, **options, &block)
     end
-
+
+    # Chat with tool calling — always returns a ToolCallResult
+    # Set execute: true to automatically run the tools (default: false)
+    def chat_with_tools(messages, tools:, execute: false, **options)
+      tool_prompt = build_tool_system_prompt(tools)
+      augmented = inject_tool_instructions(messages, tool_prompt)
+
+      raw_response = chat(augmented, **options)
+
+      result = ToolCallParser.parse(raw_response, available_tools: tools)
+
+      if result.has_tool_calls? && execute
+        tool_results = result.tool_calls.map do |tool_call|
+          tool = tools.find { |t| t.name == tool_call.name }
+          unless tool
+            next { tool_call: tool_call, result: nil, error: "Unknown tool: #{tool_call.name}" }
+          end
+
+          begin
+            output = tool.call(tool_call.arguments)
+            { tool_call: tool_call, result: output, error: nil }
+          rescue Exception => e
+            { tool_call: tool_call, result: nil, error: e.message }
+          end
+        end
+
+        ToolCallResult.new(
+          tool_calls: result.tool_calls,
+          tool_results: tool_results,
+          text_response: result.text_response,
+          raw_response: raw_response
+        )
+      else
+        ToolCallResult.new(
+          tool_calls: result.tool_calls,
+          tool_results: [],
+          text_response: result.has_tool_calls? ? result.text_response : raw_response,
+          raw_response: raw_response
+        )
+      end
+    end
+
     # Inspect method for debugging and exploration
     def inspect
       opts = options rescue {}
@@ -354,6 +395,27 @@ module Candle
 
     private
 
+    def build_tool_system_prompt(tools)
+      tool_defs = tools.map { |t| JSON.generate(t.to_tool_definition) }.join("\n\n")
+      "You are a helpful assistant with access to the following tools:\n\n" \
+        "#{tool_defs}\n\n" \
+        "When you need to use a tool, respond with a tool call in the following format:\n" \
+        "<tool_call>\n" \
+        "{\"name\": \"tool_name\", \"arguments\": {\"arg1\": \"value1\"}}\n" \
+        "</tool_call>\n\n" \
+        "If you don't need to use a tool, respond normally with text."
+    end
+
+    def inject_tool_instructions(messages, tool_prompt)
+      msgs = messages.map { |m| m.dup }
+      if msgs.first && msgs.first[:role] == "system"
+        msgs.first[:content] = "#{tool_prompt}\n\n#{msgs.first[:content]}"
+      else
+        msgs.unshift({ role: "system", content: tool_prompt })
+      end
+      msgs
+    end
+
     # Extract JSON content from generated text, handling stop tokens and extra content
     def extract_json_content(text)
       # Remove any content after common stop tokens
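The `inject_tool_instructions` helper added above either prepends the tool prompt to an existing system message or inserts a new one. The same logic, reproduced standalone to show both branches (copied from the diff for illustration):

```ruby
# Same logic as the private helper added in the diff above, standalone.
def inject_tool_instructions(messages, tool_prompt)
  msgs = messages.map { |m| m.dup }
  if msgs.first && msgs.first[:role] == "system"
    msgs.first[:content] = "#{tool_prompt}\n\n#{msgs.first[:content]}"
  else
    msgs.unshift({ role: "system", content: tool_prompt })
  end
  msgs
end

# Branch 1: existing system message gets the tool prompt prepended.
with_system = inject_tool_instructions(
  [{ role: "system", content: "Be terse." }, { role: "user", content: "hi" }], "TOOLS"
)
# Branch 2: no system message, so one is inserted at the front.
without_system = inject_tool_instructions([{ role: "user", content: "hi" }], "TOOLS")
```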
@@ -0,0 +1,47 @@
+# frozen_string_literal: true
+
+module Candle
+  class Tool
+    attr_reader :name, :description, :parameters
+
+    def initialize(name:, description:, parameters: {}, &block)
+      @name = name
+      @description = description
+      @parameters = parameters
+      @callable = block
+    end
+
+    def call(arguments)
+      @callable.call(arguments)
+    end
+
+    def to_tool_definition
+      {
+        "type" => "function",
+        "function" => {
+          "name" => @name,
+          "description" => @description,
+          "parameters" => @parameters
+        }
+      }
+    end
+  end
+
+  ToolCall = Struct.new(:name, :arguments, keyword_init: true)
+
+  ToolCallResult = Struct.new(
+    :tool_calls,
+    :tool_results,
+    :text_response,
+    :raw_response,
+    keyword_init: true
+  ) do
+    def has_tool_calls?
+      tool_calls && !tool_calls.empty?
+    end
+
+    def success?
+      tool_results.all? { |r| r[:error].nil? }
+    end
+  end
+end
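`Candle::Tool` above bundles a name, description, JSON-Schema style parameters, and an execution block, and serializes them into the definition sent to the model. A minimal standalone re-implementation (the `SketchTool` class is an illustration, not the library class):

```ruby
require "json"

# Minimal re-implementation of the Tool class from the diff above.
class SketchTool
  attr_reader :name

  def initialize(name:, description:, parameters: {}, &block)
    @name = name
    @description = description
    @parameters = parameters
    @callable = block
  end

  def call(arguments)
    @callable.call(arguments)
  end

  def to_tool_definition
    { "type" => "function",
      "function" => { "name" => @name, "description" => @description, "parameters" => @parameters } }
  end
end

get_weather = SketchTool.new(name: "get_weather", description: "Weather for a city",
                             parameters: { type: "object", properties: { city: { type: "string" } } }) do |args|
  { city: args["city"], temperature: 72 }
end

definition = JSON.generate(get_weather.to_tool_definition)  # what the model sees
result = get_weather.call({ "city" => "Paris" })            # what your code runs
```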
@@ -0,0 +1,57 @@
+# frozen_string_literal: true
+
+require "json"
+
+module Candle
+  class ToolCallParser
+    DEFAULT_PATTERN = /<tool_call>\s*(.*?)\s*<\/tool_call>/m
+
+    attr_reader :pattern
+
+    def initialize(pattern: DEFAULT_PATTERN)
+      @pattern = pattern
+    end
+
+    ParseResult = Struct.new(:text_response, :tool_calls, keyword_init: true) do
+      def has_tool_calls?
+        tool_calls && !tool_calls.empty?
+      end
+    end
+
+    def parse(text, available_tools: [])
+      tool_calls = []
+
+      text.scan(@pattern) do |match|
+        json_str = match[0].strip
+        begin
+          parsed = JSON.parse(json_str)
+          name = parsed["name"]
+          arguments = parsed["arguments"] || parsed["parameters"] || {}
+
+          next unless name
+          if available_tools.empty? || available_tools.any? { |t| t.name == name }
+            tool_calls << ToolCall.new(name: name, arguments: arguments)
+          end
+        rescue JSON::ParserError
+          # Skip malformed tool calls
+        end
+      end
+
+      # Deduplicate identical tool calls (models sometimes repeat the same call)
+      tool_calls.uniq! { |tc| [tc.name, tc.arguments] }
+
+      remaining_text = text.gsub(@pattern, "").strip
+      remaining_text = nil if remaining_text.empty?
+
+      ParseResult.new(
+        text_response: remaining_text,
+        tool_calls: tool_calls
+      )
+    end
+
+    # Convenience class method using the default pattern
+    def self.parse(text, available_tools: [])
+      new.parse(text, available_tools: available_tools)
+    end
+  end
+end
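The parser above extracts `<tool_call>` JSON blocks, skips malformed ones, deduplicates repeats, and returns whatever text is left over. The core behavior, demonstrated standalone with the same regex:

```ruby
require "json"

# Same regex as ToolCallParser::DEFAULT_PATTERN in the diff above.
PATTERN = /<tool_call>\s*(.*?)\s*<\/tool_call>/m

raw = <<~TEXT
  Let me check.
  <tool_call>
  {"name": "get_weather", "arguments": {"city": "Paris"}}
  </tool_call>
  <tool_call>
  {"name": "get_weather", "arguments": {"city": "Paris"}}
  </tool_call>
  <tool_call>
  not json
  </tool_call>
TEXT

calls = []
raw.scan(PATTERN) do |(json_str)|
  calls << JSON.parse(json_str)
rescue JSON::ParserError
  # skip malformed blocks, exactly as the parser above does
end
calls.uniq!                       # models sometimes repeat the same call
text = raw.gsub(PATTERN, "").strip # leftover prose becomes text_response
```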
@@ -1,5 +1,5 @@
 # :nocov:
 module Candle
-  VERSION = "1.5.0"
+  VERSION = "1.6.0"
 end
 # :nocov:
data/lib/candle.rb CHANGED
@@ -5,7 +5,10 @@ require_relative "candle/device_utils"
 require_relative "candle/embedding_model_type"
 require_relative "candle/embedding_model"
 require_relative "candle/reranker"
+require_relative "candle/tool"
+require_relative "candle/tool_call_parser"
 require_relative "candle/llm"
+require_relative "candle/agent"
 require_relative "candle/tokenizer"
 require_relative "candle/ner"
 require_relative "candle/build_info"
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: red-candle
 version: !ruby/object:Gem::Version
-  version: 1.5.0
+  version: 1.6.0
 platform: ruby
 authors:
 - Christopher Petersen
@@ -254,6 +254,7 @@ files:
 - ext/candle/tests/device_tests.rs
 - ext/candle/tests/tensor_tests.rs
 - lib/candle.rb
+- lib/candle/agent.rb
 - lib/candle/build_info.rb
 - lib/candle/device_utils.rb
 - lib/candle/embedding_model.rb
@@ -264,6 +265,8 @@ files:
 - lib/candle/reranker.rb
 - lib/candle/tensor.rb
 - lib/candle/tokenizer.rb
+- lib/candle/tool.rb
+- lib/candle/tool_call_parser.rb
 - lib/candle/version.rb
 - lib/red-candle.rb
 homepage: https://github.com/scientist-labs/red-candle