ruby_llm-red_candle 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: c6a9af49d55182783c1cebd1e02748f3a9e0716a8e35bb87e250c38075731784
+   data.tar.gz: 32242fd560276ce4889ea640ffebc11c81f0e4f1afef83e7edfe5204cdb813ce
+ SHA512:
+   metadata.gz: 6c28f204a8faedfda2c578ec9a05227a36c69047e7bd31555e9603fae6ad5c86b9b4680ba2119af70e49d80f6705d9d1c5bd7b9290e6acda7042beb9102bf714
+   data.tar.gz: 40411b9e81a77b97a0dcb1a5dd01e9b5ef5d7b6d0a3cf3c211a72b8879f1985a9b129cdf384bc72077d48c0b9bef66ed6a83730ee3803adeec72b21dbe33db83
data/.rspec ADDED
@@ -0,0 +1,3 @@
+ --format documentation
+ --color
+ --require spec_helper
data/.rubocop.yml ADDED
@@ -0,0 +1,39 @@
+ AllCops:
+   TargetRubyVersion: 3.1
+   NewCops: enable
+   SuggestExtensions: false
+   Exclude:
+     - "vendor/**/*"
+     - "tmp/**/*"
+
+ Style/StringLiterals:
+   EnforcedStyle: double_quotes
+
+ Style/StringLiteralsInInterpolation:
+   EnforcedStyle: double_quotes
+
+ Layout/LineLength:
+   Max: 120
+
+ Metrics/BlockLength:
+   Exclude:
+     - "spec/**/*"
+     - "*.gemspec"
+
+ Metrics/MethodLength:
+   Max: 25
+
+ Metrics/AbcSize:
+   Max: 30
+
+ Metrics/ClassLength:
+   Max: 200
+
+ RSpec/MultipleExpectations:
+   Enabled: false
+
+ RSpec/ExampleLength:
+   Enabled: false
+
+ RSpec/NestedGroups:
+   Max: 4
data/CHANGELOG.md ADDED
@@ -0,0 +1,26 @@
+ # Changelog
+
+ All notable changes to this project will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [Unreleased]
+
+ ## [0.1.0] - 2025-12-04
+
+ ### Added
+
+ - Initial release
+ - Red Candle provider for RubyLLM enabling local LLM execution
+ - Support for quantized GGUF models from HuggingFace
+ - Streaming token generation
+ - Structured output with JSON schemas
+ - Automatic model registration with RubyLLM
+ - Device selection (CPU, Metal, CUDA)
+ - Supported models:
+   - google/gemma-3-4b-it-qat-q4_0-gguf
+   - TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF
+   - TheBloke/Mistral-7B-Instruct-v0.2-GGUF
+   - Qwen/Qwen2.5-1.5B-Instruct-GGUF
+   - microsoft/Phi-3-mini-4k-instruct
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2025 Chris Petersen
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,378 @@
+ # RubyLLM::RedCandle
+
+ A [RubyLLM](https://github.com/crmne/ruby_llm) plugin that enables local LLM execution using quantized GGUF models through the [Red Candle](https://github.com/scientist-labs/red-candle) gem.
+
+ ## What Makes This Different
+
+ While all other RubyLLM providers communicate via HTTP APIs, Red Candle runs models locally using the Candle Rust crate. This brings true local inference to Ruby with:
+
+ - **No network latency** - Models run directly on your machine
+ - **No API costs** - Free execution once models are downloaded
+ - **Privacy** - Your data never leaves your machine
+ - **Structured output** - Generate JSON conforming to schemas using grammar-constrained generation
+ - **Streaming** - Token-by-token output for responsive UIs
+ - **Hardware acceleration** - Metal (macOS), CUDA (NVIDIA), or CPU
+
+ ## Installation
+
+ Add this line to your application's Gemfile:
+
+ ```ruby
+ gem 'ruby_llm-red_candle'
+ ```
+
+ And then execute:
+
+ ```bash
+ $ bundle install
+ ```
+
+ **Note:** The `red-candle` gem requires a Rust toolchain to compile native extensions. See the [Red Candle installation guide](https://github.com/scientist-labs/red-candle#installation) for details.
+
+ ## Quick Start
+
+ ```ruby
+ require 'ruby_llm'
+ require 'ruby_llm-red_candle'
+
+ # Create a chat with a local model
+ chat = RubyLLM.chat(provider: :red_candle, model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF')
+
+ # Ask a question
+ response = chat.ask("What is the capital of France?")
+ puts response.content
+ ```
+
+ ## Usage
+
+ ### Basic Chat (Non-Streaming)
+
+ The simplest way to use Red Candle is with synchronous chat:
+
+ ```ruby
+ chat = RubyLLM.chat(
+   provider: :red_candle,
+   model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF'
+ )
+
+ response = chat.ask("What are the benefits of functional programming?")
+ puts response.content
+
+ # Access token estimates
+ puts "Input tokens: #{response.input_tokens}"
+ puts "Output tokens: #{response.output_tokens}"
+ ```
+
+ ### Multi-Turn Conversations
+
+ RubyLLM maintains conversation history automatically:
+
+ ```ruby
+ chat = RubyLLM.chat(
+   provider: :red_candle,
+   model: 'Qwen/Qwen2.5-1.5B-Instruct-GGUF'
+ )
+
+ chat.ask("My name is Alice.")
+ chat.ask("I'm a software engineer who loves Ruby.")
+ response = chat.ask("What do you know about me?")
+ # The model remembers previous messages
+ puts response.content
+ ```
+
+ ### Streaming Output
+
+ For responsive UIs, stream tokens as they're generated:
+
+ ```ruby
+ chat = RubyLLM.chat(
+   provider: :red_candle,
+   model: 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF'
+ )
+
+ chat.ask("Explain recursion step by step") do |chunk|
+   print chunk.content # Print each token as it arrives
+   $stdout.flush
+ end
+ puts # Final newline
+ ```
+
+ The block receives `RubyLLM::Chunk` objects with each generated token.
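+
+ If you also need the full text after streaming, you can accumulate the chunks yourself. A minimal sketch using the `chunk.content` accessor shown above (the buffer name is illustrative):
+
+ ```ruby
+ # Print tokens live while collecting them into a single string.
+ buffer = +""
+ chat.ask("Summarize the SOLID principles") do |chunk|
+   buffer << chunk.content.to_s
+   print chunk.content
+ end
+ puts
+ puts "Generated #{buffer.length} characters"
+ ```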
+
+ ### Structured Output (JSON Schema)
+
+ Generate JSON output that conforms to a schema using grammar-constrained generation:
+
+ ```ruby
+ chat = RubyLLM.chat(
+   provider: :red_candle,
+   model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF'
+ )
+
+ schema = {
+   type: 'object',
+   properties: {
+     name: { type: 'string' },
+     age: { type: 'integer' },
+     occupation: { type: 'string' }
+   },
+   required: ['name', 'age', 'occupation']
+ }
+
+ response = chat.with_schema(schema).ask("Generate a profile for a 30-year-old software engineer named Alice")
+
+ # response.content is automatically parsed as a Hash
+ puts response.content
+ # => {"name"=>"Alice", "age"=>30, "occupation"=>"Software Engineer"}
+
+ puts "Name: #{response.content['name']}"
+ puts "Age: #{response.content['age']}"
+ ```
+
+ **How it works:** Red Candle uses the Rust `outlines-core` crate to constrain token generation to only produce valid JSON matching your schema. This ensures 100% valid output structure.
+
+ ### Structured Output with Enums
+
+ Constrain values to specific options:
+
+ ```ruby
+ schema = {
+   type: 'object',
+   properties: {
+     sentiment: {
+       type: 'string',
+       enum: ['positive', 'negative', 'neutral']
+     },
+     confidence: { type: 'number' }
+   },
+   required: ['sentiment', 'confidence']
+ }
+
+ response = chat.with_schema(schema).ask("Analyze the sentiment: 'I love this product!'")
+ puts response.content['sentiment'] # => "positive"
+ ```
+
+ ### Using with ruby_llm-schema
+
+ For complex schemas, use [ruby_llm-schema](https://github.com/danielfriis/ruby_llm-schema):
+
+ ```ruby
+ require 'ruby_llm/schema'
+
+ class PersonProfile
+   include RubyLLM::Schema
+
+   schema do
+     string :name, description: "Person's full name"
+     integer :age, description: "Age in years"
+     string :occupation
+     array :skills, items: { type: 'string' }
+   end
+ end
+
+ chat = RubyLLM.chat(provider: :red_candle, model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF')
+ response = chat.with_schema(PersonProfile).ask("Generate a Ruby developer profile")
+ ```
+
+ ### Temperature Control
+
+ Adjust creativity vs determinism:
+
+ ```ruby
+ # More creative/varied responses (higher temperature)
+ chat = RubyLLM.chat(
+   provider: :red_candle,
+   model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF'
+ )
+ chat.with_temperature(1.2).ask("Write a creative story opening")
+
+ # More focused/deterministic responses (lower temperature)
+ chat.with_temperature(0.3).ask("What is 15 + 27?")
+ ```
+
+ Temperature range: 0.0 (deterministic) to 2.0 (very creative). Default is 0.7 for regular generation, 0.3 for structured output.
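+
+ The `with_*` helpers chain, so a lower temperature can be combined with a schema in one call. A sketch, assuming the chainable helpers shown above and a `schema` hash like the ones in the structured output examples:
+
+ ```ruby
+ # Near-deterministic structured output: override the temperature and attach the schema together.
+ response = chat.with_temperature(0.1).with_schema(schema).ask("Generate a short profile")
+ puts response.content
+ ```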
+
+ ### System Prompts
+
+ Set context for the conversation:
+
+ ```ruby
+ chat = RubyLLM.chat(
+   provider: :red_candle,
+   model: 'Qwen/Qwen2.5-1.5B-Instruct-GGUF'
+ )
+
+ chat.with_instructions("You are a helpful coding assistant specializing in Ruby. Always provide code examples.")
+
+ response = chat.ask("How do I read a file in Ruby?")
+ ```
+
+ ## Supported Models
+
+ | Model | Context Window | Size | Best For |
+ |-------|---------------|------|----------|
+ | `TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF` | 2,048 | ~600MB | Testing, quick prototypes |
+ | `Qwen/Qwen2.5-1.5B-Instruct-GGUF` | 32,768 | ~900MB | General chat, long context |
+ | `google/gemma-3-4b-it-qat-q4_0-gguf` | 8,192 | ~2.5GB | Balanced quality/speed |
+ | `microsoft/Phi-3-mini-4k-instruct` | 4,096 | ~2GB | Reasoning tasks |
+ | `TheBloke/Mistral-7B-Instruct-v0.2-GGUF` | 32,768 | ~4GB | High quality responses |
+
+ Models are automatically downloaded from HuggingFace on first use.
+
+ ### Listing Available Models
+
+ ```ruby
+ # Get all Red Candle models
+ models = RubyLLM.models.all.select { |m| m.provider == 'red_candle' }
+ models.each do |m|
+   puts "#{m.id} - #{m.context_window} tokens"
+ end
+ ```
+
+ ## Configuration
+
+ ### Device Selection
+
+ By default, Red Candle uses the best available device:
+ - **Metal** on macOS (Apple Silicon)
+ - **CUDA** if NVIDIA GPU is available
+ - **CPU** as fallback
+
+ Override with:
+
+ ```ruby
+ RubyLLM.configure do |config|
+   config.red_candle_device = 'cpu' # Force CPU
+   config.red_candle_device = 'metal' # Force Metal (macOS)
+   config.red_candle_device = 'cuda' # Force CUDA (NVIDIA)
+ end
+ ```
+
+ ### HuggingFace Authentication
+
+ Some models require HuggingFace authentication (especially gated models like Mistral):
+
+ ```bash
+ # Install the HuggingFace CLI
+ pip install huggingface_hub
+
+ # Login (creates ~/.huggingface/token)
+ huggingface-cli login
+ ```
+
+ See the [Red Candle HuggingFace guide](https://github.com/scientist-labs/red-candle/blob/main/docs/HUGGINGFACE.md) for details.
+
+ ### Custom JSON Instruction Template
+
+ By default, structured generation appends instructions to guide the model to output JSON. You can customize this template for different models or use cases:
+
+ ```ruby
+ # View the default template
+ RubyLLM::RedCandle::Configuration.json_instruction_template
+ # => "\n\nRespond with ONLY a valid JSON object containing: {schema_description}"
+
+ # Set a custom template (use {schema_description} as placeholder)
+ RubyLLM::RedCandle::Configuration.json_instruction_template = <<~TEMPLATE
+
+   You must respond with valid JSON matching this structure: {schema_description}
+   Do not include any other text, only the JSON object.
+ TEMPLATE
+
+ # Reset to default
+ RubyLLM::RedCandle::Configuration.reset!
+ ```
+
+ Different models may respond better to different phrasings. Experiment with templates if you're getting inconsistent structured output.
+
+ ### Debug Logging
+
+ Enable debug logging to troubleshoot issues:
+
+ ```ruby
+ RubyLLM.logger.level = Logger::DEBUG
+ ```
+
+ This shows:
+ - Schema normalization details
+ - Prompt construction
+ - Generation parameters
+ - Raw model outputs
+
+ ## Error Handling
+
+ ```ruby
+ begin
+   chat = RubyLLM.chat(provider: :red_candle, model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF')
+   response = chat.ask("Hello!")
+ rescue RubyLLM::Error => e
+   puts "Error: #{e.message}"
+ end
+ ```
+
+ Common errors:
+ - **Model not found** - Check model ID spelling
+ - **Failed to load tokenizer** - Model may require HuggingFace login (see the fallback sketch below)
+ - **Context length exceeded** - Reduce conversation length or use a model with a larger context window
+ - **Invalid schema** - Schema must be `type: 'object'` with `properties` defined
+ - **Structured generation failed** - Schema may be too complex; try simplifying
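+
+ One way to handle load-time failures (for example, a gated model without a HuggingFace login) is to fall back to a small, ungated model. An illustrative sketch using model IDs from the table above:
+
+ ```ruby
+ # Try the preferred model first; fall back to TinyLlama if it fails to load or respond.
+ begin
+   chat = RubyLLM.chat(provider: :red_candle, model: 'TheBloke/Mistral-7B-Instruct-v0.2-GGUF')
+   response = chat.ask("Hello!")
+ rescue RubyLLM::Error => e
+   warn "Falling back to TinyLlama: #{e.message}"
+   chat = RubyLLM.chat(provider: :red_candle, model: 'TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF')
+   response = chat.ask("Hello!")
+ end
+ ```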
+
+ ### Schema Validation
+
+ Schemas are validated before generation. Invalid schemas produce helpful error messages:
+
+ ```ruby
+ # This will fail with a descriptive error
+ invalid_schema = { type: "array" } # Must be 'object' with properties
+ chat.with_schema(invalid_schema).ask("...")
+ # => RubyLLM::Error: Invalid schema for structured generation:
+ #    - Schema type must be 'object' for structured generation, got 'array'
+ #    - Schema must have a 'properties' field...
+ ```
+
+ Valid schemas must have (a minimal example follows the list):
+ - `type: 'object'`
+ - a `properties` hash with at least one property
+ - a `type` field on each property
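+
+ For reference, a minimal schema that passes these checks (the property name is illustrative):
+
+ ```ruby
+ # Smallest valid shape: an object with one typed property.
+ minimal_schema = {
+   type: 'object',
+   properties: {
+     title: { type: 'string' }
+   },
+   required: ['title']
+ }
+
+ response = chat.with_schema(minimal_schema).ask("Suggest a title for a Ruby blog post")
+ ```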
+
+ ## Limitations
+
+ - **No tool/function calling** - Red Candle models don't support tool use
+ - **No vision** - Text-only input supported
+ - **No embeddings** - Chat models only (embedding support planned)
+ - **No audio** - Text-only modality
+
+ ## Performance Tips
+
+ 1. **Choose the right model size** - TinyLlama (1.1B) is fast but less capable; Mistral (7B) is slower but higher quality
+ 2. **Use streaming for long responses** - Better UX than waiting for full generation
+ 3. **Lower temperature for structured output** - More deterministic JSON generation
+ 4. **Reuse chat instances** - Model loading is expensive; reuse loaded models (see the sketch below)
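+
+ A sketch of tip 4: load the model once and reuse the chat for several prompts. Note that a reused chat also accumulates conversation history, as described under Multi-Turn Conversations:
+
+ ```ruby
+ # One model load, many questions.
+ chat = RubyLLM.chat(provider: :red_candle, model: 'Qwen/Qwen2.5-1.5B-Instruct-GGUF')
+
+ ["Explain blocks vs. procs", "What is a Struct?", "When should I use a module?"].each do |question|
+   puts chat.ask(question).content
+ end
+ ```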
+
+ ## Development
+
+ After checking out the repo, run `bin/setup` to install dependencies.
+
+ ### Running Tests
+
+ ```bash
+ # Fast tests with mocked responses (default)
+ bundle exec rspec
+
+ # Real inference tests (slow, downloads models)
+ RED_CANDLE_REAL_INFERENCE=true bundle exec rspec
+ ```
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/scientist-labs/ruby_llm-red_candle.
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+
+ ## Related Projects
+
+ - [RubyLLM](https://github.com/crmne/ruby_llm) - The unified Ruby LLM interface
+ - [Red Candle](https://github.com/scientist-labs/red-candle) - Ruby bindings for the Candle ML framework
+ - [ruby_llm-mcp](https://github.com/patvice/ruby_llm-mcp) - MCP protocol support for RubyLLM
+ - [ruby_llm-schema](https://github.com/danielfriis/ruby_llm-schema) - JSON Schema DSL for RubyLLM
data/Rakefile ADDED
@@ -0,0 +1,10 @@
+ # frozen_string_literal: true
+
+ require "bundler/gem_tasks"
+ require "rspec/core/rake_task"
+ require "rubocop/rake_task"
+
+ RSpec::Core::RakeTask.new(:spec)
+ RuboCop::RakeTask.new
+
+ task default: %i[rubocop spec]