RubyGems - rllama - Versions diffs - 1.0.2 → 1.1.0 - Mend

rllama 1.0.2 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: fb8e1ff25c77b78a11ec56d9d592c1825516252fe5001725600bf8e3310ebfb5
-  data.tar.gz: fb2f11bc811c883391caf0a8ca49f9121d18c0ba33464fa6f04ec42e26b0a694
+  metadata.gz: e5b617098c5919065ccd3ae3d9da4ed1c068f121415ec06caea749b50dcc17eb
+  data.tar.gz: 27da8c412f1327e425834ab6429d2d0de7131d8ea14e7a7f3c72629ae75b457a
 SHA512:
-  metadata.gz: 36110037915cc74267514796a8aa18b32f42c068b00b99ed7aecd414624e2b1c109505ae0ba851045ee294b21b63bd7dac70cee404ab3ecab170820a0a948168
-  data.tar.gz: 44048959ce0e1ec6354fbff3b7ff11e10fab8a658d343b4efa23c7e6bdb5c94b815b0b4aa01618fc61b7a14f3e6ff83e1e86bc529e7a08e4420be5c8be3e342c
+  metadata.gz: cf4c1df062c79fa36b1a5a641513ee8a9b960c79b8b077dc25d97931c5d630b618c8dac950845ccad394fb4f60814ec756255d3a2f1847db68ae622719fecb30
+  data.tar.gz: '03000528b45aa265d6acc8696eb495b5c9ac57b143cba0e8ab3fc333ce0cf824b8ba86f77d1ab38522080e62e7887a934f2543d1d16f8e6e3c333f48a0ff9585'
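
The checksum changes above record new SHA256 and SHA512 digests for `metadata.gz` and `data.tar.gz`. As a minimal sketch of how such a value could be checked locally, assuming the archive has already been extracted from the `.gem` file, the following uses Ruby's standard `Digest` library. The helper name `sha256_matches?` is hypothetical and is not part of rllama or RubyGems.

```ruby
require 'digest'

# Hypothetical helper (not part of rllama or RubyGems): returns true when the
# file at `path` hashes to `expected_hex`, e.g. a value from checksums.yaml.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end
```

It could be called as `sha256_matches?('data.tar.gz', '27da8c41...')` with the full digest from the diff; the truncated value here is only a placeholder.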

data/README.md CHANGED

@@ -2,7 +2,7 @@
 # Rllama
-Ruby bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp) to run open-source language models locally. Run models like GPT-OSS, Qwen 3, Gemma 3, Llama 3, and many others directly in your Ruby application code.
+Ruby bindings for [llama.cpp](https://github.com/ggml-org/llama.cpp) to run open-source language models locally. Run models like Gemma 4, Qwen 3.5, GLM 4.7, Nemotron, LFM2, Llama 3, and many others directly in your Ruby application code.
 ## Installation
@@ -24,6 +24,12 @@ Or install it yourself as:
 gem install rllama
 ```
+### Troubleshooting
+#### `llama_model_load_from_file_impl: no backends are loaded`
+If you're running on an Intel (x86_64) CPU and encounter this error while loading a model, make sure you're using the latest version of the gem. Rllama now preloads the bundled GGML backend libraries and automatically augments the `GGML_BACKEND_PATH`, so upgrading resolves the issue without any manual steps. If you build from source, ensure that directory is included in `GGML_BACKEND_PATH` before booting your Ruby process.
 ## CLI Chat
 The `rllama` command-line utility provides an interactive chat interface for conversing with language models. After installing the gem, you can start chatting immediately:
@@ -36,11 +42,14 @@ When you run `rllama` without arguments, it will display:
 - **Downloaded models**: Any models you've already downloaded to `~/.rllama/models/`
 - **Popular models**: A curated list of popular models available for download, including:
-  - Gemma 3 1B
+  - Gemma 4 E4B / Gemma 4 26B-A4B
+  - Nemotron 3 Nano 4B
+  - Qwen 3.5 35B-A3B
+  - LFM2 24B-A2B
+  - GLM 4.7 Flash
+  - GPT-OSS 20B
   - Llama 3.2 3B
   - Phi-4
-  - Qwen3 30B
-  - GPT-OSS
 Simply enter the number of the model you want to use. If you select a model that hasn't been downloaded yet, it will be automatically downloaded from Hugging Face.
@@ -204,6 +213,7 @@ You can download GGUF format models from various sources:
 - [Hugging Face](https://huggingface.co/models?library=gguf) - Search for models with "GGUF" format
 ## License
 MIT

data/bin/rllama CHANGED

@@ -1,8 +1,6 @@
 #!/usr/bin/env ruby
 # frozen_string_literal: true
-require 'bundler/setup'
 require 'rllama'
-require 'rllama/cli'
 Rllama::Cli.start(ARGV)

data/lib/rllama/cli.rb CHANGED

@@ -5,13 +5,19 @@ require 'readline'
 module Rllama
   class Cli
     POPULAR_MODELS = [
-      { path: 'lmstudio-community/gemma-3-1B-it-QAT-GGUF/gemma-3-1B-it-QAT-Q4_0.gguf', size: 720_425_472 },
-      { path: 'lmstudio-community/gpt-oss-20b-GGUF/gpt-oss-20b-MXFP4.gguf', size: 12_109_565_632 },
-      { path: 'bartowski/Llama-3.2-3B-Instruct-GGUF/Llama-3.2-3B-Instruct-Q4_K_M.gguf', size: 2_019_377_696 },
-      { path: 'unsloth/Qwen3-30B-A3B-GGUF/Qwen3-30B-A3B-Q3_K_S.gguf', size: 13_292_468_800 },
+      { path: 'lmstudio-community/gemma-4-E4B-it-GGUF/gemma-4-E4B-it-Q4_K_M.gguf', size: 5_335_285_280 },
+      { path: 'lmstudio-community/gemma-4-26B-A4B-it-GGUF/gemma-4-26B-A4B-it-Q4_K_M.gguf', size: 16_796_010_624 },
+      { path: 'unsloth/NVIDIA-Nemotron-3-Nano-4B-GGUF/NVIDIA-Nemotron-3-Nano-4B-Q4_K_M.gguf', size: 2_900_295_712 },
+      { path: 'unsloth/Qwen3.5-35B-A3B-GGUF/Qwen3.5-35B-A3B-Q4_K_M.gguf', size: 22_016_023_168 },
+      { path: 'lmstudio-community/LFM2-24B-A2B-GGUF/LFM2-24B-A2B-Q4_K_M.gguf', size: 14_415_473_952 },
+      { path: 'unsloth/GLM-4.7-Flash-GGUF/GLM-4.7-Flash-Q4_K_M.gguf', size: 18_312_339_808 },
       { path: 'inclusionAI/Ling-mini-2.0-GGUF/Ling-mini-2.0-Q4_K_M.gguf', size: 9_911_575_072 },
+      { path: 'lmstudio-community/gpt-oss-20b-GGUF/gpt-oss-20b-MXFP4.gguf', size: 12_109_565_632 },
       { path: 'unsloth/gemma-3n-E4B-it-GGUF/gemma-3n-E4B-it-Q4_K_S.gguf', size: 4_404_697_216 },
-      { path: 'microsoft/phi-4-gguf/phi-4-Q4_K_S.gguf', size: 8_440_762_560 }
+      { path: 'unsloth/Qwen3-30B-A3B-GGUF/Qwen3-30B-A3B-Q3_K_S.gguf', size: 13_292_468_800 },
+      { path: 'lmstudio-community/gemma-3-1B-it-QAT-GGUF/gemma-3-1B-it-QAT-Q4_0.gguf', size: 720_425_472 },
+      { path: 'microsoft/phi-4-gguf/phi-4-Q4_K_S.gguf', size: 8_440_762_560 },
+      { path: 'bartowski/Llama-3.2-3B-Instruct-GGUF/Llama-3.2-3B-Instruct-Q4_K_M.gguf', size: 2_019_377_696 }
     ].freeze
     COLOR_CODES = {

data/lib/rllama/context.rb CHANGED

@@ -1,10 +1,12 @@
 # frozen_string_literal: true
+require 'etc'
 module Rllama
   class Context
     attr_reader :messages, :n_ctx, :n_batch, :n_past
-    def initialize(model, embeddings: false, n_ctx: nil, n_batch: nil)
+    def initialize(model, embeddings: false, n_ctx: nil, n_batch: nil, n_threads: Etc.nprocessors)
       @model = model
       @n_ctx = n_ctx
       @n_batch = n_batch
@@ -15,6 +17,9 @@ module Rllama
       @ctx_params[:n_ctx] = @n_ctx if @n_ctx
       @ctx_params[:n_batch] = @n_batch if @n_batch
+      @ctx_params[:n_threads] = n_threads
+      @ctx_params[:n_threads_batch] = n_threads
       if @embeddings
         seq_cap = @model.n_seq_max

data/lib/rllama/cpp.rb CHANGED

@@ -15,7 +15,11 @@ module Rllama
       when 'windows', 'mingw32'
         'x64-mingw32'
       else
-        FFI::Platform::ARCH == 'aarch64' ? 'aarch64-linux' : 'x86_64-linux'
+        arch = FFI::Platform::ARCH == 'aarch64' ? 'aarch64' : 'x86_64'
+        is_musl = defined?(FFI::Platform::IS_GNU) ? !FFI::Platform::IS_GNU : RbConfig::CONFIG['host_os'].include?('musl')
+        is_musl ? "#{arch}-linux-musl" : "#{arch}-linux"
       end
     lib_file =
@@ -359,7 +363,9 @@ module Rllama
              :no_perf, :bool,
              :op_offload, :bool,
              :swa_full, :bool,
-             :kv_unified, :bool
+             :kv_unified, :bool,
+             :samplers, :pointer,
+             :n_samplers, :size_t
     end
     class LlamaModelQuantizeParams < FFI::Struct
@@ -533,10 +539,8 @@ module Rllama
     attach_function :llama_adapter_lora_free, [:llama_adapter_lora_p], :void
     attach_function :llama_adapter_get_alora_n_invocation_tokens, [:llama_adapter_lora_p], :uint64
     attach_function :llama_adapter_get_alora_invocation_tokens, [:llama_adapter_lora_p], :pointer # const llama_token*
-    attach_function :llama_set_adapter_lora, %i[llama_context_p llama_adapter_lora_p float], :int32
-    attach_function :llama_rm_adapter_lora, %i[llama_context_p llama_adapter_lora_p], :int32
-    attach_function :llama_clear_adapter_lora, [:llama_context_p], :void
-    attach_function :llama_apply_adapter_cvec, %i[llama_context_p pointer size_t int32 int32 int32], :int32
+    attach_function :llama_set_adapters_lora, %i[llama_context_p pointer size_t pointer], :int32
+    attach_function :llama_set_adapter_cvec, %i[llama_context_p pointer size_t int32 int32 int32], :int32
     # Memory management
     attach_function :llama_memory_clear, %i[llama_memory_t bool], :void

data/lib/rllama/model.rb CHANGED

@@ -4,6 +4,16 @@ module Rllama
   class Model
     DEFAULT_CONTEXT_LENGTH = 2**13
+    FALLBACK_TEMPLATES = {
+      'gemma4' => {
+        bos: '<bos>',
+        role_map: { 'assistant' => 'model' },
+        turn_start: ->(role) { "<|turn>#{role}\n" },
+        turn_end: "<turn|>\n",
+        generation_prompt: "<|turn>model\n"
+      }
+    }.freeze
     attr_reader :pointer
     def initialize(path_or_name, dir: nil)
@@ -36,6 +46,16 @@ module Rllama
       @n_ctx_train ||= Cpp.llama_model_n_ctx_train(@pointer)
     end
+    def architecture
+      @architecture ||= begin
+        buf = FFI::MemoryPointer.new(:char, 256)
+        n = Cpp.llama_model_meta_val_str(@pointer, 'general.architecture', buf, 256)
+        n.positive? ? buf.read_string(n) : nil
+      end
+    end
     def generate(prompt, max_tokens: DEFAULT_CONTEXT_LENGTH, temperature: 0.8, top_k: 40, top_p: 0.95, min_p: 0.05,
                  seed: nil, system: nil, &block)
       init_context(n_ctx: max_tokens) do |ctx|
@@ -98,6 +118,14 @@ module Rllama
     def build_chat_template(messages)
       raise Error, 'Model does not provide a chat template' if chat_template.nil? || chat_template.empty?
+      result = apply_chat_template(messages)
+      return result if result
+      apply_chat_template_fallback(messages)
+    end
+    def apply_chat_template(messages)
       count = messages.length
       struct_size = Cpp::LlamaChatMessage.size
       array_ptr = FFI::MemoryPointer.new(struct_size * count)
@@ -111,14 +139,34 @@ module Rllama
       needed = Cpp.llama_chat_apply_template(chat_template, array_ptr, count, true, nil, 0)
-      raise Error, 'Failed to apply chat template' if needed.negative?
+      return nil if needed.negative?
       buf = FFI::MemoryPointer.new(:char, needed)
       written = Cpp.llama_chat_apply_template(chat_template, array_ptr, count, true, buf, needed)
-      raise Error, 'Failed to apply chat template' if written.negative?
+      return nil if written.negative?
       buf.read_string(written)
     end
+    def apply_chat_template_fallback(messages)
+      tmpl = FALLBACK_TEMPLATES[architecture]
+      raise Error, "Unsupported chat template for architecture: #{architecture || 'unknown'}" unless tmpl
+      result = String.new(tmpl[:bos] || '')
+      role_map = tmpl[:role_map] || {}
+      messages.each do |m|
+        role = role_map[m[:role].to_s] || m[:role].to_s
+        result << tmpl[:turn_start].call(role)
+        result << m[:content].to_s
+        result << tmpl[:turn_end]
+      end
+      result << tmpl[:generation_prompt]
+      result
+    end
   end
 end
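
To make the new fallback path concrete, here is a standalone sketch of the rendering that `apply_chat_template_fallback` performs for the `gemma4` entry. The template values are copied from the diff. The constant `GEMMA4_TEMPLATE`, the function `render_fallback`, and the sample messages are hypothetical names introduced only for this illustration.

```ruby
# Values copied from the 'gemma4' entry of FALLBACK_TEMPLATES in the diff.
GEMMA4_TEMPLATE = {
  bos: '<bos>',
  role_map: { 'assistant' => 'model' },
  turn_start: ->(role) { "<|turn>#{role}\n" },
  turn_end: "<turn|>\n",
  generation_prompt: "<|turn>model\n"
}.freeze

# Hypothetical standalone version of the fallback renderer.
def render_fallback(tmpl, messages)
  result = String.new(tmpl[:bos] || '')
  role_map = tmpl[:role_map] || {}

  messages.each do |m|
    # Map roles the template renames (assistant -> model); keep others as-is.
    role = role_map[m[:role].to_s] || m[:role].to_s
    result << tmpl[:turn_start].call(role)
    result << m[:content].to_s
    result << tmpl[:turn_end]
  end

  # End with an open model turn so generation continues from there.
  result << tmpl[:generation_prompt]
end

messages = [
  { role: 'user', content: 'Hi' },
  { role: 'assistant', content: 'Hello!' },
  { role: 'user', content: 'Bye' }
]
prompt = render_fallback(GEMMA4_TEMPLATE, messages)
```

With these three messages, `prompt` starts with `<bos>`, wraps each message as `<|turn>ROLE`, newline, content, `<turn|>`, and ends with an open `<|turn>model` turn. The `assistant` message is rendered under the `model` role.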

data/lib/rllama/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module Rllama
-  VERSION = '1.0.2'
+  VERSION = '1.1.0'
 end

metadata CHANGED

@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: rllama
 version: !ruby/object:Gem::Version
-  version: 1.0.2
+  version: 1.1.0
 platform: ruby
 authors:
 - Pete Matsyburka
 bindir: bin
 cert_chain: []
-date: 2025-10-07 00:00:00.000000000 Z
+date: 1980-01-02 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ffi
@@ -61,7 +61,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.6.2
+rubygems_version: 4.0.3
 specification_version: 4
-summary: Ruby bindings for Llama API
+summary: Ruby bindings for llama.cpp to run local LLMs with Ruby.
 test_files: []