llama_cpp 0.2.2 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: e5e221d4831be790a990b121e6ac780d10b4cbfb85b2a9b4284d9c216f6e5604
- data.tar.gz: fba76ac1a70bfd7b02b8d123c57e4c8096a29ac7f658bb090cda91c6a54752d2
+ metadata.gz: 7a1f299e21bfe5b12d517a4254657cbc5bf9af6d0571285e2a5aff67b9175646
+ data.tar.gz: 62dd6e0d4f0b052a912d87b52cd0cff5bb873ab12378413a3ee0af5671331ef6
  SHA512:
- metadata.gz: 994029383219077e134d170177954251c20ede6d1c83843ecd22c42eeae83584079d124b41702f55add7f3f237e9bdb14382fbd37dde2d0e74f8cffcfed1715b
- data.tar.gz: ca4e94b6ddf4e4e9ddabbb2b8309cf4b2b06a881df09fdf4ad96e27c4f1f620ca0024ac46f69d9b474849c074a5c9ba9b0440777a0b52a12413bc356457a02f3
+ metadata.gz: b12dc73914e5c7ecdd951fd57b70e01aae1926a2adc88030b5f5310f99c789e129cf552811363ec99525b37b9ca167a708cb756057b94f5cf4dd2a0100b06b6e
+ data.tar.gz: d1d79696b08f89894de02a02fac91f0783c432efa641b21ee59f6987946b045681a60113392db6c85fe97bd0e1fc9860235faa358fb805bb0de21eb85926edd5
data/CHANGELOG.md CHANGED
@@ -1,3 +1,37 @@
+ ## [[0.3.1](https://github.com/yoshoku/llama_cpp.rb/compare/v0.3.0...v0.3.1)] - 2023-07-02
+
+ - Bump bundled llama.cpp from master-9d23589 to master-b8c8dda.
+ - Use unsigned values for the random seed.
+ - Add `eval_embd` method to `Context` class (see the usage sketch after this entry).
+
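For reference, `eval_embd` presumably mirrors llama.cpp's `llama_eval_embd`, i.e. it feeds raw token embeddings to the model instead of token ids. A minimal sketch of how it might be called; the keyword names (`embd:`, `n_past:`, `n_threads:`) are assumed by analogy with `Context#eval` and are not confirmed by this diff:

```ruby
require 'llama_cpp'

params = LLaMACpp::ContextParams.new
model = LLaMACpp::Model.new(model_path: '/path/to/quantized-model.bin', params: params)
context = LLaMACpp::Context.new(model: model)

# One row of raw embeddings per input position instead of token ids.
# The row length must match the model's embedding size (n_embd); 4096 is only a placeholder.
embd = Array.new(4096, 0.0)
context.eval_embd(embd: embd, n_past: 0, n_threads: 4)
```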
+ ## [[0.3.0](https://github.com/yoshoku/llama_cpp.rb/compare/v0.2.2...v0.3.0)] - 2023-06-30
+
+ - Add no_k_quants and qkk_64 config options:
+ ```
+ $ gem install llama_cpp -- --with-no_k_quants
+ ```
+ ```
+ $ gem install llama_cpp -- --with-qkk_64
+ ```
+
+ **Breaking Changes**
+ - Remove `Client` class to concentrate on developing bindings.
+ - Bump bundled llama.cpp from master-7487137 to master-9d23589.
+ - llama_init_from_file and llama_apply_lora_from_file are deprecated.
+ - Add `Model` class for wrapping llama_model.
+ - Move the `apply_lora_from_file`, `free`, `load`, and `empty?` methods from `Context` class to `Model` class.
+ - Change the arguments of `Context`'s initialize method; it now requires a `Model` object instead of the model's file path.
+ ```ruby
+ require 'llama_cpp'
+
+ params = LLaMACpp::ContextParams.new
+
+ model = LLaMACpp::Model.new(model_path: '/path/to/quantized-model.bin', params: params)
+ context = LLaMACpp::Context.new(model: model)
+
+ LLaMACpp.generate(context, 'Hello, world.')
+ ```
+
  ## [[0.2.2](https://github.com/yoshoku/llama_cpp.rb/compare/v0.2.1...v0.2.2)] - 2023-06-24

  - Bump bundled llama.cpp from master-a09f919 to master-7487137.
data/README.md CHANGED
@@ -20,21 +20,54 @@ If bundler is not being used to manage dependencies, install the gem by executin
  ## Usage

- Prepare the quantized model by refering to [the usage section on the llama.cpp README](https://github.com/ggerganov/llama.cpp#usage) or
- download the qunatized model, for example [ggml-vicuna-7b-4bit](https://github.com/ggerganov/llama.cpp/discussions/643#discussioncomment-5541351), from Hugging Face.
+ Prepare the quantized model by referring to [the usage section on the llama.cpp README](https://github.com/ggerganov/llama.cpp#usage).
+ For example, preparing a quantized model based on [open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b) is as follows:
+
+ ```sh
+ $ cd ~/
+ $ brew install git-lfs
+ $ git lfs install
+ $ git clone https://github.com/ggerganov/llama.cpp.git
+ $ cd llama.cpp
+ $ python3 -m pip install -r requirements.txt
+ $ cd models
+ $ git clone https://huggingface.co/openlm-research/open_llama_7b
+ $ cd ../
+ $ python3 convert.py models/open_llama_7b
+ $ make
+ $ ./quantize ./models/open_llama_7b/ggml-model-f16.bin ./models/open_llama_7b/ggml-model-q4_0.bin q4_0
+ ```
+
+ An example of Ruby code that generates sentences with the quantized model is as follows:

  ```ruby
  require 'llama_cpp'

  params = LLaMACpp::ContextParams.new
- params.seed = 12
+ params.seed = 42

- context = LLaMACpp::Context.new(model_path: '/path/to/quantized-model.bin', params: params)
+ model = LLaMACpp::Model.new(model_path: '/home/user/llama.cpp/models/open_llama_7b/ggml-model-q4_0.bin', params: params)
+ context = LLaMACpp::Context.new(model: model)

- puts LLaMACpp.generate(context, 'Please tell me the largest city in Japan.', n_threads: 4)
- # => "There are two major cities in Japan, Tokyo and Osaka, which have about 30 million populations."
+ puts LLaMACpp.generate(context, 'Hello, World.', n_threads: 4)
  ```

+ ## Examples
+ There is a sample program in the [examples](https://github.com/yoshoku/llama_cpp.rb/tree/main/examples) directory that allows interactive communication like ChatGPT.
+
+ ```sh
+ $ git clone https://github.com/yoshoku/llama_cpp.rb.git
+ $ cd llama_cpp.rb/examples
+ $ bundle install
+ $ ruby chat.rb --model /home/user/llama.cpp/models/open_llama_7b/ggml-model-q4_0.bin --seed 2023
+ ...
+ User: Who is the originator of the Ruby programming language?
+ Bob: The originator of the Ruby programming language is Mr. Yukihiro Matsumoto.
+ User:
+ ```
+
+ ![llama_cpp_chat_example](https://github.com/yoshoku/llama_cpp.rb/assets/5562409/374ae3d8-63a6-498f-ae6e-5552b464bdda)
+
  ## Contributing

  Bug reports and pull requests are welcome on GitHub at https://github.com/yoshoku/llama_cpp.rb.
data/examples/chat.rb CHANGED
@@ -35,7 +35,8 @@ class Chat < Thor # rubocop:disable Metrics/ClassLength, Style/Documentation
  params = LLaMACpp::ContextParams.new
  params.seed = options[:seed]
  params.n_gpu_layers = options[:n_gpu_layers]
- context = LLaMACpp::Context.new(model_path: options[:model], params: params)
+ model = LLaMACpp::Model.new(model_path: options[:model], params: params)
+ context = LLaMACpp::Context.new(model: model)

  antiprompt = options[:reverse_prompt] || 'User:'
  start_prompt = read_prompt(options[:file]) || default_prompt(antiprompt)
data/examples/embedding.rb CHANGED
@@ -16,12 +16,13 @@ class Embedding < Thor # rubocop:disable Style/Documentation
  option :model, type: :string, aliases: '-m', desc: 'path to model file', required: true
  option :prompt, type: :string, aliases: '-p', desc: 'prompt to generate embedding', required: true
  option :n_gpu_layers, type: :numeric, desc: 'number of layers on GPU', default: 0
- def main # rubocop:disable Metrics/AbcSize
+ def main # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
  params = LLaMACpp::ContextParams.new
  params.seed = options[:seed]
  params.n_gpu_layers = options[:n_gpu_layers]
  params.embedding = true
- context = LLaMACpp::Context.new(model_path: options[:model], params: params)
+ model = LLaMACpp::Model.new(model_path: options[:model], params: params)
+ context = LLaMACpp::Context.new(model: model)

  embd_input = context.tokenize(text: options[:prompt], add_bos: true)
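The hunk above stops at tokenization. For context, a minimal sketch of how the embedding example presumably continues, assuming `Context#eval` and `Context#embeddings` keep their pre-0.3.0 behavior (the continuation is not shown in this diff):

```ruby
# Evaluate the prompt tokens, then read back the embedding vector
# (one float per embedding dimension) made available by params.embedding = true.
context.eval(tokens: embd_input, n_past: 0)
puts context.embeddings.join(' ')
```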
 
data/ext/llama_cpp/extconf.rb CHANGED
@@ -17,6 +17,17 @@ if RUBY_PLATFORM.match?(/darwin|linux|bsd/) && try_compile('#include <stdio.h>',
  $CXXFLAGS << ' -pthread'
  end

+ unless with_config('no_k_quants')
+ $CFLAGS << ' -DGGML_USE_K_QUANTS'
+ $CXXFLAGS << ' -DGGML_USE_K_QUANTS'
+ $srcs << 'k_quants.c'
+ end
+
+ if with_config('qkk_64')
+ $CFLAGS << ' -DGGML_QKK_64'
+ $CXXFLAGS << ' -DGGML_QKK_64'
+ end
+
  if with_config('openblas')
  abort 'libopenblas is not found.' unless have_library('openblas')
  abort 'cblas.h is not found.' unless have_header('cblas.h')
@@ -42,6 +53,7 @@ if with_config('metal')
  $CXXFLAGS << ' -DGGML_USE_METAL'
  $LDFLAGS << ' -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders'
  $objs = %w[ggml.o llama.o llama_cpp.o ggml-metal.o]
+ $objs << 'k_quants.o' unless with_config('no_k_quants')
  end

  if with_config('cublas')
@@ -49,6 +61,7 @@ if with_config('cublas')
  $CXXFLAGS << ' -DGGML_USE_CUBLAS -I/usr/local/cuda/include'
  $LDFLAGS << ' -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64'
  $objs = %w[ggml-cuda.o ggml.o llama.o llama_cpp.o]
+ $objs << 'k_quants.o' unless with_config('no_k_quants')
  end

  if with_config('clblast')
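For reference, each `with_config('...')` check above maps to a flag passed through to extconf.rb at install time. The first two flags come from this release's changelog; the backend flags are inferred from the `with_config` names visible in the surrounding context lines, and the Bundler form is an assumption based on standard `bundle config build.<gem>` usage:

```sh
# k-quants are compiled in by default; opt out or enable the QKK_64 variant:
$ gem install llama_cpp -- --with-no_k_quants
$ gem install llama_cpp -- --with-qkk_64

# Backend switches named in the surrounding extconf.rb context (unchanged by this release):
$ gem install llama_cpp -- --with-metal
$ gem install llama_cpp -- --with-cublas

# Assumed Bundler equivalent for recording a build flag once:
$ bundle config --local build.llama_cpp "--with-metal"
```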