llama_cpp 0.2.2 → 0.3.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: e5e221d4831be790a990b121e6ac780d10b4cbfb85b2a9b4284d9c216f6e5604
- data.tar.gz: fba76ac1a70bfd7b02b8d123c57e4c8096a29ac7f658bb090cda91c6a54752d2
+ metadata.gz: 7a1f299e21bfe5b12d517a4254657cbc5bf9af6d0571285e2a5aff67b9175646
+ data.tar.gz: 62dd6e0d4f0b052a912d87b52cd0cff5bb873ab12378413a3ee0af5671331ef6
  SHA512:
- metadata.gz: 994029383219077e134d170177954251c20ede6d1c83843ecd22c42eeae83584079d124b41702f55add7f3f237e9bdb14382fbd37dde2d0e74f8cffcfed1715b
- data.tar.gz: ca4e94b6ddf4e4e9ddabbb2b8309cf4b2b06a881df09fdf4ad96e27c4f1f620ca0024ac46f69d9b474849c074a5c9ba9b0440777a0b52a12413bc356457a02f3
+ metadata.gz: b12dc73914e5c7ecdd951fd57b70e01aae1926a2adc88030b5f5310f99c789e129cf552811363ec99525b37b9ca167a708cb756057b94f5cf4dd2a0100b06b6e
+ data.tar.gz: d1d79696b08f89894de02a02fac91f0783c432efa641b21ee59f6987946b045681a60113392db6c85fe97bd0e1fc9860235faa358fb805bb0de21eb85926edd5
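For reference, these payload digests can be reproduced locally. A minimal sketch, assuming the gem has already been fetched and unpacked (for example with `gem fetch llama_cpp -v 0.3.1` followed by `tar -xf llama_cpp-0.3.1.gem`):

```ruby
require 'digest'

# A .gem file is a tar archive containing metadata.gz, data.tar.gz and
# checksums.yaml.gz. After unpacking llama_cpp-0.3.1.gem, the digests
# printed here should match the SHA256 values recorded above.
%w[metadata.gz data.tar.gz].each do |file|
  puts "#{file}: #{Digest::SHA256.file(file).hexdigest}"
end
```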
data/CHANGELOG.md CHANGED
@@ -1,3 +1,37 @@
+ ## [[0.3.1](https://github.com/yoshoku/llama_cpp.rb/compare/v0.3.0...v0.3.1)] - 2023-07-02
+
+ - Bump bundled llama.cpp from master-9d23589 to master-b8c8dda.
+ - Use unsigned values for random seed.
+ - Add `eval_embd` method to `Context` class.
+
+ ## [[0.3.0](https://github.com/yoshoku/llama_cpp.rb/compare/v0.2.2...v0.3.0)] - 2023-06-30
+
+ - Add no_k_quants and qkk_64 config options:
+ ```
+ $ gem install llama_cpp -- --with-no_k_quants
+ ```
+ ```
+ $ gem install llama_cpp -- --with-qkk_64
+ ```
+
+ **Breaking Changes**
+ - Remove `Client` class to concentrate on developing bindings.
+ - Bump bundled llama.cpp from master-7487137 to master-9d23589.
+ - llama_init_from_file and llama_apply_lora_from_file are deprecated.
+ - Add `Model` class for wrapping llama_model.
+ - Move the `apply_lora_from_file`, `free`, `load`, and `empty?` methods from the `Context` class to the `Model` class.
+ - Change the arguments of the `Context` class's initialize method: it now requires a `Model` object instead of the model's file path.
+ ```ruby
+ require 'llama_cpp'
+
+ params = LLaMACpp::ContextParams.new
+
+ model = LLaMACpp::Model.new(model_path: '/path/to/quantized-model.bin', params: params)
+ context = LLaMACpp::Context.new(model: model)
+
+ LLaMACpp.generate(context, 'Hello, world.')
+ ```
+
  ## [[0.2.2](https://github.com/yoshoku/llama_cpp.rb/compare/v0.2.1...v0.2.2)] - 2023-06-24
 
  - Bump bundled llama.cpp from master-a09f919 to master-7487137.
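Regarding the `eval_embd` method added in 0.3.1: it feeds embedding vectors to the context directly, where `eval` takes token ids. A minimal sketch follows; the keyword names (`embd:`, `n_past:`) and the vector length are assumptions modelled on `Context#eval`, not a confirmed signature.

```ruby
require 'llama_cpp'

params = LLaMACpp::ContextParams.new
model = LLaMACpp::Model.new(model_path: '/path/to/quantized-model.bin', params: params)
context = LLaMACpp::Context.new(model: model)

# Hypothetical call: a flat array of floats standing in for one token's
# embedding vector. Keyword names and expected length are assumptions.
embedding_vector = Array.new(4096, 0.0)
context.eval_embd(embd: embedding_vector, n_past: 0)
```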
data/README.md CHANGED
@@ -20,21 +20,54 @@ If bundler is not being used to manage dependencies, install the gem by executing:
 
  ## Usage
 
- Prepare the quantized model by refering to [the usage section on the llama.cpp README](https://github.com/ggerganov/llama.cpp#usage) or
- download the qunatized model, for example [ggml-vicuna-7b-4bit](https://github.com/ggerganov/llama.cpp/discussions/643#discussioncomment-5541351), from Hugging Face.
+ Prepare the quantized model by referring to [the usage section on the llama.cpp README](https://github.com/ggerganov/llama.cpp#usage).
+ For example, preparing a quantized model based on [open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b) is as follows:
+
+ ```sh
+ $ cd ~/
+ $ brew install git-lfs
+ $ git lfs install
+ $ git clone https://github.com/ggerganov/llama.cpp.git
+ $ cd llama.cpp
+ $ python3 -m pip install -r requirements.txt
+ $ cd models
+ $ git clone https://huggingface.co/openlm-research/open_llama_7b
+ $ cd ../
+ $ python3 convert.py models/open_llama_7b
+ $ make
+ $ ./quantize ./models/open_llama_7b/ggml-model-f16.bin ./models/open_llama_7b/ggml-model-q4_0.bin q4_0
+ ```
+
+ An example of Ruby code that generates sentences with the quantized model is as follows:
 
  ```ruby
  require 'llama_cpp'
 
  params = LLaMACpp::ContextParams.new
- params.seed = 12
+ params.seed = 42
 
- context = LLaMACpp::Context.new(model_path: '/path/to/quantized-model.bin', params: params)
+ model = LLaMACpp::Model.new(model_path: '/home/user/llama.cpp/models/open_llama_7b/ggml-model-q4_0.bin', params: params)
+ context = LLaMACpp::Context.new(model: model)
 
- puts LLaMACpp.generate(context, 'Please tell me the largest city in Japan.', n_threads: 4)
- # => "There are two major cities in Japan, Tokyo and Osaka, which have about 30 million populations."
+ puts LLaMACpp.generate(context, 'Hello, World.', n_threads: 4)
  ```
 
+ ## Examples
+ There is a sample program in the [examples](https://github.com/yoshoku/llama_cpp.rb/tree/main/examples) directory that allows interactive communication like ChatGPT.
+
+ ```sh
+ $ git clone https://github.com/yoshoku/llama_cpp.rb.git
+ $ cd llama_cpp.rb/examples
+ $ bundle install
+ $ ruby chat.rb --model /home/user/llama.cpp/models/open_llama_7b/ggml-model-q4_0.bin --seed 2023
+ ...
+ User: Who is the originator of the Ruby programming language?
+ Bob: The originator of the Ruby programming language is Mr. Yukihiro Matsumoto.
+ User:
+ ```
+
+ ![llama_cpp_chat_example](https://github.com/yoshoku/llama_cpp.rb/assets/5562409/374ae3d8-63a6-498f-ae6e-5552b464bdda)
+
 
  ## Contributing
 
  Bug reports and pull requests are welcome on GitHub at https://github.com/yoshoku/llama_cpp.rb.
data/examples/chat.rb CHANGED
@@ -35,7 +35,8 @@ class Chat < Thor # rubocop:disable Metrics/ClassLength, Style/Documentation
  params = LLaMACpp::ContextParams.new
  params.seed = options[:seed]
  params.n_gpu_layers = options[:n_gpu_layers]
- context = LLaMACpp::Context.new(model_path: options[:model], params: params)
+ model = LLaMACpp::Model.new(model_path: options[:model], params: params)
+ context = LLaMACpp::Context.new(model: model)
 
  antiprompt = options[:reverse_prompt] || 'User:'
  start_prompt = read_prompt(options[:file]) || default_prompt(antiprompt)
data/examples/embedding.rb CHANGED
@@ -16,12 +16,13 @@ class Embedding < Thor # rubocop:disable Style/Documentation
  option :model, type: :string, aliases: '-m', desc: 'path to model file', required: true
  option :prompt, type: :string, aliases: '-p', desc: 'prompt to generate embedding', required: true
  option :n_gpu_layers, type: :numeric, desc: 'number of layers on GPU', default: 0
- def main # rubocop:disable Metrics/AbcSize
+ def main # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
  params = LLaMACpp::ContextParams.new
  params.seed = options[:seed]
  params.n_gpu_layers = options[:n_gpu_layers]
  params.embedding = true
- context = LLaMACpp::Context.new(model_path: options[:model], params: params)
+ model = LLaMACpp::Model.new(model_path: options[:model], params: params)
+ context = LLaMACpp::Context.new(model: model)
 
  embd_input = context.tokenize(text: options[:prompt], add_bos: true)
 
ext/llama_cpp/extconf.rb CHANGED
@@ -17,6 +17,17 @@ if RUBY_PLATFORM.match?(/darwin|linux|bsd/) && try_compile('#include <stdio.h>',
  $CXXFLAGS << ' -pthread'
  end
 
+ unless with_config('no_k_quants')
+ $CFLAGS << ' -DGGML_USE_K_QUANTS'
+ $CXXFLAGS << ' -DGGML_USE_K_QUANTS'
+ $srcs << 'k_quants.c'
+ end
+
+ if with_config('qkk_64')
+ $CFLAGS << ' -DGGML_QKK_64'
+ $CXXFLAGS << ' -DGGML_QKK_64'
+ end
+
  if with_config('openblas')
  abort 'libopenblas is not found.' unless have_library('openblas')
  abort 'cblas.h is not found.' unless have_header('cblas.h')
@@ -42,6 +53,7 @@ if with_config('metal')
  $CXXFLAGS << ' -DGGML_USE_METAL'
  $LDFLAGS << ' -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders'
  $objs = %w[ggml.o llama.o llama_cpp.o ggml-metal.o]
+ $objs << 'k_quants.o' unless with_config('no_k_quants')
  end
 
  if with_config('cublas')
@@ -49,6 +61,7 @@ if with_config('cublas')
  $CXXFLAGS << ' -DGGML_USE_CUBLAS -I/usr/local/cuda/include'
  $LDFLAGS << ' -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64'
  $objs = %w[ggml-cuda.o ggml.o llama.o llama_cpp.o]
+ $objs << 'k_quants.o' unless with_config('no_k_quants')
  end
 
  if with_config('clblast')
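The `with_config` checks above pick up `--with-...` flags passed to the native extension at install time, e.g. `gem install llama_cpp -- --with-no_k_quants` or `-- --with-qkk_64` as documented in the CHANGELOG. A minimal mkmf-style sketch of the same pattern, using hypothetical names (`foo`, `my_ext`) rather than this gem's actual extconf.rb:

```ruby
require 'mkmf'

# Flags given after `--` at install time (e.g. `gem install my_gem -- --with-foo`)
# are exposed to extconf.rb through with_config.
if with_config('foo')
  $CFLAGS << ' -DUSE_FOO'    # toggle an optional compile-time code path
  $CXXFLAGS << ' -DUSE_FOO'
end

create_makefile('my_ext/my_ext')
```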