llama_cpp 0.2.2 → 0.3.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +34 -0
- data/README.md +39 -6
- data/examples/chat.rb +2 -1
- data/examples/embedding.rb +3 -2
- data/ext/llama_cpp/extconf.rb +13 -0
- data/ext/llama_cpp/llama_cpp.cpp +305 -133
- data/ext/llama_cpp/src/ggml-cuda.cu +367 -69
- data/ext/llama_cpp/src/ggml-cuda.h +1 -0
- data/ext/llama_cpp/src/ggml-metal.m +36 -30
- data/ext/llama_cpp/src/ggml-metal.metal +328 -84
- data/ext/llama_cpp/src/ggml-opencl.cpp +352 -175
- data/ext/llama_cpp/src/ggml.c +800 -303
- data/ext/llama_cpp/src/ggml.h +68 -5
- data/ext/llama_cpp/src/k_quants.c +1712 -56
- data/ext/llama_cpp/src/k_quants.h +41 -6
- data/ext/llama_cpp/src/llama-util.h +19 -5
- data/ext/llama_cpp/src/llama.cpp +262 -291
- data/ext/llama_cpp/src/llama.h +49 -11
- data/lib/llama_cpp/version.rb +2 -2
- data/lib/llama_cpp.rb +0 -2
- data/sig/llama_cpp.rbs +14 -17
- metadata +2 -3
- data/lib/llama_cpp/client.rb +0 -172
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7a1f299e21bfe5b12d517a4254657cbc5bf9af6d0571285e2a5aff67b9175646
+  data.tar.gz: 62dd6e0d4f0b052a912d87b52cd0cff5bb873ab12378413a3ee0af5671331ef6
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b12dc73914e5c7ecdd951fd57b70e01aae1926a2adc88030b5f5310f99c789e129cf552811363ec99525b37b9ca167a708cb756057b94f5cf4dd2a0100b06b6e
+  data.tar.gz: d1d79696b08f89894de02a02fac91f0783c432efa641b21ee59f6987946b045681a60113392db6c85fe97bd0e1fc9860235faa358fb805bb0de21eb85926edd5
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,37 @@
+## [[0.3.1](https://github.com/yoshoku/llama_cpp.rb/compare/v0.3.0...v0.3.1)] - 2023-07-02
+
+- Bump bundled llama.cpp from master-9d23589 to master-b8c8dda.
+- Use unsigned values for random seed.
+- Add `eval_embd` method to `Context` class.
+
+## [[0.3.0](https://github.com/yoshoku/llama_cpp.rb/compare/v0.2.2...v0.3.0)] - 2023-06-30
+
+- Add no_k_quants and qkk_64 config options:
+  ```
+  $ gem install llama_cpp -- --with-no_k_quants
+  ```
+  ```
+  $ gem install llama_cpp -- --with-qkk_64
+  ```
+
+**Breaking Changes**
+- Remove `Client` class to concentrate on developing bindings.
+- Bump bundled llama.cpp from master-7487137 to master-9d23589.
+  - llama_init_from_file and llama_apply_lora_from_file are deprecated.
+- Add `Model` class for wrapping llama_model.
+- Move the `apply_lora_from_file`, `free`, `load`, and `empty?` methods from the `Context` class to the `Model` class.
+- Change the arguments of the `Context` initialize method: it now requires a `Model` object instead of the model's file path.
+  ```ruby
+  require 'llama_cpp'
+
+  params = LLaMACpp::ContextParams.new
+
+  model = LLaMACpp::Model.new(model_path: '/path/to/quantized-model.bin', params: params)
+  context = LLaMACpp::Context.new(model: model)
+
+  LLaMACpp.generate(context, 'Hello, world.')
+  ```
+
 ## [[0.2.2](https://github.com/yoshoku/llama_cpp.rb/compare/v0.2.1...v0.2.2)] - 2023-06-24
 
 - Bump bundled llama.cpp from master-a09f919 to master-7487137.
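The 0.3.1 entry above adds an `eval_embd` method to the `Context` class. Below is a minimal sketch of how it might be called; the keyword names (`embd:`, `n_past:`, `n_threads:`) and the use of `n_embd` to size the input are assumptions modeled on the existing `eval` method, not something this diff confirms.

```ruby
require 'llama_cpp'

params = LLaMACpp::ContextParams.new
model = LLaMACpp::Model.new(model_path: '/path/to/quantized-model.bin', params: params)
context = LLaMACpp::Context.new(model: model)

# Hypothetical usage: feed one position's worth of raw embeddings instead of token ids.
# The keyword names and the n_embd-sized input vector are assumptions based on Context#eval.
embd = Array.new(context.n_embd, 0.0)
context.eval_embd(embd: embd, n_past: 0, n_threads: 4)
```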
data/README.md
CHANGED
@@ -20,21 +20,54 @@ If bundler is not being used to manage dependencies, install the gem by executin
 
 ## Usage
 
-Prepare the quantized model by refering to [the usage section on the llama.cpp README](https://github.com/ggerganov/llama.cpp#usage)
-
+Prepare the quantized model by referring to [the usage section on the llama.cpp README](https://github.com/ggerganov/llama.cpp#usage).
+For example, preparing the quantized model based on [open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b) is as follows:
+
+```sh
+$ cd ~/
+$ brew install git-lfs
+$ git lfs install
+$ git clone https://github.com/ggerganov/llama.cpp.git
+$ cd llama.cpp
+$ python3 -m pip install -r requirements.txt
+$ cd models
+$ git clone https://huggingface.co/openlm-research/open_llama_7b
+$ cd ../
+$ python3 convert.py models/open_llama_7b
+$ make
+$ ./quantize ./models/open_llama_7b/ggml-model-f16.bin ./models/open_llama_7b/ggml-model-q4_0.bin q4_0
+```
+
+An example of Ruby code that generates sentences with the quantized model is as follows:
 
 ```ruby
 require 'llama_cpp'
 
 params = LLaMACpp::ContextParams.new
-params.seed =
+params.seed = 42
 
-
+model = LLaMACpp::Model.new(model_path: '/home/user/llama.cpp/models/open_llama_7b/ggml-model-q4_0.bin', params: params)
+context = LLaMACpp::Context.new(model: model)
 
-puts LLaMACpp.generate(context, '
-# => "There are two major cities in Japan, Tokyo and Osaka, which have about 30 million populations."
+puts LLaMACpp.generate(context, 'Hello, World.', n_threads: 4)
 ```
 
+## Examples
+There is a sample program in the [examples](https://github.com/yoshoku/llama_cpp.rb/tree/main/examples) directory that allows interactive communication like ChatGPT.
+
+```sh
+$ git clone https://github.com/yoshoku/llama_cpp.rb.git
+$ cd llama_cpp.rb/examples
+$ bundle install
+$ ruby chat.rb --model /home/user/llama.cpp/models/open_llama_7b/ggml-model-q4_0.bin --seed 2023
+...
+User: Who is the originator of the Ruby programming language?
+Bob: The originator of the Ruby programming language is Mr. Yukihiro Matsumoto.
+User:
+```
+
+![llama_cpp_chat_example](https://github.com/yoshoku/llama_cpp.rb/assets/5562409/374ae3d8-63a6-498f-ae6e-5552b464bdda)
+
 ## Contributing
 
 Bug reports and pull requests are welcome on GitHub at https://github.com/yoshoku/llama_cpp.rb.
data/examples/chat.rb
CHANGED
@@ -35,7 +35,8 @@ class Chat < Thor # rubocop:disable Metrics/ClassLength, Style/Documentation
     params = LLaMACpp::ContextParams.new
     params.seed = options[:seed]
     params.n_gpu_layers = options[:n_gpu_layers]
-
+    model = LLaMACpp::Model.new(model_path: options[:model], params: params)
+    context = LLaMACpp::Context.new(model: model)
 
     antiprompt = options[:reverse_prompt] || 'User:'
     start_prompt = read_prompt(options[:file]) || default_prompt(antiprompt)
data/examples/embedding.rb
CHANGED
@@ -16,12 +16,13 @@ class Embedding < Thor # rubocop:disable Style/Documentation
   option :model, type: :string, aliases: '-m', desc: 'path to model file', required: true
   option :prompt, type: :string, aliases: '-p', desc: 'prompt to generate embedding', required: true
   option :n_gpu_layers, type: :numeric, desc: 'number of layers on GPU', default: 0
-  def main # rubocop:disable Metrics/AbcSize
+  def main # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
     params = LLaMACpp::ContextParams.new
     params.seed = options[:seed]
     params.n_gpu_layers = options[:n_gpu_layers]
     params.embedding = true
-
+    model = LLaMACpp::Model.new(model_path: options[:model], params: params)
+    context = LLaMACpp::Context.new(model: model)
 
     embd_input = context.tokenize(text: options[:prompt], add_bos: true)
 
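For reference, the tokenized prompt produced by `context.tokenize` in the hunk above is what the embedding example goes on to evaluate before reading the embedding vector back out. A short sketch of those follow-up steps is given here; it assumes `Context#eval` accepts `tokens:` and `n_past:` keywords and that `Context#embeddings` returns the computed vector, neither of which is shown in this diff.

```ruby
# Sketch of the steps that typically follow the hunk above (assumptions, not part of the diff):
# evaluate the tokenized prompt, then read the embedding vector back from the context,
# which works because params.embedding = true was set earlier.
context.eval(tokens: embd_input, n_past: 0)
puts context.embeddings.join(' ')
```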
data/ext/llama_cpp/extconf.rb
CHANGED
@@ -17,6 +17,17 @@ if RUBY_PLATFORM.match?(/darwin|linux|bsd/) && try_compile('#include <stdio.h>',
   $CXXFLAGS << ' -pthread'
 end
 
+unless with_config('no_k_quants')
+  $CFLAGS << ' -DGGML_USE_K_QUANTS'
+  $CXXFLAGS << ' -DGGML_USE_K_QUANTS'
+  $srcs << 'k_quants.c'
+end
+
+if with_config('qkk_64')
+  $CFLAGS << ' -DGGML_QKK_64'
+  $CXXFLAGS << ' -DGGML_QKK_64'
+end
+
 if with_config('openblas')
   abort 'libopenblas is not found.' unless have_library('openblas')
   abort 'cblas.h is not found.' unless have_header('cblas.h')
@@ -42,6 +53,7 @@ if with_config('metal')
   $CXXFLAGS << ' -DGGML_USE_METAL'
   $LDFLAGS << ' -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders'
   $objs = %w[ggml.o llama.o llama_cpp.o ggml-metal.o]
+  $objs << 'k_quants.o' unless with_config('no_k_quants')
 end
 
 if with_config('cublas')
@@ -49,6 +61,7 @@ if with_config('cublas')
   $CXXFLAGS << ' -DGGML_USE_CUBLAS -I/usr/local/cuda/include'
   $LDFLAGS << ' -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64'
   $objs = %w[ggml-cuda.o ggml.o llama.o llama_cpp.o]
+  $objs << 'k_quants.o' unless with_config('no_k_quants')
 end
 
 if with_config('clblast')