llama_cpp 0.2.2 → 0.3.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +34 -0
- data/README.md +39 -6
- data/examples/chat.rb +2 -1
- data/examples/embedding.rb +3 -2
- data/ext/llama_cpp/extconf.rb +13 -0
- data/ext/llama_cpp/llama_cpp.cpp +305 -133
- data/ext/llama_cpp/src/ggml-cuda.cu +367 -69
- data/ext/llama_cpp/src/ggml-cuda.h +1 -0
- data/ext/llama_cpp/src/ggml-metal.m +36 -30
- data/ext/llama_cpp/src/ggml-metal.metal +328 -84
- data/ext/llama_cpp/src/ggml-opencl.cpp +352 -175
- data/ext/llama_cpp/src/ggml.c +800 -303
- data/ext/llama_cpp/src/ggml.h +68 -5
- data/ext/llama_cpp/src/k_quants.c +1712 -56
- data/ext/llama_cpp/src/k_quants.h +41 -6
- data/ext/llama_cpp/src/llama-util.h +19 -5
- data/ext/llama_cpp/src/llama.cpp +262 -291
- data/ext/llama_cpp/src/llama.h +49 -11
- data/lib/llama_cpp/version.rb +2 -2
- data/lib/llama_cpp.rb +0 -2
- data/sig/llama_cpp.rbs +14 -17
- metadata +2 -3
- data/lib/llama_cpp/client.rb +0 -172
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7a1f299e21bfe5b12d517a4254657cbc5bf9af6d0571285e2a5aff67b9175646
+  data.tar.gz: 62dd6e0d4f0b052a912d87b52cd0cff5bb873ab12378413a3ee0af5671331ef6
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b12dc73914e5c7ecdd951fd57b70e01aae1926a2adc88030b5f5310f99c789e129cf552811363ec99525b37b9ca167a708cb756057b94f5cf4dd2a0100b06b6e
+  data.tar.gz: d1d79696b08f89894de02a02fac91f0783c432efa641b21ee59f6987946b045681a60113392db6c85fe97bd0e1fc9860235faa358fb805bb0de21eb85926edd5
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,37 @@
+## [[0.3.1](https://github.com/yoshoku/llama_cpp.rb/compare/v0.3.0...v0.3.1)] - 2023-07-02
+
+- Bump bundled llama.cpp from master-9d23589 to master-b8c8dda.
+- Use unsigned values for random seed.
+- Add `eval_embd` method to `Context` class.
+
+## [[0.3.0](https://github.com/yoshoku/llama_cpp.rb/compare/v0.2.2...v0.3.0)] - 2023-06-30
+
+- Add no_k_quants and qkk_64 config options:
+  ```
+  $ gem install llama_cpp -- --with-no_k_quants
+  ```
+  ```
+  $ gem install llama_cpp -- --with-qkk_64
+  ```
+
+**Breaking Changes**
+- Remove `Client` class to concentrate on developing bindings.
+- Bump bundled llama.cpp from master-7487137 to master-9d23589.
+  - llama_init_from_file and llama_apply_lora_from_file are deprecated.
+- Add `Model` class for wrapping llama_model.
+  - Move the `apply_lora_from_file`, `free`, `load`, and `empty?` methods from `Context` class to `Model` class.
+  - Change the arguments of `Context`'s initialize method; it now takes a Model object instead of the model's file path.
+  ```ruby
+  require 'llama_cpp'
+
+  params = LLaMACpp::ContextParams.new
+
+  model = LLaMACpp::Model.new(model_path: '/path/to/quantized-model.bin', params: params)
+  context = LLaMACpp::Context.new(model: model)
+
+  LLaMACpp.generate(context, 'Hello, world.')
+  ```
+
 ## [[0.2.2](https://github.com/yoshoku/llama_cpp.rb/compare/v0.2.1...v0.2.2)] - 2023-06-24
 
 - Bump bundled llama.cpp from master-a09f919 to master-7487137.
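
Of the 0.3.1 entries above, `eval_embd` is the only new public API. As an editorial illustration (not part of the packaged files), the sketch below shows how it might be called; the keyword names `embd:` and `n_past:` and the `Context#n_embd` reader are assumptions based on the existing `eval` binding, so check `llama_cpp.cpp` and `sig/llama_cpp.rbs` in this release for the exact signature. The seed assignment reflects the "unsigned values for random seed" change.

```ruby
# Hypothetical usage sketch for Context#eval_embd (added in 0.3.1).
# Assumption: it mirrors Context#eval but takes raw embedding values
# (n_tokens * n_embd floats) instead of token ids.
require 'llama_cpp'

params = LLaMACpp::ContextParams.new
params.seed = 2**32 - 1 # seeds are treated as unsigned values as of 0.3.1

model = LLaMACpp::Model.new(model_path: '/path/to/quantized-model.bin', params: params)
context = LLaMACpp::Context.new(model: model)

# One token's worth of embedding values, e.g. produced by another model.
embd = Array.new(context.n_embd, 0.0)
context.eval_embd(embd: embd, n_past: 0) # keyword names assumed
```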
data/README.md
CHANGED
@@ -20,21 +20,54 @@ If bundler is not being used to manage dependencies, install the gem by executin
 
 ## Usage
 
-Prepare the quantized model by referring to [the usage section on the llama.cpp README](https://github.com/ggerganov/llama.cpp#usage)
-
+Prepare the quantized model by referring to [the usage section on the llama.cpp README](https://github.com/ggerganov/llama.cpp#usage).
+For example, preparing the quantized model based on [open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b) is as follows:
+
+```sh
+$ cd ~/
+$ brew install git-lfs
+$ git lfs install
+$ git clone https://github.com/ggerganov/llama.cpp.git
+$ cd llama.cpp
+$ python3 -m pip install -r requirements.txt
+$ cd models
+$ git clone https://huggingface.co/openlm-research/open_llama_7b
+$ cd ../
+$ python3 convert.py models/open_llama_7b
+$ make
+$ ./quantize ./models/open_llama_7b/ggml-model-f16.bin ./models/open_llama_7b/ggml-model-q4_0.bin q4_0
+```
+
+An example of Ruby code that generates sentences with the quantized model is as follows:
 
 ```ruby
 require 'llama_cpp'
 
 params = LLaMACpp::ContextParams.new
-params.seed =
+params.seed = 42
 
-
+model = LLaMACpp::Model.new(model_path: '/home/user/llama.cpp/models/open_llama_7b/ggml-model-q4_0.bin', params: params)
+context = LLaMACpp::Context.new(model: model)
 
-puts LLaMACpp.generate(context, '
-# => "There are two major cities in Japan, Tokyo and Osaka, which have about 30 million populations."
+puts LLaMACpp.generate(context, 'Hello, World.', n_threads: 4)
 ```
 
+## Examples
+There is a sample program in the [examples](https://github.com/yoshoku/llama_cpp.rb/tree/main/examples) directory that allows interactive communication like ChatGPT.
+
+```sh
+$ git clone https://github.com/yoshoku/llama_cpp.rb.git
+$ cd llama_cpp.rb/examples
+$ bundle install
+$ ruby chat.rb --model /home/user/llama.cpp/models/open_llama_7b/ggml-model-q4_0.bin --seed 2023
+...
+User: Who is the originator of the Ruby programming language?
+Bob: The originator of the Ruby programming language is Mr. Yukihiro Matsumoto.
+User:
+```
+
+
+
 ## Contributing
 
 Bug reports and pull requests are welcome on GitHub at https://github.com/yoshoku/llama_cpp.rb.
data/examples/chat.rb
CHANGED
@@ -35,7 +35,8 @@ class Chat < Thor # rubocop:disable Metrics/ClassLength, Style/Documentation
     params = LLaMACpp::ContextParams.new
     params.seed = options[:seed]
     params.n_gpu_layers = options[:n_gpu_layers]
-
+    model = LLaMACpp::Model.new(model_path: options[:model], params: params)
+    context = LLaMACpp::Context.new(model: model)
 
     antiprompt = options[:reverse_prompt] || 'User:'
     start_prompt = read_prompt(options[:file]) || default_prompt(antiprompt)
data/examples/embedding.rb
CHANGED
@@ -16,12 +16,13 @@ class Embedding < Thor # rubocop:disable Style/Documentation
   option :model, type: :string, aliases: '-m', desc: 'path to model file', required: true
   option :prompt, type: :string, aliases: '-p', desc: 'prompt to generate embedding', required: true
   option :n_gpu_layers, type: :numeric, desc: 'number of layers on GPU', default: 0
-  def main # rubocop:disable Metrics/AbcSize
+  def main # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
     params = LLaMACpp::ContextParams.new
     params.seed = options[:seed]
     params.n_gpu_layers = options[:n_gpu_layers]
     params.embedding = true
-
+    model = LLaMACpp::Model.new(model_path: options[:model], params: params)
+    context = LLaMACpp::Context.new(model: model)
 
     embd_input = context.tokenize(text: options[:prompt], add_bos: true)
 
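
The embedding.rb hunk ends right after tokenizing the prompt. For orientation, here is a hedged, self-contained sketch of how such an embedding flow typically continues with these bindings: evaluate the tokens, then read the resulting vector. The `eval` keyword arguments and the `embeddings` reader are assumptions drawn from the `Context` API rather than lines quoted from examples/embedding.rb.

```ruby
# Sketch of a complete embedding extraction flow (assumed API, see lead-in).
require 'llama_cpp'

params = LLaMACpp::ContextParams.new
params.embedding = true # required so the context stores embeddings

model = LLaMACpp::Model.new(model_path: '/path/to/quantized-model.bin', params: params)
context = LLaMACpp::Context.new(model: model)

tokens = context.tokenize(text: 'Hello, World.', add_bos: true)
context.eval(tokens: tokens, n_past: 0) # keyword names assumed
vector = context.embeddings            # assumed reader; Array of Float
puts vector.take(8).inspect
```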
data/ext/llama_cpp/extconf.rb
CHANGED
@@ -17,6 +17,17 @@ if RUBY_PLATFORM.match?(/darwin|linux|bsd/) && try_compile('#include <stdio.h>',
   $CXXFLAGS << ' -pthread'
 end
 
+unless with_config('no_k_quants')
+  $CFLAGS << ' -DGGML_USE_K_QUANTS'
+  $CXXFLAGS << ' -DGGML_USE_K_QUANTS'
+  $srcs << 'k_quants.c'
+end
+
+if with_config('qkk_64')
+  $CFLAGS << ' -DGGML_QKK_64'
+  $CXXFLAGS << ' -DGGML_QKK_64'
+end
+
 if with_config('openblas')
   abort 'libopenblas is not found.' unless have_library('openblas')
   abort 'cblas.h is not found.' unless have_header('cblas.h')
@@ -42,6 +53,7 @@ if with_config('metal')
   $CXXFLAGS << ' -DGGML_USE_METAL'
   $LDFLAGS << ' -framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders'
   $objs = %w[ggml.o llama.o llama_cpp.o ggml-metal.o]
+  $objs << 'k_quants.o' unless with_config('no_k_quants')
 end
 
 if with_config('cublas')
@@ -49,6 +61,7 @@ if with_config('cublas')
   $CXXFLAGS << ' -DGGML_USE_CUBLAS -I/usr/local/cuda/include'
   $LDFLAGS << ' -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64'
   $objs = %w[ggml-cuda.o ggml.o llama.o llama_cpp.o]
+  $objs << 'k_quants.o' unless with_config('no_k_quants')
 end
 
 if with_config('clblast')
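
Taken together, the three extconf.rb hunks wire the new install-time flags into the build: k-quants support is compiled in by default, `--with-no_k_quants` turns it off, `--with-qkk_64` selects the 64-wide k-quant blocks, and the Metal and cuBLAS branches re-add `k_quants.o` because they overwrite `$objs`. The condensed mkmf sketch below is an editorial illustration of that logic, not the gem's actual extconf.rb; the abridged `$srcs` list is assumed.

```ruby
# Condensed, illustrative mkmf sketch of the option handling added above.
require 'mkmf'

$srcs = %w[ggml.c llama.cpp llama_cpp.cpp] # abridged source list (assumed)

unless with_config('no_k_quants') # true when installed with -- --with-no_k_quants
  $CFLAGS << ' -DGGML_USE_K_QUANTS'
  $CXXFLAGS << ' -DGGML_USE_K_QUANTS'
  $srcs << 'k_quants.c'
end

$CFLAGS << ' -DGGML_QKK_64' if with_config('qkk_64') # 64-wide k-quant blocks

if with_config('metal') || with_config('cublas')
  # These backends assign $objs directly, so the k-quants object must be re-added.
  $objs ||= []
  $objs << 'k_quants.o' unless with_config('no_k_quants')
end

create_makefile('llama_cpp/llama_cpp')
```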