@fugood/llama.node 0.3.0 → 0.3.2

This diff shows the changes between publicly released versions of this package, as they appear in their respective public registries. It is provided for informational purposes only.
Files changed (187)
  1. package/CMakeLists.txt +1 -10
  2. package/bin/darwin/arm64/llama-node.node +0 -0
  3. package/bin/darwin/x64/llama-node.node +0 -0
  4. package/bin/linux/arm64/llama-node.node +0 -0
  5. package/bin/linux/x64/llama-node.node +0 -0
  6. package/bin/linux-vulkan/arm64/llama-node.node +0 -0
  7. package/bin/linux-vulkan/x64/llama-node.node +0 -0
  8. package/bin/win32/arm64/llama-node.node +0 -0
  9. package/bin/win32/arm64/node.lib +0 -0
  10. package/bin/win32/x64/llama-node.node +0 -0
  11. package/bin/win32/x64/node.lib +0 -0
  12. package/bin/win32-vulkan/arm64/llama-node.node +0 -0
  13. package/bin/win32-vulkan/arm64/node.lib +0 -0
  14. package/bin/win32-vulkan/x64/llama-node.node +0 -0
  15. package/bin/win32-vulkan/x64/node.lib +0 -0
  16. package/package.json +6 -4
  17. package/src/LlamaCompletionWorker.cpp +6 -6
  18. package/src/LlamaContext.cpp +7 -9
  19. package/src/common.hpp +2 -1
  20. package/src/llama.cpp/.github/workflows/build.yml +98 -24
  21. package/src/llama.cpp/.github/workflows/close-issue.yml +5 -0
  22. package/src/llama.cpp/.github/workflows/docker.yml +43 -34
  23. package/src/llama.cpp/.github/workflows/nix-ci-aarch64.yml +7 -0
  24. package/src/llama.cpp/.github/workflows/nix-ci.yml +7 -0
  25. package/src/llama.cpp/.github/workflows/python-check-requirements.yml +2 -4
  26. package/src/llama.cpp/.github/workflows/python-type-check.yml +3 -1
  27. package/src/llama.cpp/.github/workflows/server.yml +7 -0
  28. package/src/llama.cpp/CMakeLists.txt +20 -8
  29. package/src/llama.cpp/common/CMakeLists.txt +12 -10
  30. package/src/llama.cpp/common/arg.cpp +2006 -0
  31. package/src/llama.cpp/common/arg.h +77 -0
  32. package/src/llama.cpp/common/common.cpp +496 -1632
  33. package/src/llama.cpp/common/common.h +161 -63
  34. package/src/llama.cpp/common/console.cpp +3 -0
  35. package/src/llama.cpp/common/log.cpp +401 -0
  36. package/src/llama.cpp/common/log.h +66 -698
  37. package/src/llama.cpp/common/ngram-cache.cpp +3 -0
  38. package/src/llama.cpp/common/sampling.cpp +348 -350
  39. package/src/llama.cpp/common/sampling.h +62 -139
  40. package/src/llama.cpp/common/stb_image.h +5990 -6398
  41. package/src/llama.cpp/common/train.cpp +2 -0
  42. package/src/llama.cpp/docs/build.md +36 -1
  43. package/src/llama.cpp/examples/CMakeLists.txt +0 -1
  44. package/src/llama.cpp/examples/baby-llama/baby-llama.cpp +1 -2
  45. package/src/llama.cpp/examples/batched/batched.cpp +39 -55
  46. package/src/llama.cpp/examples/batched-bench/batched-bench.cpp +34 -44
  47. package/src/llama.cpp/examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp +55 -52
  48. package/src/llama.cpp/examples/cvector-generator/cvector-generator.cpp +15 -15
  49. package/src/llama.cpp/examples/cvector-generator/pca.hpp +3 -13
  50. package/src/llama.cpp/examples/embedding/embedding.cpp +143 -87
  51. package/src/llama.cpp/examples/eval-callback/eval-callback.cpp +33 -33
  52. package/src/llama.cpp/examples/export-lora/export-lora.cpp +36 -35
  53. package/src/llama.cpp/examples/gbnf-validator/gbnf-validator.cpp +14 -39
  54. package/src/llama.cpp/examples/gen-docs/CMakeLists.txt +5 -0
  55. package/src/llama.cpp/examples/gen-docs/gen-docs.cpp +83 -0
  56. package/src/llama.cpp/examples/gguf-split/gguf-split.cpp +58 -39
  57. package/src/llama.cpp/examples/gritlm/gritlm.cpp +34 -27
  58. package/src/llama.cpp/examples/imatrix/imatrix.cpp +59 -62
  59. package/src/llama.cpp/examples/infill/infill.cpp +117 -132
  60. package/src/llama.cpp/examples/llama-bench/llama-bench.cpp +265 -58
  61. package/src/llama.cpp/examples/llama.android/llama/src/main/cpp/llama-android.cpp +29 -22
  62. package/src/llama.cpp/examples/llava/CMakeLists.txt +7 -0
  63. package/src/llama.cpp/examples/llava/clip.cpp +685 -150
  64. package/src/llama.cpp/examples/llava/clip.h +11 -2
  65. package/src/llama.cpp/examples/llava/llava-cli.cpp +47 -58
  66. package/src/llama.cpp/examples/llava/llava.cpp +110 -24
  67. package/src/llama.cpp/examples/llava/llava.h +2 -3
  68. package/src/llama.cpp/examples/llava/minicpmv-cli.cpp +323 -0
  69. package/src/llama.cpp/examples/llava/requirements.txt +1 -0
  70. package/src/llama.cpp/examples/lookahead/lookahead.cpp +42 -43
  71. package/src/llama.cpp/examples/lookup/lookup-create.cpp +10 -8
  72. package/src/llama.cpp/examples/lookup/lookup-stats.cpp +23 -22
  73. package/src/llama.cpp/examples/lookup/lookup.cpp +40 -43
  74. package/src/llama.cpp/examples/main/main.cpp +210 -262
  75. package/src/llama.cpp/examples/parallel/parallel.cpp +49 -49
  76. package/src/llama.cpp/examples/passkey/passkey.cpp +42 -50
  77. package/src/llama.cpp/examples/perplexity/perplexity.cpp +187 -200
  78. package/src/llama.cpp/examples/quantize/CMakeLists.txt +1 -1
  79. package/src/llama.cpp/examples/quantize/quantize.cpp +27 -9
  80. package/src/llama.cpp/examples/quantize-stats/quantize-stats.cpp +2 -3
  81. package/src/llama.cpp/examples/retrieval/retrieval.cpp +49 -44
  82. package/src/llama.cpp/examples/rpc/rpc-server.cpp +24 -1
  83. package/src/llama.cpp/examples/save-load-state/save-load-state.cpp +32 -35
  84. package/src/llama.cpp/examples/server/CMakeLists.txt +3 -5
  85. package/src/llama.cpp/examples/server/server.cpp +1027 -1073
  86. package/src/llama.cpp/examples/server/tests/requirements.txt +2 -1
  87. package/src/llama.cpp/examples/server/utils.hpp +107 -105
  88. package/src/llama.cpp/examples/simple/simple.cpp +35 -41
  89. package/src/llama.cpp/examples/speculative/speculative.cpp +129 -103
  90. package/src/llama.cpp/examples/sycl/run-llama2.sh +10 -19
  91. package/src/llama.cpp/examples/sycl/win-run-llama2.bat +1 -1
  92. package/src/llama.cpp/examples/tokenize/tokenize.cpp +25 -27
  93. package/src/llama.cpp/ggml/CMakeLists.txt +14 -3
  94. package/src/llama.cpp/ggml/include/ggml-alloc.h +3 -3
  95. package/src/llama.cpp/ggml/include/ggml-backend.h +145 -60
  96. package/src/llama.cpp/ggml/include/ggml-blas.h +3 -3
  97. package/src/llama.cpp/ggml/include/ggml-cann.h +15 -19
  98. package/src/llama.cpp/ggml/include/ggml-cuda.h +16 -16
  99. package/src/llama.cpp/ggml/include/ggml-metal.h +5 -8
  100. package/src/llama.cpp/ggml/include/ggml-rpc.h +5 -5
  101. package/src/llama.cpp/ggml/include/ggml-sycl.h +8 -8
  102. package/src/llama.cpp/ggml/include/ggml-vulkan.h +7 -7
  103. package/src/llama.cpp/ggml/include/ggml.h +293 -186
  104. package/src/llama.cpp/ggml/src/CMakeLists.txt +86 -44
  105. package/src/llama.cpp/ggml/src/ggml-aarch64.c +2135 -1119
  106. package/src/llama.cpp/ggml/src/ggml-alloc.c +6 -0
  107. package/src/llama.cpp/ggml/src/ggml-backend-impl.h +152 -70
  108. package/src/llama.cpp/ggml/src/{ggml-backend.c → ggml-backend.cpp} +606 -286
  109. package/src/llama.cpp/ggml/src/ggml-blas.cpp +9 -10
  110. package/src/llama.cpp/ggml/src/ggml-cann/acl_tensor.cpp +4 -27
  111. package/src/llama.cpp/ggml/src/ggml-cann/acl_tensor.h +32 -4
  112. package/src/llama.cpp/ggml/src/ggml-cann/aclnn_ops.cpp +179 -41
  113. package/src/llama.cpp/ggml/src/ggml-cann/common.h +1 -0
  114. package/src/llama.cpp/ggml/src/ggml-cann/kernels/CMakeLists.txt +2 -1
  115. package/src/llama.cpp/ggml/src/ggml-cann/kernels/ascendc_kernels.h +2 -0
  116. package/src/llama.cpp/ggml/src/ggml-cann/kernels/quantize_float_to_q4_0.cpp +278 -0
  117. package/src/llama.cpp/ggml/src/ggml-cann.cpp +215 -216
  118. package/src/llama.cpp/ggml/src/ggml-common.h +20 -0
  119. package/src/llama.cpp/ggml/src/ggml-cpu-impl.h +614 -0
  120. package/src/llama.cpp/ggml/src/ggml-cuda/vendors/cuda.h +14 -0
  121. package/src/llama.cpp/ggml/src/ggml-cuda/vendors/hip.h +178 -0
  122. package/src/llama.cpp/ggml/src/ggml-cuda/vendors/musa.h +134 -0
  123. package/src/llama.cpp/ggml/src/ggml-impl.h +49 -603
  124. package/src/llama.cpp/ggml/src/ggml-kompute.cpp +4 -24
  125. package/src/llama.cpp/ggml/src/ggml-quants.c +972 -92
  126. package/src/llama.cpp/ggml/src/ggml-quants.h +15 -0
  127. package/src/llama.cpp/ggml/src/ggml-rpc.cpp +116 -66
  128. package/src/llama.cpp/ggml/src/ggml-sycl/backend.hpp +3 -0
  129. package/src/llama.cpp/ggml/src/ggml-sycl/common.cpp +11 -0
  130. package/src/llama.cpp/ggml/src/ggml-sycl/common.hpp +52 -0
  131. package/src/llama.cpp/ggml/src/ggml-sycl/conv.cpp +99 -0
  132. package/src/llama.cpp/ggml/src/ggml-sycl/conv.hpp +21 -0
  133. package/src/llama.cpp/ggml/src/ggml-sycl/convert.cpp +57 -57
  134. package/src/llama.cpp/ggml/src/ggml-sycl/convert.hpp +1 -1
  135. package/src/llama.cpp/ggml/src/ggml-sycl/dequantize.hpp +106 -106
  136. package/src/llama.cpp/ggml/src/ggml-sycl/dmmv.cpp +4 -4
  137. package/src/llama.cpp/ggml/src/ggml-sycl/dpct/helper.hpp +16 -3
  138. package/src/llama.cpp/ggml/src/ggml-sycl/gemm.hpp +101 -0
  139. package/src/llama.cpp/ggml/src/ggml-sycl/im2col.cpp +125 -0
  140. package/src/llama.cpp/ggml/src/ggml-sycl/im2col.hpp +23 -0
  141. package/src/llama.cpp/ggml/src/ggml-sycl/mmvq.cpp +1 -1
  142. package/src/llama.cpp/ggml/src/ggml-sycl/norm.cpp +6 -3
  143. package/src/llama.cpp/ggml/src/ggml-sycl/presets.hpp +2 -0
  144. package/src/llama.cpp/ggml/src/ggml-sycl/rope.cpp +1 -1
  145. package/src/llama.cpp/ggml/src/ggml-sycl/tsembd.cpp +71 -0
  146. package/src/llama.cpp/ggml/src/ggml-sycl/tsembd.hpp +21 -0
  147. package/src/llama.cpp/ggml/src/ggml-sycl.cpp +97 -169
  148. package/src/llama.cpp/ggml/src/ggml-vulkan.cpp +1508 -1124
  149. package/src/llama.cpp/ggml/src/ggml.c +3001 -1647
  150. package/src/llama.cpp/ggml/src/llamafile/sgemm.cpp +192 -0
  151. package/src/llama.cpp/ggml/src/vulkan-shaders/CMakeLists.txt +2 -0
  152. package/src/llama.cpp/ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp +88 -40
  153. package/src/llama.cpp/include/llama.h +241 -264
  154. package/src/llama.cpp/models/ggml-vocab-chameleon.gguf.inp +112 -0
  155. package/src/llama.cpp/models/ggml-vocab-chameleon.gguf.out +46 -0
  156. package/src/llama.cpp/requirements/requirements-convert_legacy_llama.txt +1 -1
  157. package/src/llama.cpp/src/llama-grammar.cpp +721 -122
  158. package/src/llama.cpp/src/llama-grammar.h +120 -15
  159. package/src/llama.cpp/src/llama-impl.h +156 -1
  160. package/src/llama.cpp/src/llama-sampling.cpp +1375 -303
  161. package/src/llama.cpp/src/llama-sampling.h +20 -47
  162. package/src/llama.cpp/src/llama-vocab.cpp +343 -120
  163. package/src/llama.cpp/src/llama-vocab.h +33 -17
  164. package/src/llama.cpp/src/llama.cpp +4247 -1525
  165. package/src/llama.cpp/src/unicode-data.cpp +6 -4
  166. package/src/llama.cpp/src/unicode-data.h +4 -4
  167. package/src/llama.cpp/src/unicode.cpp +15 -7
  168. package/src/llama.cpp/tests/CMakeLists.txt +3 -0
  169. package/src/llama.cpp/tests/test-arg-parser.cpp +131 -0
  170. package/src/llama.cpp/tests/test-backend-ops.cpp +1592 -289
  171. package/src/llama.cpp/tests/test-barrier.cpp +93 -0
  172. package/src/llama.cpp/tests/test-grad0.cpp +187 -70
  173. package/src/llama.cpp/tests/test-grammar-integration.cpp +23 -38
  174. package/src/llama.cpp/tests/test-grammar-parser.cpp +6 -4
  175. package/src/llama.cpp/tests/test-json-schema-to-grammar.cpp +6 -4
  176. package/src/llama.cpp/tests/test-llama-grammar.cpp +9 -8
  177. package/src/llama.cpp/tests/test-log.cpp +39 -0
  178. package/src/llama.cpp/tests/test-quantize-fns.cpp +6 -0
  179. package/src/llama.cpp/tests/test-rope.cpp +1 -1
  180. package/src/llama.cpp/tests/test-sampling.cpp +157 -98
  181. package/src/llama.cpp/tests/test-tokenizer-0.cpp +55 -35
  182. package/patches/llama.patch +0 -22
  183. package/src/llama.cpp/.github/workflows/bench.yml +0 -310
  184. package/src/llama.cpp/common/grammar-parser.cpp +0 -536
  185. package/src/llama.cpp/common/grammar-parser.h +0 -29
  186. package/src/llama.cpp/examples/benchmark/CMakeLists.txt +0 -6
  187. package/src/llama.cpp/examples/benchmark/benchmark-matmult.cpp +0 -275
package/src/llama.cpp/.github/workflows/bench.yml
@@ -1,310 +0,0 @@
- # Benchmark
- name: Benchmark
-
- on:
-   workflow_dispatch:
-     inputs:
-       gpu-series:
-         description: 'Azure GPU series to run with'
-         required: true
-         type: choice
-         options:
-           - Standard_NC4as_T4_v3
-           - Standard_NC24ads_A100_v4
-           - Standard_NC80adis_H100_v5
-       sha:
-         description: 'Commit SHA1 to build'
-         required: false
-         type: string
-       duration:
-         description: 'Duration of the bench'
-         type: string
-         default: 10m
-
-   push:
-     branches:
-       - master
-     paths: ['llama.cpp', 'ggml.c', 'ggml-backend.c', 'ggml-quants.c', '**/*.cu', 'examples/server/*.h*', 'examples/server/*.cpp']
-   pull_request_target:
-     types: [opened, synchronize, reopened]
-     paths: ['llama.cpp', 'ggml.c', 'ggml-backend.c', 'ggml-quants.c', '**/*.cu', 'examples/server/*.h*', 'examples/server/*.cpp']
-   schedule:
-     - cron: '04 2 * * *'
-
- concurrency:
-   group: ${{ github.workflow }}-${{ github.ref }}-${{ github.head_ref || github.run_id }}-${{ github.event.inputs.sha }}
-   cancel-in-progress: true
-
- jobs:
-   bench-server-baseline:
-     runs-on: Standard_NC4as_T4_v3
-     env:
-       RUNNER_LABEL: Standard_NC4as_T4_v3 # FIXME Do not find a way to not duplicate it
-       N_USERS: 8
-       DURATION: 10m
-
-     strategy:
-       matrix:
-         model: [phi-2]
-         ftype: [q4_0, q8_0, f16]
-         include:
-           - model: phi-2
-             ftype: q4_0
-             pr_comment_enabled: "true"
-
-     if: |
-       inputs.gpu-series == 'Standard_NC4as_T4_v3'
-       || (
-         github.event_name == 'schedule'
-         && github.ref_name == 'master'
-         && github.repository_owner == 'ggerganov'
-       )
-       || github.event_name == 'pull_request_target'
-       || (
-         github.event_name == 'push'
-         && github.event.ref == 'refs/heads/master'
-         && github.repository_owner == 'ggerganov'
-       )
-     steps:
-       - name: Clone
-         id: checkout
-         uses: actions/checkout@v4
-         with:
-           fetch-depth: 0
-           ref: ${{ github.event.inputs.sha || github.event.pull_request.head.sha || github.sha || github.head_ref || github.ref_name }}
-
-       - name: Install python env
-         id: pipenv
-         run: |
-           cd examples/server/bench
-           python3 -m venv venv
-           source venv/bin/activate
-           pip install -r requirements.txt
-
-       - name: Prometheus
-         id: install_prometheus
-         run: |
-           wget --quiet https://github.com/prometheus/prometheus/releases/download/v2.51.0/prometheus-2.51.0.linux-amd64.tar.gz
-           tar xzf prometheus*.tar.gz --strip-components=1
-           ./prometheus --config.file=examples/server/bench/prometheus.yml &
-           while ! nc -z localhost 9090; do
-             sleep 0.1
-           done
-
-       - name: Set up Go
-         uses: actions/setup-go@v5
-         with:
-           go-version: '1.21'
-
-       - name: Install k6 and xk6-sse
-         id: k6_installation
-         run: |
-           cd examples/server/bench
-           go install go.k6.io/xk6/cmd/xk6@latest
-           xk6 build master \
-             --with github.com/phymbert/xk6-sse
-
-       - name: Build
-         id: cmake_build
-         run: |
-           set -eux
-           cmake -B build \
-             -DGGML_NATIVE=OFF \
-             -DLLAMA_BUILD_SERVER=ON \
-             -DLLAMA_CURL=ON \
-             -DLLAMA_CUBLAS=ON \
-             -DCUDAToolkit_ROOT=/usr/local/cuda \
-             -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
-             -DCMAKE_CUDA_ARCHITECTURES=75 \
-             -DLLAMA_FATAL_WARNINGS=OFF \
-             -DLLAMA_ALL_WARNINGS=OFF \
-             -DCMAKE_BUILD_TYPE=Release;
-           cmake --build build --config Release -j $(nproc) --target llama-server
-
-       - name: Download the dataset
-         id: download_dataset
-         run: |
-           cd examples/server/bench
-           wget --quiet https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
-
-       - name: Server bench
-         id: server_bench
-         run: |
-           set -eux
-
-           cd examples/server/bench
-           source venv/bin/activate
-           python bench.py \
-             --runner-label ${{ env.RUNNER_LABEL }} \
-             --name ${{ github.job }} \
-             --branch ${{ github.head_ref || github.ref_name }} \
-             --commit ${{ github.event.inputs.sha || github.event.pull_request.head.sha || github.sha }} \
-             --scenario script.js \
-             --duration ${{ github.event.inputs.duration || env.DURATION }} \
-             --hf-repo ggml-org/models \
-             --hf-file ${{ matrix.model }}/ggml-model-${{ matrix.ftype }}.gguf \
-             --model-path-prefix /models \
-             --parallel ${{ env.N_USERS }} \
-             -ngl 33 \
-             --batch-size 2048 \
-             --ubatch-size 256 \
-             --ctx-size 16384 \
-             --n-prompts 1000 \
-             --max-prompt-tokens 1024 \
-             --max-tokens 2048
-
-           cat results.github.env >> $GITHUB_ENV
-
-           # Remove dataset as we do not want it in the artefact
-           rm ShareGPT_V3_unfiltered_cleaned_split.json
-
-       - uses: actions/upload-artifact@v4
-         with:
-           name: bench-server-${{ github.job }}-${{ env.RUNNER_LABEL }}-${{ matrix.model }}-${{ matrix.ftype }}
-           compression-level: 9
-           path: |
-             examples/server/bench/*.jpg
-             examples/server/bench/*.json
-             examples/server/bench/*.log
-
-       - name: Commit status
-         uses: Sibz/github-status-action@v1
-         with:
-           authToken: ${{secrets.GITHUB_TOKEN}}
-           sha: ${{ inputs.sha || github.event.pull_request.head.sha || github.sha }}
-           context: bench-server-${{ github.job }}-${{ env.RUNNER_LABEL }}-${{ matrix.model }}-${{ matrix.ftype }}
-           description: |
-             ${{ env.BENCH_RESULTS }}
-           state: 'success'
-
-       - name: Upload benchmark images
-         uses: devicons/public-upload-to-imgur@v2.2.2
-         continue-on-error: true # Important as it looks unstable: 503
-         id: imgur_step
-         with:
-           client_id: ${{secrets.IMGUR_CLIENT_ID}}
-           path: |
-             examples/server/bench/prompt_tokens_seconds.jpg
-             examples/server/bench/predicted_tokens_seconds.jpg
-             examples/server/bench/kv_cache_usage_ratio.jpg
-             examples/server/bench/requests_processing.jpg
-
-       - name: Extract mermaid
-         id: set_mermaid
-         run: |
-           set -eux
-
-           cd examples/server/bench
-           PROMPT_TOKENS_SECONDS=$(cat prompt_tokens_seconds.mermaid)
-           echo "PROMPT_TOKENS_SECONDS<<EOF" >> $GITHUB_ENV
-           echo "$PROMPT_TOKENS_SECONDS" >> $GITHUB_ENV
-           echo "EOF" >> $GITHUB_ENV
-
-           PREDICTED_TOKENS_SECONDS=$(cat predicted_tokens_seconds.mermaid)
-           echo "PREDICTED_TOKENS_SECONDS<<EOF" >> $GITHUB_ENV
-           echo "$PREDICTED_TOKENS_SECONDS" >> $GITHUB_ENV
-           echo "EOF" >> $GITHUB_ENV
-
-           KV_CACHE_USAGE_RATIO=$(cat kv_cache_usage_ratio.mermaid)
-           echo "KV_CACHE_USAGE_RATIO<<EOF" >> $GITHUB_ENV
-           echo "$KV_CACHE_USAGE_RATIO" >> $GITHUB_ENV
-           echo "EOF" >> $GITHUB_ENV
-
-           REQUESTS_PROCESSING=$(cat requests_processing.mermaid)
-           echo "REQUESTS_PROCESSING<<EOF" >> $GITHUB_ENV
-           echo "$REQUESTS_PROCESSING" >> $GITHUB_ENV
-           echo "EOF" >> $GITHUB_ENV
-
-       - name: Extract image url
-         id: extract_image_url
-         continue-on-error: true
-         run: |
-           set -eux
-
-           echo "IMAGE_O=${{ fromJSON(steps.imgur_step.outputs.imgur_urls)[0] }}" >> $GITHUB_ENV
-           echo "IMAGE_1=${{ fromJSON(steps.imgur_step.outputs.imgur_urls)[1] }}" >> $GITHUB_ENV
-           echo "IMAGE_2=${{ fromJSON(steps.imgur_step.outputs.imgur_urls)[2] }}" >> $GITHUB_ENV
-           echo "IMAGE_3=${{ fromJSON(steps.imgur_step.outputs.imgur_urls)[3] }}" >> $GITHUB_ENV
-
-       - name: Comment PR
-         uses: mshick/add-pr-comment@v2
-         id: comment_pr
-         if: ${{ github.event.pull_request != '' && matrix.pr_comment_enabled == 'true' }}
-         with:
-           message-id: bench-server-${{ github.job }}-${{ env.RUNNER_LABEL }}-${{ matrix.model }}-${{ matrix.ftype }}
-           message: |
-             <p align="center">
-
-             📈 **llama.cpp server** for _${{ github.job }}_ on _${{ env.RUNNER_LABEL }}_ for `${{ matrix.model }}`-`${{ matrix.ftype }}`: **${{ env.BENCH_ITERATIONS}} iterations** 🚀
-
-             </p>
-
-             <details>
-
-             <summary>Expand details for performance related PR only</summary>
-
-             - Concurrent users: ${{ env.N_USERS }}, duration: ${{ github.event.inputs.duration || env.DURATION }}
-             - HTTP request : avg=${{ env.HTTP_REQ_DURATION_AVG }}ms p(95)=${{ env.HTTP_REQ_DURATION_P_95_ }}ms fails=${{ env.HTTP_REQ_FAILED_PASSES }}, finish reason: stop=${{ env.LLAMACPP_COMPLETIONS_STOP_RATE_PASSES }} truncated=${{ env.LLAMACPP_COMPLETIONS_TRUNCATED_RATE_PASSES }}
-             - Prompt processing (pp): avg=${{ env.LLAMACPP_PROMPT_PROCESSING_SECOND_AVG }}tk/s p(95)=${{ env.LLAMACPP_PROMPT_PROCESSING_SECOND_P_95_ }}tk/s
-             - Token generation (tg): avg=${{ env.LLAMACPP_TOKENS_SECOND_AVG }}tk/s p(95)=${{ env.LLAMACPP_TOKENS_SECOND_P_95_ }}tk/s
-             - ${{ env.BENCH_GRAPH_XLABEL }}
-
-
-             <p align="center">
-
-             <img width="100%" height="100%" src="${{ env.IMAGE_O }}" alt="prompt_tokens_seconds" />
-
-             <details>
-
-             <summary>More</summary>
-
-             ```mermaid
-             ${{ env.PROMPT_TOKENS_SECONDS }}
-             ```
-
-             </details>
-
-             <img width="100%" height="100%" src="${{ env.IMAGE_1 }}" alt="predicted_tokens_seconds"/>
-
-             <details>
-             <summary>More</summary>
-
-             ```mermaid
-             ${{ env.PREDICTED_TOKENS_SECONDS }}
-             ```
-
-             </details>
-
-             </p>
-
-             <details>
-
-             <summary>Details</summary>
-
-             <p align="center">
-
-             <img width="100%" height="100%" src="${{ env.IMAGE_2 }}" alt="kv_cache_usage_ratio" />
-
-             <details>
-             <summary>More</summary>
-
-             ```mermaid
-             ${{ env.KV_CACHE_USAGE_RATIO }}
-             ```
-
-             </details>
-
-             <img width="100%" height="100%" src="${{ env.IMAGE_3 }}" alt="requests_processing"/>
-
-             <details>
-             <summary>More</summary>
-
-             ```mermaid
-             ${{ env.REQUESTS_PROCESSING }}
-             ```
-
-             </details>
-
-             </p>
-             </details>
-             </details>