llama-cpp-pydist 0.20.0-py3-none-any.whl → 0.21.0-py3-none-any.whl
This diff shows the changes between two publicly released versions of the package, as they appear in their public registry, and is provided for informational purposes only.
- llama_cpp/binaries/{llama-b7621-bin-win-cpu-x64.zip → llama-b7631-bin-win-cpu-x64.zip} +0 -0
- {llama_cpp_pydist-0.20.0.dist-info → llama_cpp_pydist-0.21.0.dist-info}/METADATA +146 -1
- {llama_cpp_pydist-0.20.0.dist-info → llama_cpp_pydist-0.21.0.dist-info}/RECORD +76 -73
- vendor_llama_cpp_pydist/llama.cpp/.github/workflows/build.yml +18 -6
- vendor_llama_cpp_pydist/llama.cpp/.github/workflows/release.yml +3 -1
- vendor_llama_cpp_pydist/llama.cpp/.github/workflows/server.yml +18 -0
- vendor_llama_cpp_pydist/llama.cpp/ci/run.sh +2 -1
- vendor_llama_cpp_pydist/llama.cpp/common/arg.cpp +7 -0
- vendor_llama_cpp_pydist/llama.cpp/common/chat.cpp +4 -4
- vendor_llama_cpp_pydist/llama.cpp/common/common.cpp +19 -0
- vendor_llama_cpp_pydist/llama.cpp/common/common.h +4 -0
- vendor_llama_cpp_pydist/llama.cpp/common/llguidance.cpp +10 -6
- vendor_llama_cpp_pydist/llama.cpp/common/regex-partial.cpp +13 -13
- vendor_llama_cpp_pydist/llama.cpp/common/sampling.cpp +58 -14
- vendor_llama_cpp_pydist/llama.cpp/common/sampling.h +3 -1
- vendor_llama_cpp_pydist/llama.cpp/convert_hf_to_gguf.py +10 -4
- vendor_llama_cpp_pydist/llama.cpp/docs/backend/CANN.md +4 -0
- vendor_llama_cpp_pydist/llama.cpp/docs/backend/OPENCL.md +50 -0
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cann/aclnn_ops.cpp +55 -0
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cann/aclnn_ops.h +14 -0
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp +44 -0
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/CMakeLists.txt +24 -0
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/argsort.cu +50 -29
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/argsort.cuh +16 -0
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/common.cuh +9 -9
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/cumsum.cu +37 -3
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu +22 -8
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/softmax.cu +203 -6
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/top-k.cu +96 -0
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/top-k.cuh +3 -0
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/vendors/hip.h +3 -0
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/vendors/musa.h +1 -0
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp +32 -25
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/quantize_q8_1.comp +8 -8
- vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/topk_moe.comp +12 -7
- vendor_llama_cpp_pydist/llama.cpp/include/llama.h +86 -8
- vendor_llama_cpp_pydist/llama.cpp/src/llama-context.cpp +602 -18
- vendor_llama_cpp_pydist/llama.cpp/src/llama-context.h +43 -1
- vendor_llama_cpp_pydist/llama.cpp/src/llama-grammar.cpp +40 -13
- vendor_llama_cpp_pydist/llama.cpp/src/llama-grammar.h +2 -0
- vendor_llama_cpp_pydist/llama.cpp/src/llama-graph.cpp +166 -2
- vendor_llama_cpp_pydist/llama.cpp/src/llama-graph.h +71 -6
- vendor_llama_cpp_pydist/llama.cpp/src/llama-hparams.h +2 -2
- vendor_llama_cpp_pydist/llama.cpp/src/llama-model.cpp +43 -11
- vendor_llama_cpp_pydist/llama.cpp/src/llama-sampling.cpp +1232 -170
- vendor_llama_cpp_pydist/llama.cpp/src/llama-sampling.h +16 -7
- vendor_llama_cpp_pydist/llama.cpp/src/llama.cpp +1 -1
- vendor_llama_cpp_pydist/llama.cpp/src/models/afmoe.cpp +9 -5
- vendor_llama_cpp_pydist/llama.cpp/src/models/cohere2-iswa.cpp +3 -0
- vendor_llama_cpp_pydist/llama.cpp/src/models/gemma2-iswa.cpp +5 -2
- vendor_llama_cpp_pydist/llama.cpp/src/models/llama-iswa.cpp +6 -2
- vendor_llama_cpp_pydist/llama.cpp/src/models/modern-bert.cpp +4 -3
- vendor_llama_cpp_pydist/llama.cpp/src/models/openai-moe-iswa.cpp +5 -2
- vendor_llama_cpp_pydist/llama.cpp/src/models/smallthinker.cpp +11 -5
- vendor_llama_cpp_pydist/llama.cpp/tests/CMakeLists.txt +12 -2
- vendor_llama_cpp_pydist/llama.cpp/tests/test-backend-ops.cpp +93 -4
- vendor_llama_cpp_pydist/llama.cpp/tests/test-backend-sampler.cpp +1237 -0
- vendor_llama_cpp_pydist/llama.cpp/tests/test-regex-partial.cpp +14 -14
- vendor_llama_cpp_pydist/llama.cpp/tools/mtmd/clip.cpp +8 -0
- vendor_llama_cpp_pydist/llama.cpp/tools/mtmd/models/siglip.cpp +9 -4
- vendor_llama_cpp_pydist/llama.cpp/tools/server/public/index.html.gz +0 -0
- vendor_llama_cpp_pydist/llama.cpp/tools/server/server-common.cpp +12 -7
- vendor_llama_cpp_pydist/llama.cpp/tools/server/server-context.cpp +19 -0
- vendor_llama_cpp_pydist/llama.cpp/tools/server/server-models.cpp +47 -5
- vendor_llama_cpp_pydist/llama.cpp/tools/server/server-models.h +3 -3
- vendor_llama_cpp_pydist/llama.cpp/tools/server/server-task.cpp +3 -0
- vendor_llama_cpp_pydist/llama.cpp/tools/server/server.cpp +2 -2
- vendor_llama_cpp_pydist/llama.cpp/tools/server/webui/src/lib/components/app/chat/ChatSettings/ChatSettings.svelte +5 -0
- vendor_llama_cpp_pydist/llama.cpp/tools/server/webui/src/lib/constants/settings-config.ts +3 -0
- vendor_llama_cpp_pydist/llama.cpp/tools/server/webui/src/lib/services/chat.ts +3 -0
- vendor_llama_cpp_pydist/llama.cpp/tools/server/webui/src/lib/stores/chat.svelte.ts +2 -0
- vendor_llama_cpp_pydist/llama.cpp/tools/server/webui/src/lib/types/api.d.ts +3 -0
- vendor_llama_cpp_pydist/llama.cpp/tools/server/webui/src/lib/types/settings.d.ts +1 -0
- {llama_cpp_pydist-0.20.0.dist-info → llama_cpp_pydist-0.21.0.dist-info}/LICENSE +0 -0
- {llama_cpp_pydist-0.20.0.dist-info → llama_cpp_pydist-0.21.0.dist-info}/WHEEL +0 -0
- {llama_cpp_pydist-0.20.0.dist-info → llama_cpp_pydist-0.21.0.dist-info}/top_level.txt +0 -0
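The only binary payload that changes in this release is the prebuilt llama.cpp Windows CPU archive under `llama_cpp/binaries/`, bumped from build b7621 to b7631. As a minimal sketch (assuming the wheel layout shown in the file list above; the helper name below is hypothetical), the bundled archive can be located at runtime with the standard-library `importlib.resources`:

```python
# Minimal sketch, assuming the wheel layout shown in the file list above
# (llama_cpp/binaries/llama-b7631-bin-win-cpu-x64.zip); the helper name is
# hypothetical, and the exact filename tracks the vendored llama.cpp build tag.
from importlib import resources

def find_bundled_binary_zips() -> list[str]:
    """List the binary archives shipped inside the installed llama_cpp package."""
    binaries = resources.files("llama_cpp") / "binaries"
    return sorted(entry.name for entry in binaries.iterdir() if entry.name.endswith(".zip"))

if __name__ == "__main__":
    print(find_bundled_binary_zips())  # e.g. ['llama-b7631-bin-win-cpu-x64.zip']
```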
llama_cpp/binaries/{llama-b7621-bin-win-cpu-x64.zip → llama-b7631-bin-win-cpu-x64.zip}
Binary files differ
{llama_cpp_pydist-0.20.0.dist-info → llama_cpp_pydist-0.21.0.dist-info}/METADATA
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: llama-cpp-pydist
-Version: 0.20.0
+Version: 0.21.0
 Summary: A Python package for Llama CPP.
 Home-page: https://github.com/shamitv/llama_cpp
 Author: Shamit Verma
@@ -136,6 +136,151 @@ For instructions on how to build the package from source, update the `llama.cpp`
 
 # Changelog
 
+## 2026-01-05: Update to llama.cpp b7631
+
+- b7622 (b7622) – 2026-01-03 – https://github.com/ggml-org/llama.cpp/releases/tag/b7622
+  - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-macos-arm64.tar.gz)
+  - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-macos-x64.tar.gz)
+  - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-xcframework.zip)
+  - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-ubuntu-x64.tar.gz)
+  - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-ubuntu-vulkan-x64.tar.gz)
+  - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-ubuntu-s390x.tar.gz)
+  - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-cpu-x64.zip)
+  - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-cpu-arm64.zip)
+  - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7622/cudart-llama-bin-win-cuda-12.4-x64.zip)
+  - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7622/cudart-llama-bin-win-cuda-13.1-x64.zip)
+  - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-vulkan-x64.zip)
+  - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-sycl-x64.zip)
+  - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-hip-radeon-x64.zip)
+  - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-310p-openEuler-x86.tar.gz)
+  - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-910b-openEuler-x86.tar.gz)
+  - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-310p-openEuler-aarch64.tar.gz)
+  - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-910b-openEuler-aarch64.tar.gz)
+- b7624 (b7624) – 2026-01-04 – https://github.com/ggml-org/llama.cpp/releases/tag/b7624
+  - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-macos-arm64.tar.gz)
+  - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-macos-x64.tar.gz)
+  - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-xcframework.zip)
+  - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-ubuntu-x64.tar.gz)
+  - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-ubuntu-vulkan-x64.tar.gz)
+  - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-ubuntu-s390x.tar.gz)
+  - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-cpu-x64.zip)
+  - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-cpu-arm64.zip)
+  - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7624/cudart-llama-bin-win-cuda-12.4-x64.zip)
+  - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7624/cudart-llama-bin-win-cuda-13.1-x64.zip)
+  - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-vulkan-x64.zip)
+  - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-sycl-x64.zip)
+  - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-hip-radeon-x64.zip)
+  - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-310p-openEuler-x86.tar.gz)
+  - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-910b-openEuler-x86.tar.gz)
+  - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-310p-openEuler-aarch64.tar.gz)
+  - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-910b-openEuler-aarch64.tar.gz)
+- b7625 (b7625) – 2026-01-04 – https://github.com/ggml-org/llama.cpp/releases/tag/b7625
+  - CUDA: disable cuda graph when using n-cpu-moe
+  - call ggml_cuda_set_device
+  - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-macos-arm64.tar.gz)
+  - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-macos-x64.tar.gz)
+  - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-xcframework.zip)
+  - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-ubuntu-x64.tar.gz)
+  - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-ubuntu-vulkan-x64.tar.gz)
+  - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-ubuntu-s390x.tar.gz)
+  - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-cpu-x64.zip)
+  - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-cpu-arm64.zip)
+  - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7625/cudart-llama-bin-win-cuda-12.4-x64.zip)
+  - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7625/cudart-llama-bin-win-cuda-13.1-x64.zip)
+  - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-vulkan-x64.zip)
+  - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-sycl-x64.zip)
+  - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-hip-radeon-x64.zip)
+  - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-310p-openEuler-x86.tar.gz)
+  - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-910b-openEuler-x86.tar.gz)
+  - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-310p-openEuler-aarch64.tar.gz)
+  - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-910b-openEuler-aarch64.tar.gz)
+- b7626 (b7626) – 2026-01-04 – https://github.com/ggml-org/llama.cpp/releases/tag/b7626
+  - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-macos-arm64.tar.gz)
+  - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-macos-x64.tar.gz)
+  - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-xcframework.zip)
+  - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-ubuntu-x64.tar.gz)
+  - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-ubuntu-vulkan-x64.tar.gz)
+  - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-ubuntu-s390x.tar.gz)
+  - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-cpu-x64.zip)
+  - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-cpu-arm64.zip)
+  - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7626/cudart-llama-bin-win-cuda-12.4-x64.zip)
+  - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7626/cudart-llama-bin-win-cuda-13.1-x64.zip)
+  - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-vulkan-x64.zip)
+  - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-sycl-x64.zip)
+  - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-hip-radeon-x64.zip)
+  - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-310p-openEuler-x86.tar.gz)
+  - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-910b-openEuler-x86.tar.gz)
+  - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-310p-openEuler-aarch64.tar.gz)
+  - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-910b-openEuler-aarch64.tar.gz)
+- b7628 (b7628) – 2026-01-05 – https://github.com/ggml-org/llama.cpp/releases/tag/b7628
+  - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-macos-arm64.tar.gz)
+  - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-macos-x64.tar.gz)
+  - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-xcframework.zip)
+  - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-ubuntu-x64.tar.gz)
+  - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-ubuntu-vulkan-x64.tar.gz)
+  - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-ubuntu-s390x.tar.gz)
+  - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-cpu-x64.zip)
+  - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-cpu-arm64.zip)
+  - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7628/cudart-llama-bin-win-cuda-12.4-x64.zip)
+  - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7628/cudart-llama-bin-win-cuda-13.1-x64.zip)
+  - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-vulkan-x64.zip)
+  - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-sycl-x64.zip)
+  - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-hip-radeon-x64.zip)
+  - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-310p-openEuler-x86.tar.gz)
+  - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-910b-openEuler-x86.tar.gz)
+  - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-310p-openEuler-aarch64.tar.gz)
+  - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-910b-openEuler-aarch64.tar.gz)
+- b7630 (b7630) – 2026-01-05 – https://github.com/ggml-org/llama.cpp/releases/tag/b7630
+  - Implement ggml_cann_op_add_rms_norm_fused() using ACLNN AddRmsNorm
+  - Add ggml_cann_can_fuse() to check fusion eligibility
+  - Integrate fusion logic into computation graph evaluation
+  - Add test cases for ADD + RMS_NORM fusion
+  - Update documentation with new environment variable
+  - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-macos-arm64.tar.gz)
+  - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-macos-x64.tar.gz)
+  - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-xcframework.zip)
+  - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-ubuntu-x64.tar.gz)
+  - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-ubuntu-vulkan-x64.tar.gz)
+  - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-ubuntu-s390x.tar.gz)
+  - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-cpu-x64.zip)
+  - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-cpu-arm64.zip)
+  - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7630/cudart-llama-bin-win-cuda-12.4-x64.zip)
+  - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7630/cudart-llama-bin-win-cuda-13.1-x64.zip)
+  - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-vulkan-x64.zip)
+  - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-sycl-x64.zip)
+  - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-hip-radeon-x64.zip)
+  - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-310p-openEuler-x86.tar.gz)
+  - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-910b-openEuler-x86.tar.gz)
+  - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-310p-openEuler-aarch64.tar.gz)
+  - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-910b-openEuler-aarch64.tar.gz)
+- b7631 (b7631) – 2026-01-05 – https://github.com/ggml-org/llama.cpp/releases/tag/b7631
+  - refactor rope_freq_base/scale_swa conversion and init
+  - safe defaults for unknowns
+  - update relevant models
+  - grammar
+  - add get_rope_freq_scale to modern-bert
+  - const
+  - const
+  - log swa info
+  - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-macos-arm64.tar.gz)
+  - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-macos-x64.tar.gz)
+  - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-xcframework.zip)
+  - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-ubuntu-x64.tar.gz)
+  - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-ubuntu-vulkan-x64.tar.gz)
+  - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-ubuntu-s390x.tar.gz)
+  - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-cpu-x64.zip)
+  - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-cpu-arm64.zip)
+  - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7631/cudart-llama-bin-win-cuda-12.4-x64.zip)
+  - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7631/cudart-llama-bin-win-cuda-13.1-x64.zip)
+  - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-vulkan-x64.zip)
+  - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-sycl-x64.zip)
+  - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-hip-radeon-x64.zip)
+  - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-310p-openEuler-x86.tar.gz)
+  - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-910b-openEuler-x86.tar.gz)
+  - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-310p-openEuler-aarch64.tar.gz)
+  - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-910b-openEuler-aarch64.tar.gz)
+
+
 ## 2026-01-03: Update to llama.cpp b7621
 
 - b7489 (b7489) – 2025-12-20 – https://github.com/ggml-org/llama.cpp/releases/tag/b7489