llama-cpp-pydist 0.20.0__py3-none-any.whl → 0.21.0__py3-none-any.whl

This diff compares the contents of two publicly released versions of the package as published to one of the supported registries. It is provided for informational purposes only.
Files changed (76)
  1. llama_cpp/binaries/{llama-b7621-bin-win-cpu-x64.zip → llama-b7631-bin-win-cpu-x64.zip} +0 -0
  2. {llama_cpp_pydist-0.20.0.dist-info → llama_cpp_pydist-0.21.0.dist-info}/METADATA +146 -1
  3. {llama_cpp_pydist-0.20.0.dist-info → llama_cpp_pydist-0.21.0.dist-info}/RECORD +76 -73
  4. vendor_llama_cpp_pydist/llama.cpp/.github/workflows/build.yml +18 -6
  5. vendor_llama_cpp_pydist/llama.cpp/.github/workflows/release.yml +3 -1
  6. vendor_llama_cpp_pydist/llama.cpp/.github/workflows/server.yml +18 -0
  7. vendor_llama_cpp_pydist/llama.cpp/ci/run.sh +2 -1
  8. vendor_llama_cpp_pydist/llama.cpp/common/arg.cpp +7 -0
  9. vendor_llama_cpp_pydist/llama.cpp/common/chat.cpp +4 -4
  10. vendor_llama_cpp_pydist/llama.cpp/common/common.cpp +19 -0
  11. vendor_llama_cpp_pydist/llama.cpp/common/common.h +4 -0
  12. vendor_llama_cpp_pydist/llama.cpp/common/llguidance.cpp +10 -6
  13. vendor_llama_cpp_pydist/llama.cpp/common/regex-partial.cpp +13 -13
  14. vendor_llama_cpp_pydist/llama.cpp/common/sampling.cpp +58 -14
  15. vendor_llama_cpp_pydist/llama.cpp/common/sampling.h +3 -1
  16. vendor_llama_cpp_pydist/llama.cpp/convert_hf_to_gguf.py +10 -4
  17. vendor_llama_cpp_pydist/llama.cpp/docs/backend/CANN.md +4 -0
  18. vendor_llama_cpp_pydist/llama.cpp/docs/backend/OPENCL.md +50 -0
  19. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cann/aclnn_ops.cpp +55 -0
  20. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cann/aclnn_ops.h +14 -0
  21. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cann/ggml-cann.cpp +44 -0
  22. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/CMakeLists.txt +24 -0
  23. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/argsort.cu +50 -29
  24. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/argsort.cuh +16 -0
  25. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/common.cuh +9 -9
  26. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/cumsum.cu +37 -3
  27. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu +22 -8
  28. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/softmax.cu +203 -6
  29. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/top-k.cu +96 -0
  30. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/top-k.cuh +3 -0
  31. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/vendors/hip.h +3 -0
  32. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-cuda/vendors/musa.h +1 -0
  33. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp +32 -25
  34. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/quantize_q8_1.comp +8 -8
  35. vendor_llama_cpp_pydist/llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/topk_moe.comp +12 -7
  36. vendor_llama_cpp_pydist/llama.cpp/include/llama.h +86 -8
  37. vendor_llama_cpp_pydist/llama.cpp/src/llama-context.cpp +602 -18
  38. vendor_llama_cpp_pydist/llama.cpp/src/llama-context.h +43 -1
  39. vendor_llama_cpp_pydist/llama.cpp/src/llama-grammar.cpp +40 -13
  40. vendor_llama_cpp_pydist/llama.cpp/src/llama-grammar.h +2 -0
  41. vendor_llama_cpp_pydist/llama.cpp/src/llama-graph.cpp +166 -2
  42. vendor_llama_cpp_pydist/llama.cpp/src/llama-graph.h +71 -6
  43. vendor_llama_cpp_pydist/llama.cpp/src/llama-hparams.h +2 -2
  44. vendor_llama_cpp_pydist/llama.cpp/src/llama-model.cpp +43 -11
  45. vendor_llama_cpp_pydist/llama.cpp/src/llama-sampling.cpp +1232 -170
  46. vendor_llama_cpp_pydist/llama.cpp/src/llama-sampling.h +16 -7
  47. vendor_llama_cpp_pydist/llama.cpp/src/llama.cpp +1 -1
  48. vendor_llama_cpp_pydist/llama.cpp/src/models/afmoe.cpp +9 -5
  49. vendor_llama_cpp_pydist/llama.cpp/src/models/cohere2-iswa.cpp +3 -0
  50. vendor_llama_cpp_pydist/llama.cpp/src/models/gemma2-iswa.cpp +5 -2
  51. vendor_llama_cpp_pydist/llama.cpp/src/models/llama-iswa.cpp +6 -2
  52. vendor_llama_cpp_pydist/llama.cpp/src/models/modern-bert.cpp +4 -3
  53. vendor_llama_cpp_pydist/llama.cpp/src/models/openai-moe-iswa.cpp +5 -2
  54. vendor_llama_cpp_pydist/llama.cpp/src/models/smallthinker.cpp +11 -5
  55. vendor_llama_cpp_pydist/llama.cpp/tests/CMakeLists.txt +12 -2
  56. vendor_llama_cpp_pydist/llama.cpp/tests/test-backend-ops.cpp +93 -4
  57. vendor_llama_cpp_pydist/llama.cpp/tests/test-backend-sampler.cpp +1237 -0
  58. vendor_llama_cpp_pydist/llama.cpp/tests/test-regex-partial.cpp +14 -14
  59. vendor_llama_cpp_pydist/llama.cpp/tools/mtmd/clip.cpp +8 -0
  60. vendor_llama_cpp_pydist/llama.cpp/tools/mtmd/models/siglip.cpp +9 -4
  61. vendor_llama_cpp_pydist/llama.cpp/tools/server/public/index.html.gz +0 -0
  62. vendor_llama_cpp_pydist/llama.cpp/tools/server/server-common.cpp +12 -7
  63. vendor_llama_cpp_pydist/llama.cpp/tools/server/server-context.cpp +19 -0
  64. vendor_llama_cpp_pydist/llama.cpp/tools/server/server-models.cpp +47 -5
  65. vendor_llama_cpp_pydist/llama.cpp/tools/server/server-models.h +3 -3
  66. vendor_llama_cpp_pydist/llama.cpp/tools/server/server-task.cpp +3 -0
  67. vendor_llama_cpp_pydist/llama.cpp/tools/server/server.cpp +2 -2
  68. vendor_llama_cpp_pydist/llama.cpp/tools/server/webui/src/lib/components/app/chat/ChatSettings/ChatSettings.svelte +5 -0
  69. vendor_llama_cpp_pydist/llama.cpp/tools/server/webui/src/lib/constants/settings-config.ts +3 -0
  70. vendor_llama_cpp_pydist/llama.cpp/tools/server/webui/src/lib/services/chat.ts +3 -0
  71. vendor_llama_cpp_pydist/llama.cpp/tools/server/webui/src/lib/stores/chat.svelte.ts +2 -0
  72. vendor_llama_cpp_pydist/llama.cpp/tools/server/webui/src/lib/types/api.d.ts +3 -0
  73. vendor_llama_cpp_pydist/llama.cpp/tools/server/webui/src/lib/types/settings.d.ts +1 -0
  74. {llama_cpp_pydist-0.20.0.dist-info → llama_cpp_pydist-0.21.0.dist-info}/LICENSE +0 -0
  75. {llama_cpp_pydist-0.20.0.dist-info → llama_cpp_pydist-0.21.0.dist-info}/WHEEL +0 -0
  76. {llama_cpp_pydist-0.20.0.dist-info → llama_cpp_pydist-0.21.0.dist-info}/top_level.txt +0 -0
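
Only one binary payload changes in this release: the bundled Windows CPU build is swapped from b7621 to b7631 (item 1 in the list above). As a minimal, hypothetical sketch of reaching that bundled zip from Python, assuming only the `llama_cpp/binaries` layout shown in the manifest (this is not llama-cpp-pydist's own loader API):

```python
# Hypothetical helper, not part of llama-cpp-pydist: locate the llama.cpp
# binary zip bundled under llama_cpp/binaries and unpack it locally.
from importlib import resources
import zipfile

def extract_bundled_binaries(dest: str = "llama-bin") -> None:
    pkg_dir = resources.files("llama_cpp") / "binaries"
    for entry in pkg_dir.iterdir():
        if entry.name.endswith(".zip"):  # e.g. llama-b7631-bin-win-cpu-x64.zip
            # as_file() yields a real filesystem path even for zip-based installs.
            with resources.as_file(entry) as zip_path:
                with zipfile.ZipFile(zip_path) as zf:
                    zf.extractall(dest)

if __name__ == "__main__":
    extract_bundled_binaries()
```

Using `importlib.resources` rather than `__file__` paths keeps the lookup working however the wheel is installed.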
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: llama-cpp-pydist
- Version: 0.20.0
+ Version: 0.21.0
  Summary: A Python package for Llama CPP.
  Home-page: https://github.com/shamitv/llama_cpp
  Author: Shamit Verma
@@ -136,6 +136,151 @@ For instructions on how to build the package from source, update the `llama.cpp`

  # Changelog

+ ## 2026-01-05: Update to llama.cpp b7631
+
+ - b7622 (b7622) – 2026-01-03 – https://github.com/ggml-org/llama.cpp/releases/tag/b7622
+   - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-macos-arm64.tar.gz)
+   - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-macos-x64.tar.gz)
+   - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-xcframework.zip)
+   - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-ubuntu-x64.tar.gz)
+   - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-ubuntu-vulkan-x64.tar.gz)
+   - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-ubuntu-s390x.tar.gz)
+   - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-cpu-x64.zip)
+   - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-cpu-arm64.zip)
+   - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7622/cudart-llama-bin-win-cuda-12.4-x64.zip)
+   - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7622/cudart-llama-bin-win-cuda-13.1-x64.zip)
+   - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-vulkan-x64.zip)
+   - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-sycl-x64.zip)
+   - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-win-hip-radeon-x64.zip)
+   - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-310p-openEuler-x86.tar.gz)
+   - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-910b-openEuler-x86.tar.gz)
+   - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-310p-openEuler-aarch64.tar.gz)
+   - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7622/llama-b7622-bin-910b-openEuler-aarch64.tar.gz)
+ - b7624 (b7624) – 2026-01-04 – https://github.com/ggml-org/llama.cpp/releases/tag/b7624
+   - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-macos-arm64.tar.gz)
+   - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-macos-x64.tar.gz)
+   - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-xcframework.zip)
+   - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-ubuntu-x64.tar.gz)
+   - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-ubuntu-vulkan-x64.tar.gz)
+   - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-ubuntu-s390x.tar.gz)
+   - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-cpu-x64.zip)
+   - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-cpu-arm64.zip)
+   - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7624/cudart-llama-bin-win-cuda-12.4-x64.zip)
+   - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7624/cudart-llama-bin-win-cuda-13.1-x64.zip)
+   - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-vulkan-x64.zip)
+   - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-sycl-x64.zip)
+   - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-win-hip-radeon-x64.zip)
+   - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-310p-openEuler-x86.tar.gz)
+   - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-910b-openEuler-x86.tar.gz)
+   - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-310p-openEuler-aarch64.tar.gz)
+   - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7624/llama-b7624-bin-910b-openEuler-aarch64.tar.gz)
+ - b7625 (b7625) – 2026-01-04 – https://github.com/ggml-org/llama.cpp/releases/tag/b7625
+   - CUDA: disable cuda graph when using n-cpu-moe
+   - call ggml_cuda_set_device
+   - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-macos-arm64.tar.gz)
+   - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-macos-x64.tar.gz)
+   - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-xcframework.zip)
+   - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-ubuntu-x64.tar.gz)
+   - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-ubuntu-vulkan-x64.tar.gz)
+   - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-ubuntu-s390x.tar.gz)
+   - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-cpu-x64.zip)
+   - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-cpu-arm64.zip)
+   - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7625/cudart-llama-bin-win-cuda-12.4-x64.zip)
+   - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7625/cudart-llama-bin-win-cuda-13.1-x64.zip)
+   - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-vulkan-x64.zip)
+   - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-sycl-x64.zip)
+   - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-win-hip-radeon-x64.zip)
+   - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-310p-openEuler-x86.tar.gz)
+   - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-910b-openEuler-x86.tar.gz)
+   - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-310p-openEuler-aarch64.tar.gz)
+   - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7625/llama-b7625-bin-910b-openEuler-aarch64.tar.gz)
+ - b7626 (b7626) – 2026-01-04 – https://github.com/ggml-org/llama.cpp/releases/tag/b7626
+   - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-macos-arm64.tar.gz)
+   - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-macos-x64.tar.gz)
+   - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-xcframework.zip)
+   - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-ubuntu-x64.tar.gz)
+   - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-ubuntu-vulkan-x64.tar.gz)
+   - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-ubuntu-s390x.tar.gz)
+   - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-cpu-x64.zip)
+   - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-cpu-arm64.zip)
+   - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7626/cudart-llama-bin-win-cuda-12.4-x64.zip)
+   - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7626/cudart-llama-bin-win-cuda-13.1-x64.zip)
+   - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-vulkan-x64.zip)
+   - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-sycl-x64.zip)
+   - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-win-hip-radeon-x64.zip)
+   - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-310p-openEuler-x86.tar.gz)
+   - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-910b-openEuler-x86.tar.gz)
+   - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-310p-openEuler-aarch64.tar.gz)
+   - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7626/llama-b7626-bin-910b-openEuler-aarch64.tar.gz)
+ - b7628 (b7628) – 2026-01-05 – https://github.com/ggml-org/llama.cpp/releases/tag/b7628
+   - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-macos-arm64.tar.gz)
+   - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-macos-x64.tar.gz)
+   - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-xcframework.zip)
+   - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-ubuntu-x64.tar.gz)
+   - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-ubuntu-vulkan-x64.tar.gz)
+   - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-ubuntu-s390x.tar.gz)
+   - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-cpu-x64.zip)
+   - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-cpu-arm64.zip)
+   - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7628/cudart-llama-bin-win-cuda-12.4-x64.zip)
+   - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7628/cudart-llama-bin-win-cuda-13.1-x64.zip)
+   - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-vulkan-x64.zip)
+   - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-sycl-x64.zip)
+   - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-win-hip-radeon-x64.zip)
+   - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-310p-openEuler-x86.tar.gz)
+   - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-910b-openEuler-x86.tar.gz)
+   - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-310p-openEuler-aarch64.tar.gz)
+   - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7628/llama-b7628-bin-910b-openEuler-aarch64.tar.gz)
+ - b7630 (b7630) – 2026-01-05 – https://github.com/ggml-org/llama.cpp/releases/tag/b7630
+   - Implement ggml_cann_op_add_rms_norm_fused() using ACLNN AddRmsNorm
+   - Add ggml_cann_can_fuse() to check fusion eligibility
+   - Integrate fusion logic into computation graph evaluation
+   - Add test cases for ADD + RMS_NORM fusion
+   - Update documentation with new environment variable
+   - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-macos-arm64.tar.gz)
+   - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-macos-x64.tar.gz)
+   - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-xcframework.zip)
+   - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-ubuntu-x64.tar.gz)
+   - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-ubuntu-vulkan-x64.tar.gz)
+   - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-ubuntu-s390x.tar.gz)
+   - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-cpu-x64.zip)
+   - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-cpu-arm64.zip)
+   - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7630/cudart-llama-bin-win-cuda-12.4-x64.zip)
+   - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7630/cudart-llama-bin-win-cuda-13.1-x64.zip)
+   - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-vulkan-x64.zip)
+   - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-sycl-x64.zip)
+   - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-win-hip-radeon-x64.zip)
+   - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-310p-openEuler-x86.tar.gz)
+   - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-910b-openEuler-x86.tar.gz)
+   - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-310p-openEuler-aarch64.tar.gz)
+   - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7630/llama-b7630-bin-910b-openEuler-aarch64.tar.gz)
+ - b7631 (b7631) – 2026-01-05 – https://github.com/ggml-org/llama.cpp/releases/tag/b7631
+   - refactor rope_freq_base/scale_swa conversion and init
+   - safe defaults for unknowns
+   - update relevant models
+   - grammar
+   - add get_rope_freq_scale to modern-bert
+   - const
+   - const
+   - log swa info
+   - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-macos-arm64.tar.gz)
+   - [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-macos-x64.tar.gz)
+   - [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-xcframework.zip)
+   - [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-ubuntu-x64.tar.gz)
+   - [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-ubuntu-vulkan-x64.tar.gz)
+   - [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-ubuntu-s390x.tar.gz)
+   - [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-cpu-x64.zip)
+   - [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-cpu-arm64.zip)
+   - [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7631/cudart-llama-bin-win-cuda-12.4-x64.zip)
+   - [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b7631/cudart-llama-bin-win-cuda-13.1-x64.zip)
+   - [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-vulkan-x64.zip)
+   - [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-sycl-x64.zip)
+   - [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-win-hip-radeon-x64.zip)
+   - [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-310p-openEuler-x86.tar.gz)
+   - [openEuler x86 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-910b-openEuler-x86.tar.gz)
+   - [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-310p-openEuler-aarch64.tar.gz)
+   - [openEuler aarch64 (910b)](https://github.com/ggml-org/llama.cpp/releases/download/b7631/llama-b7631-bin-910b-openEuler-aarch64.tar.gz)
+
+
  ## 2026-01-03: Update to llama.cpp b7621

  - b7489 (b7489) – 2025-12-20 – https://github.com/ggml-org/llama.cpp/releases/tag/b7489
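
Of the release notes above, the b7630 CANN change is the most substantive: it fuses an ADD followed by RMS_NORM into a single ACLNN AddRmsNorm call. A rough NumPy sketch of the computation being fused, with illustrative shapes and epsilon (this is not the ACLNN kernel, just the math the fused op must reproduce):

```python
# Illustrative only: what an ADD + RMS_NORM pair computes, unfused vs. fused.
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Normalize over the last dimension by the root-mean-square of x.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def add_rms_norm_unfused(a, b, weight):
    # Two graph ops: an ADD producing an intermediate tensor, then RMS_NORM.
    return rms_norm(a + b, weight)

def add_rms_norm_fused(a, b, weight, eps: float = 1e-6):
    # One pass over the data; the sum never lives as a separate graph tensor.
    s = a + b
    rms = np.sqrt(np.mean(s * s, axis=-1, keepdims=True) + eps)
    return s / rms * weight

a = np.random.rand(4, 8).astype(np.float32)
b = np.random.rand(4, 8).astype(np.float32)
w = np.ones(8, dtype=np.float32)
assert np.allclose(add_rms_norm_unfused(a, b, w), add_rms_norm_fused(a, b, w), atol=1e-6)
```

The usual payoff of such a fusion is skipping the intermediate tensor and one kernel launch per fused pair; per the notes, ggml_cann_can_fuse() decides whether a given graph position is eligible.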