PyPI - llama-cpp-python - Versions diffs - 0.2.68__tar.gz → 0.2.70__tar.gz - Mend

llama-cpp-python 0.2.68__tar.gz → 0.2.70__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (1107)

llama_cpp_python-0.2.70/.git/FETCH_HEAD ADDED

	@@ -0,0 +1 @@
1	+ 9ce5cb376a12a56028aec1fd3b0edc55949b996f '9ce5cb376a12a56028aec1fd3b0edc55949b996f' of https://github.com/abetlen/llama-cpp-python

llama_cpp_python-0.2.70/.git/HEAD ADDED

	@@ -0,0 +1 @@
1	+ 9ce5cb376a12a56028aec1fd3b0edc55949b996f

{llama_cpp_python-0.2.68 → llama_cpp_python-0.2.70}/.git/config RENAMED

@@ -9,7 +9,7 @@
 [gc]
 	auto = 0
 [http "https://github.com/"]
-	extraheader = AUTHORIZATION: basic eC1hY2Nlc3MtdG9rZW46Z2hzX0JDY0xlczRzUW4zRThuS0x6TEpwOFJRcjhPNDlpdTNwNFdKUg==
+	extraheader = AUTHORIZATION: basic eC1hY2Nlc3MtdG9rZW46Z2hzX1VKZzRtOWNLdzR0dno0b0Z5M1U4QkhFd1NrRGI3NTFSSWpZTw==
 [submodule "vendor/llama.cpp"]
 	active = true
 	url = https://github.com/ggerganov/llama.cpp.git

llama_cpp_python-0.2.70/.git/index ADDED

Binary file

llama_cpp_python-0.2.70/.git/logs/HEAD ADDED

	@@ -0,0 +1 @@
1	+ 0000000000000000000000000000000000000000 9ce5cb376a12a56028aec1fd3b0edc55949b996f runner <runner@fv-az564-924.(none)> 1715150262 +0000 checkout: moving from master to refs/tags/v0.2.70
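
The reflog line above follows git's usual layout: old object ID, new object ID, committer name and email, Unix timestamp, timezone offset, then a message. The following is a minimal sketch of how such a line could be split into fields for inspection. `ReflogEntry` and `parse_reflog_line` are hypothetical names introduced only for this illustration; they are not part of git or llama-cpp-python, and the sketch assumes 40-character SHA-1 object IDs.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    """Hypothetical container for one reflog line (illustration only)."""
    old_sha: str
    new_sha: str
    name: str
    email: str
    when: datetime
    message: str


# Assumes 40-hex-digit SHA-1 IDs; the message is separated by whitespace
# (a tab in the on-disk file, possibly a space once rendered on a web page).
REFLOG_RE = re.compile(
    r"^([0-9a-f]{40}) ([0-9a-f]{40}) (.*?) <([^>]*)> (\d+) ([+-]\d{4})\s+(.*)$"
)


def parse_reflog_line(line):
    """Split a single reflog line into its fields."""
    match = REFLOG_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a reflog line: {line!r}")
    old_sha, new_sha, name, email, stamp, _offset, message = match.groups()
    # The timestamp is seconds since the Unix epoch; it is converted to UTC
    # here and the recorded offset is ignored.
    when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
    return ReflogEntry(old_sha, new_sha, name, email, when, message)


entry = parse_reflog_line(
    "0000000000000000000000000000000000000000 "
    "9ce5cb376a12a56028aec1fd3b0edc55949b996f "
    "runner <runner@fv-az564-924.(none)> 1715150262 +0000 "
    "checkout: moving from master to refs/tags/v0.2.70"
)
print(entry.new_sha, entry.when.isoformat(), entry.message)
```

An all-zero old object ID, as in this entry, indicates that the ref had no previous value when the entry was written.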

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/FETCH_HEAD ADDED

	@@ -0,0 +1 @@
1	+ c0e6fbf8c380718102bd25fcb8d2e55f8f9480d1 'c0e6fbf8c380718102bd25fcb8d2e55f8f9480d1' of https://github.com/ggerganov/llama.cpp

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/HEAD ADDED

	@@ -0,0 +1 @@
1	+ c0e6fbf8c380718102bd25fcb8d2e55f8f9480d1

{llama_cpp_python-0.2.68 → llama_cpp_python-0.2.70}/.git/modules/vendor/llama.cpp/config RENAMED

@@ -16,7 +16,7 @@
 [gc]
 	auto = 0
 [http "https://github.com/"]
-	extraheader = AUTHORIZATION: basic eC1hY2Nlc3MtdG9rZW46Z2hzX0JDY0xlczRzUW4zRThuS0x6TEpwOFJRcjhPNDlpdTNwNFdKUg==
+	extraheader = AUTHORIZATION: basic eC1hY2Nlc3MtdG9rZW46Z2hzX1VKZzRtOWNLdzR0dno0b0Z5M1U4QkhFd1NrRGI3NTFSSWpZTw==
 [url "https://github.com/"]
 	insteadOf = git@github.com:
 	insteadOf = org-6826477@github.com:

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/index ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/logs/HEAD ADDED

	@@ -0,0 +1,2 @@
1	+ 0000000000000000000000000000000000000000 3855416027cb25d9a708ffa5581cf503a87856a6 runner <runner@fv-az564-924.(none)> 1715150263 +0000 clone: from https://github.com/ggerganov/llama.cpp.git
2	+ 3855416027cb25d9a708ffa5581cf503a87856a6 c0e6fbf8c380718102bd25fcb8d2e55f8f9480d1 runner <runner@fv-az564-924.(none)> 1715150264 +0000 checkout: moving from master to c0e6fbf8c380718102bd25fcb8d2e55f8f9480d1

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/logs/refs/heads/master ADDED

	@@ -0,0 +1 @@
1	+ 0000000000000000000000000000000000000000 3855416027cb25d9a708ffa5581cf503a87856a6 runner <runner@fv-az564-924.(none)> 1715150263 +0000 clone: from https://github.com/ggerganov/llama.cpp.git

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/logs/refs/remotes/origin/HEAD ADDED

	@@ -0,0 +1 @@
1	+ 0000000000000000000000000000000000000000 3855416027cb25d9a708ffa5581cf503a87856a6 runner <runner@fv-az564-924.(none)> 1715150263 +0000 clone: from https://github.com/ggerganov/llama.cpp.git

{llama_cpp_python-0.2.68 → llama_cpp_python-0.2.70}/.git/modules/vendor/llama.cpp/modules/kompute/config RENAMED

@@ -13,7 +13,7 @@
 [gc]
 	auto = 0
 [http "https://github.com/"]
-	extraheader = AUTHORIZATION: basic eC1hY2Nlc3MtdG9rZW46Z2hzX0JDY0xlczRzUW4zRThuS0x6TEpwOFJRcjhPNDlpdTNwNFdKUg==
+	extraheader = AUTHORIZATION: basic eC1hY2Nlc3MtdG9rZW46Z2hzX1VKZzRtOWNLdzR0dno0b0Z5M1U4QkhFd1NrRGI3NTFSSWpZTw==
 [url "https://github.com/"]
 	insteadOf = git@github.com:
 	insteadOf = org-6826477@github.com:

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/modules/kompute/index ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/modules/kompute/logs/HEAD ADDED

	@@ -0,0 +1,2 @@
1	+ 0000000000000000000000000000000000000000 d1e3b0953cf66acc94b2e29693e221427b2c1f3f runner <runner@fv-az564-924.(none)> 1715150265 +0000 clone: from https://github.com/nomic-ai/kompute.git
2	+ d1e3b0953cf66acc94b2e29693e221427b2c1f3f 4565194ed7c32d1d2efa32ceab4d3c6cae006306 runner <runner@fv-az564-924.(none)> 1715150266 +0000 checkout: moving from master to 4565194ed7c32d1d2efa32ceab4d3c6cae006306

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/modules/kompute/logs/refs/heads/master ADDED

	@@ -0,0 +1 @@
1	+ 0000000000000000000000000000000000000000 d1e3b0953cf66acc94b2e29693e221427b2c1f3f runner <runner@fv-az564-924.(none)> 1715150265 +0000 clone: from https://github.com/nomic-ai/kompute.git

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/modules/kompute/logs/refs/remotes/origin/HEAD ADDED

	@@ -0,0 +1 @@
1	+ 0000000000000000000000000000000000000000 d1e3b0953cf66acc94b2e29693e221427b2c1f3f runner <runner@fv-az564-924.(none)> 1715150265 +0000 clone: from https://github.com/nomic-ai/kompute.git

llama_cpp_python-0.2.68/.git/modules/vendor/llama.cpp/modules/kompute/objects/pack/pack-aea54470ccfced130dc113c076f9a5f9e05cddbf.idx → llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/modules/kompute/objects/pack/pack-dfe06cade21d4a3c314f514ca2e7bec04aebe5ea.idx RENAMED

Binary file

llama_cpp_python-0.2.68/.git/modules/vendor/llama.cpp/modules/kompute/objects/pack/pack-aea54470ccfced130dc113c076f9a5f9e05cddbf.pack → llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/modules/kompute/objects/pack/pack-dfe06cade21d4a3c314f514ca2e7bec04aebe5ea.pack RENAMED

Binary file

llama_cpp_python-0.2.68/.git/modules/vendor/llama.cpp/modules/kompute/objects/pack/pack-aea54470ccfced130dc113c076f9a5f9e05cddbf.rev → llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/modules/kompute/objects/pack/pack-dfe06cade21d4a3c314f514ca2e7bec04aebe5ea.rev RENAMED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/01/7b72ce9438337f2d9b47212cc6756883e2c7c5 ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/07/3612af1b8eed962fde03e16b8da8feb3a0d23c ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/07/fde361951704cbb5b8bf5f9396be8d00f95cae ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/3d/a5317b3d9104fb070fc45f855e92396ed97eb8 ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/43/2cc2b4feadff27c6ab01f49cfc961390f2f9d3 ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/44/4d1e55ebd5404f70ce62f32f5ad08bbbdebd2d ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/4f/232e18d96915ded0d8c12944b7abd40bc27bd7 ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/7a/a213bbdb7358b99a39dea01a26bfe6db65b3fa ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/7e/46f03e7f64f6e8b0104ba351e46ae2125fc888 ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/82/179a1257f30f2b64dced028ff994732694275d ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/87/f163bbd3fcebbe0b8d8df073ee559869948bd0 ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/94/a1cc66854f43c0d4f2292c2eeeb9ddd65c8c93 ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/a1/1795973ca3f7e54a2c407fe9d6d8fea450f645 ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/a9/eee7cfc8f5eef59c4d4fa805ad3bbe86ea86d8 ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/b2/7c1291e4088be86d97f742d3a5aa1ade2d24fa ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/c0/e6fbf8c380718102bd25fcb8d2e55f8f9480d1 ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/c4/ef15122655860b2ed615ceb918ce025c45865d ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/d7/f4bf8ea90d2d73885e4c201800fddb425c098c ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/f9/3cc522eb7f6bfdb0ec0adb499f4661d9fdbe93 ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/pack/pack-293eaa2a3e2852809e9943866a5773635399bf4c.idx ADDED

Binary file

llama_cpp_python-0.2.68/.git/modules/vendor/llama.cpp/objects/pack/pack-80678416707e3403714c6fedf67fc0629e198f4c.pack → llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/pack/pack-293eaa2a3e2852809e9943866a5773635399bf4c.pack RENAMED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/objects/pack/pack-293eaa2a3e2852809e9943866a5773635399bf4c.rev ADDED

Binary file

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/packed-refs ADDED

	@@ -0,0 +1,2 @@
1	+ # pack-refs with: peeled fully-peeled sorted
2	+ 3855416027cb25d9a708ffa5581cf503a87856a6 refs/remotes/origin/master

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/refs/heads/master ADDED

	@@ -0,0 +1 @@
1	+ 3855416027cb25d9a708ffa5581cf503a87856a6

llama_cpp_python-0.2.70/.git/modules/vendor/llama.cpp/shallow ADDED

	@@ -0,0 +1,2 @@
1	+ 3855416027cb25d9a708ffa5581cf503a87856a6
2	+ c0e6fbf8c380718102bd25fcb8d2e55f8f9480d1

llama_cpp_python-0.2.70/.git/objects/pack/pack-17b9384a6754c7e02e6df20dcc0e6bb5d5098e8a.idx ADDED

Binary file

llama_cpp_python-0.2.68/.git/objects/pack/pack-d80e9c2842087fe2b118d96efa116f60e3086b09.pack → llama_cpp_python-0.2.70/.git/objects/pack/pack-17b9384a6754c7e02e6df20dcc0e6bb5d5098e8a.pack RENAMED

Binary file

llama_cpp_python-0.2.70/.git/objects/pack/pack-17b9384a6754c7e02e6df20dcc0e6bb5d5098e8a.rev ADDED

Binary file

llama_cpp_python-0.2.70/.git/refs/tags/v0.2.70 ADDED

	@@ -0,0 +1 @@
1	+ 9ce5cb376a12a56028aec1fd3b0edc55949b996f

llama_cpp_python-0.2.70/.git/shallow ADDED

	@@ -0,0 +1 @@
1	+ 9ce5cb376a12a56028aec1fd3b0edc55949b996f

{llama_cpp_python-0.2.68 → llama_cpp_python-0.2.70}/.github/dependabot.yml RENAMED

@@ -8,8 +8,12 @@ updates:
   - package-ecosystem: "pip" # See documentation for possible values
     directory: "/" # Location of package manifests
     schedule:
-      interval: "weekly"
+      interval: "daily"
   - package-ecosystem: "github-actions"
     directory: "/"
     schedule:
-      interval: "weekly"
+      interval: "daily"
+  - package-ecosystem: "docker"
+    directory: "/"
+    schedule:
+      interval: "daily"
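
The hunk header `@@ -8,8 +8,12 @@` above says the hunk covers 8 lines starting at line 8 of the old file and 12 lines starting at line 8 of the new one. The following is a minimal sketch of how that header could be checked against the hunk body, assuming standard unified-diff conventions (a leading space marks context, `-` a removed line, `+` an added line). `parse_hunk_header` and `count_hunk_lines` are hypothetical helper names used only for this illustration.

```python
import re

# Hypothetical helpers for illustration; not part of any tool shown on this page.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(header):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    match = HUNK_RE.match(header.strip())
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_count, new_start, new_count = match.groups()
    # In unified diffs an omitted count means exactly one line.
    return (int(old_start), int(old_count or 1),
            int(new_start), int(new_count or 1))


def count_hunk_lines(body):
    """Count the lines a hunk body contributes to the old and new file."""
    old = new = 0
    for line in body:
        if line.startswith("-"):
            old += 1  # removed: only in the old file
        elif line.startswith("+"):
            new += 1  # added: only in the new file
        else:
            old += 1  # context: present in both files
            new += 1
    return old, new


header = "@@ -8,8 +8,12 @@ updates:"
body = [
    '   - package-ecosystem: "pip" # See documentation for possible values',
    '     directory: "/" # Location of package manifests',
    '     schedule:',
    '-      interval: "weekly"',
    '+      interval: "daily"',
    '   - package-ecosystem: "github-actions"',
    '     directory: "/"',
    '     schedule:',
    '-      interval: "weekly"',
    '+      interval: "daily"',
    '+  - package-ecosystem: "docker"',
    '+    directory: "/"',
    '+    schedule:',
    '+      interval: "daily"',
]

old_start, old_count, new_start, new_count = parse_hunk_header(header)
print(count_hunk_lines(body) == (old_count, new_count))
```

The same check applies to the single-line hunks for newly added files, where `@@ -0,0 +1 @@` denotes zero old lines and one new line.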

{llama_cpp_python-0.2.68 → llama_cpp_python-0.2.70}/CHANGELOG.md RENAMED

@@ -7,9 +7,32 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
-## [0.2.68]
+## [0.2.70]
 - feat: Update llama.cpp to ggerganov/llama.cpp@
+- feat: fill-in-middle support by @CISC in #1386
+- fix: adding missing args in create_completion for functionary chat handler by @skalade in #1430
+- docs: update README.md @eltociear in #1432
+- fix: chat_format log where auto-detected format prints None by @balvisio in #1434
+- feat(server): Add support for setting root_path by @abetlen in 0318702cdc860999ee70f277425edbbfe0e60419
+- feat(ci): Add docker checks and check deps more frequently by @Smartappli in #1426
+- fix: detokenization case where first token does not start with a leading space by @noamgat in #1375
+- feat: Implement streaming for Functionary v2 + Bug fixes by @jeffrey-fong in #1419
+- fix: Use memmove to copy str_value kv_override by @abetlen in 9f7a85571ae80d3b6ddbd3e1bae407b9f1e3448a
+- feat(server): Remove temperature bounds checks for server by @abetlen in 0a454bebe67d12a446981eb16028c168ca5faa81
+- fix(server): Propagate flash_attn to model load by @dthuerck in #1424
+## [0.2.69]
+- feat: Update llama.cpp to ggerganov/llama.cpp@6ecf3189e00a1e8e737a78b6d10e1d7006e050a2
+- feat: Add llama-3-vision-alpha chat format by @abetlen in 31b1d95a6c19f5b615a3286069f181a415f872e8
+- fix: Change default verbose value of verbose in image chat format handlers to True to match Llama by @abetlen in 4f01c452b6c738dc56eacac3758119b12c57ea94
+- fix: Suppress all logs when verbose=False, use hardcoded fileno's to work in colab notebooks by @abetlen in f116175a5a7c84569c88cad231855c1e6e59ff6e
+- fix: UTF-8 handling with grammars by @jsoma in #1415
+## [0.2.68]
+- feat: Update llama.cpp to ggerganov/llama.cpp@77e15bec6217a39be59b9cc83d6b9afb6b0d8167
 - feat: Add option to enable flash_attn to Lllama params and ModelSettings by @abetlen in 22d77eefd2edaf0148f53374d0cac74d0e25d06e
 - fix(ci): Fix build-and-release.yaml by @Smartappli in #1413

{llama_cpp_python-0.2.68 → llama_cpp_python-0.2.70}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: llama_cpp_python
-Version: 0.2.68
+Version: 0.2.70
 Summary: Python bindings for the llama.cpp library
 Author-Email: Andrei Betlen <abetlen@gmail.com>
 License: MIT
@@ -321,20 +321,26 @@ The high-level API provides a simple managed interface through the [`Llama`](htt
 Below is a short example demonstrating how to use the high-level API to for basic text completion:
 ```python
->>> from llama_cpp import Llama
->>> llm = Llama(
+from llama_cpp import Llama
+llm = Llama(
       model_path="./models/7B/llama-model.gguf",
       # n_gpu_layers=-1, # Uncomment to use GPU acceleration
       # seed=1337, # Uncomment to set a specific seed
       # n_ctx=2048, # Uncomment to increase the context window
 )
->>> output = llm(
+output = llm(
       "Q: Name the planets in the solar system? A: ", # Prompt
       max_tokens=32, # Generate up to 32 tokens, set to None to generate up to the end of the context window
       stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
       echo=True # Echo the prompt back in the output
 ) # Generate a completion, can also call create_completion
->>> print(output)
+print(output)
+```
+By default `llama-cpp-python` generates completions in an OpenAI compatible format:
+```python
 {
   "id": "cmpl-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
   "object": "text_completion",
@@ -389,12 +395,12 @@ The model will will format the messages into a single prompt using the following
 Set `verbose=True` to see the selected chat format.
 ```python
->>> from llama_cpp import Llama
->>> llm = Llama(
+from llama_cpp import Llama
+llm = Llama(
       model_path="path/to/llama-2/llama-model.gguf",
       chat_format="llama-2"
 )
->>> llm.create_chat_completion(
+llm.create_chat_completion(
       messages = [
           {"role": "system", "content": "You are an assistant who perfectly describes images."},
           {
@@ -419,9 +425,9 @@ To constrain chat responses to only valid JSON or a specific JSON Schema use the
 The following example will constrain the response to valid JSON strings only.
 ```python
->>> from llama_cpp import Llama
->>> llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")
->>> llm.create_chat_completion(
+from llama_cpp import Llama
+llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")
+llm.create_chat_completion(
     messages=[
         {
             "role": "system",
@@ -441,9 +447,9 @@ The following example will constrain the response to valid JSON strings only.
 To constrain the response further to a specific JSON Schema add the schema to the `schema` property of the `response_format` argument.
 ```python
->>> from llama_cpp import Llama
->>> llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")
->>> llm.create_chat_completion(
+from llama_cpp import Llama
+llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")
+llm.create_chat_completion(
     messages=[
         {
             "role": "system",
@@ -468,9 +474,9 @@ To constrain the response further to a specific JSON Schema add the schema to th
 The high-level API supports OpenAI compatible function and tool calling. This is possible through the `functionary` pre-trained models chat format or through the generic `chatml-function-calling` chat format.
 ```python
->>> from llama_cpp import Llama
->>> llm = Llama(model_path="path/to/chatml/llama-model.gguf", chat_format="chatml-function-calling")
->>> llm.create_chat_completion(
+from llama_cpp import Llama
+llm = Llama(model_path="path/to/chatml/llama-model.gguf", chat_format="chatml-function-calling")
+llm.create_chat_completion(
       messages = [
         {
           "role": "system",
@@ -520,9 +526,9 @@ The various gguf-converted files for this set of models can be found [here](http
 Due to discrepancies between llama.cpp and HuggingFace's tokenizers, it is required to provide HF Tokenizer for functionary. The `LlamaHFTokenizer` class can be initialized and passed into the Llama class. This will override the default llama.cpp tokenizer used in Llama class. The tokenizer files are already included in the respective HF repositories hosting the gguf files.
 ```python
->>> from llama_cpp import Llama
->>> from llama_cpp.llama_tokenizer import LlamaHFTokenizer
->>> llm = Llama.from_pretrained(
+from llama_cpp import Llama
+from llama_cpp.llama_tokenizer import LlamaHFTokenizer
+llm = Llama.from_pretrained(
   repo_id="meetkai/functionary-small-v2.2-GGUF",
   filename="functionary-small-v2.2.q4_0.gguf",
   chat_format="functionary-v2",
@@ -548,15 +554,15 @@ You'll first need to download one of the available multi-modal models in GGUF fo
 Then you'll need to use a custom chat handler to load the clip model and process the chat messages and images.
 ```python
->>> from llama_cpp import Llama
->>> from llama_cpp.llama_chat_format import Llava15ChatHandler
->>> chat_handler = Llava15ChatHandler(clip_model_path="path/to/llava/mmproj.bin")
->>> llm = Llama(
+from llama_cpp import Llama
+from llama_cpp.llama_chat_format import Llava15ChatHandler
+chat_handler = Llava15ChatHandler(clip_model_path="path/to/llava/mmproj.bin")
+llm = Llama(
   model_path="./path/to/llava/llama-model.gguf",
   chat_handler=chat_handler,
-  n_ctx=2048, # n_ctx should be increased to accomodate the image embedding
+  n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
 )
->>> llm.create_chat_completion(
+llm.create_chat_completion(
     messages = [
         {"role": "system", "content": "You are an assistant who perfectly describes images."},
         {
@@ -573,19 +579,22 @@ Then you'll need to use a custom chat handler to load the clip model and process
 You can also pull the model from the Hugging Face Hub using the `from_pretrained` method.
 ```python
->>> from llama_cpp import Llama
->>> from llama_cpp.llama_chat_format import MoondreamChatHandler
->>> chat_handler = MoondreamChatHandler.from_pretrained(
+from llama_cpp import Llama
+from llama_cpp.llama_chat_format import MoondreamChatHandler
+chat_handler = MoondreamChatHandler.from_pretrained(
   repo_id="vikhyatk/moondream2",
   filename="*mmproj*",
 )
->>> llm = Llama.from_pretrained(
-  repo_id="vikhyatk/moondream2"
+llm = Llama.from_pretrained(
+  repo_id="vikhyatk/moondream2",
   filename="*text-model*",
   chat_handler=chat_handler,
-  n_ctx=2048, # n_ctx should be increased to accomodate the image embedding
+  n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
 )
->>> llm.create_chat_completion(
+respoonse = llm.create_chat_completion(
     messages = [
         {
             "role": "user",
@@ -597,6 +606,7 @@ You can also pull the model from the Hugging Face Hub using the `from_pretrained
         }
     ]
 )
+print(response["choices"][0]["text"])
 ```
 **Note**: Multi-modal models also support tool calling and JSON mode.
@@ -749,18 +759,18 @@ The entire low-level API can be found in [llama_cpp/llama_cpp.py](https://github
 Below is a short example demonstrating how to use the low-level API to tokenize a prompt:
 ```python
->>> import llama_cpp
->>> import ctypes
->>> llama_cpp.llama_backend_init(False) # Must be called once at the start of each program
->>> params = llama_cpp.llama_context_default_params()
+import llama_cpp
+import ctypes
+llama_cpp.llama_backend_init(False) # Must be called once at the start of each program
+params = llama_cpp.llama_context_default_params()
 # use bytes for char * params
->>> model = llama_cpp.llama_load_model_from_file(b"./models/7b/llama-model.gguf", params)
->>> ctx = llama_cpp.llama_new_context_with_model(model, params)
->>> max_tokens = params.n_ctx
+model = llama_cpp.llama_load_model_from_file(b"./models/7b/llama-model.gguf", params)
+ctx = llama_cpp.llama_new_context_with_model(model, params)
+max_tokens = params.n_ctx
 # use ctypes arrays for array params
->>> tokens = (llama_cpp.llama_token * int(max_tokens))()
->>> n_tokens = llama_cpp.llama_tokenize(ctx, b"Q: Name the planets in the solar system? A: ", tokens, max_tokens, llama_cpp.c_bool(True))
->>> llama_cpp.llama_free(ctx)
+tokens = (llama_cpp.llama_token * int(max_tokens))()
+n_tokens = llama_cpp.llama_tokenize(ctx, b"Q: Name the planets in the solar system? A: ", tokens, max_tokens, llama_cpp.c_bool(True))
+llama_cpp.llama_free(ctx)
 ```
 Check out the [examples folder](examples/low_level_api) for more examples of using the low-level API.

{llama_cpp_python-0.2.68 → llama_cpp_python-0.2.70}/README.md RENAMED

@@ -277,20 +277,26 @@ The high-level API provides a simple managed interface through the [`Llama`](htt
 Below is a short example demonstrating how to use the high-level API to for basic text completion:
 ```python
->>> from llama_cpp import Llama
->>> llm = Llama(
+from llama_cpp import Llama
+llm = Llama(
       model_path="./models/7B/llama-model.gguf",
       # n_gpu_layers=-1, # Uncomment to use GPU acceleration
       # seed=1337, # Uncomment to set a specific seed
       # n_ctx=2048, # Uncomment to increase the context window
 )
->>> output = llm(
+output = llm(
       "Q: Name the planets in the solar system? A: ", # Prompt
       max_tokens=32, # Generate up to 32 tokens, set to None to generate up to the end of the context window
       stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
       echo=True # Echo the prompt back in the output
 ) # Generate a completion, can also call create_completion
->>> print(output)
+print(output)
+```
+By default `llama-cpp-python` generates completions in an OpenAI compatible format:
+```python
 {
   "id": "cmpl-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
   "object": "text_completion",
@@ -345,12 +351,12 @@ The model will will format the messages into a single prompt using the following
 Set `verbose=True` to see the selected chat format.
 ```python
->>> from llama_cpp import Llama
->>> llm = Llama(
+from llama_cpp import Llama
+llm = Llama(
       model_path="path/to/llama-2/llama-model.gguf",
       chat_format="llama-2"
 )
->>> llm.create_chat_completion(
+llm.create_chat_completion(
       messages = [
           {"role": "system", "content": "You are an assistant who perfectly describes images."},
           {
@@ -375,9 +381,9 @@ To constrain chat responses to only valid JSON or a specific JSON Schema use the
 The following example will constrain the response to valid JSON strings only.
 ```python
->>> from llama_cpp import Llama
->>> llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")
->>> llm.create_chat_completion(
+from llama_cpp import Llama
+llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")
+llm.create_chat_completion(
     messages=[
         {
             "role": "system",
@@ -397,9 +403,9 @@ The following example will constrain the response to valid JSON strings only.
 To constrain the response further to a specific JSON Schema add the schema to the `schema` property of the `response_format` argument.
 ```python
->>> from llama_cpp import Llama
->>> llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")
->>> llm.create_chat_completion(
+from llama_cpp import Llama
+llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")
+llm.create_chat_completion(
     messages=[
         {
             "role": "system",
@@ -424,9 +430,9 @@ To constrain the response further to a specific JSON Schema add the schema to th
 The high-level API supports OpenAI compatible function and tool calling. This is possible through the `functionary` pre-trained models chat format or through the generic `chatml-function-calling` chat format.
 ```python
->>> from llama_cpp import Llama
->>> llm = Llama(model_path="path/to/chatml/llama-model.gguf", chat_format="chatml-function-calling")
->>> llm.create_chat_completion(
+from llama_cpp import Llama
+llm = Llama(model_path="path/to/chatml/llama-model.gguf", chat_format="chatml-function-calling")
+llm.create_chat_completion(
       messages = [
         {
           "role": "system",
@@ -476,9 +482,9 @@ The various gguf-converted files for this set of models can be found [here](http
 Due to discrepancies between llama.cpp and HuggingFace's tokenizers, it is required to provide HF Tokenizer for functionary. The `LlamaHFTokenizer` class can be initialized and passed into the Llama class. This will override the default llama.cpp tokenizer used in Llama class. The tokenizer files are already included in the respective HF repositories hosting the gguf files.
 ```python
->>> from llama_cpp import Llama
->>> from llama_cpp.llama_tokenizer import LlamaHFTokenizer
->>> llm = Llama.from_pretrained(
+from llama_cpp import Llama
+from llama_cpp.llama_tokenizer import LlamaHFTokenizer
+llm = Llama.from_pretrained(
   repo_id="meetkai/functionary-small-v2.2-GGUF",
   filename="functionary-small-v2.2.q4_0.gguf",
   chat_format="functionary-v2",
@@ -504,15 +510,15 @@ You'll first need to download one of the available multi-modal models in GGUF fo
 Then you'll need to use a custom chat handler to load the clip model and process the chat messages and images.
 ```python
->>> from llama_cpp import Llama
->>> from llama_cpp.llama_chat_format import Llava15ChatHandler
->>> chat_handler = Llava15ChatHandler(clip_model_path="path/to/llava/mmproj.bin")
->>> llm = Llama(
+from llama_cpp import Llama
+from llama_cpp.llama_chat_format import Llava15ChatHandler
+chat_handler = Llava15ChatHandler(clip_model_path="path/to/llava/mmproj.bin")
+llm = Llama(
   model_path="./path/to/llava/llama-model.gguf",
   chat_handler=chat_handler,
-  n_ctx=2048, # n_ctx should be increased to accomodate the image embedding
+  n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
 )
->>> llm.create_chat_completion(
+llm.create_chat_completion(
     messages = [
         {"role": "system", "content": "You are an assistant who perfectly describes images."},
         {
@@ -529,19 +535,22 @@ Then you'll need to use a custom chat handler to load the clip model and process
 You can also pull the model from the Hugging Face Hub using the `from_pretrained` method.
 ```python
->>> from llama_cpp import Llama
->>> from llama_cpp.llama_chat_format import MoondreamChatHandler
->>> chat_handler = MoondreamChatHandler.from_pretrained(
+from llama_cpp import Llama
+from llama_cpp.llama_chat_format import MoondreamChatHandler
+chat_handler = MoondreamChatHandler.from_pretrained(
   repo_id="vikhyatk/moondream2",
   filename="*mmproj*",
 )
->>> llm = Llama.from_pretrained(
-  repo_id="vikhyatk/moondream2"
+llm = Llama.from_pretrained(
+  repo_id="vikhyatk/moondream2",
   filename="*text-model*",
   chat_handler=chat_handler,
-  n_ctx=2048, # n_ctx should be increased to accomodate the image embedding
+  n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
 )
->>> llm.create_chat_completion(
+respoonse = llm.create_chat_completion(
     messages = [
         {
             "role": "user",
@@ -553,6 +562,7 @@ You can also pull the model from the Hugging Face Hub using the `from_pretrained
         }
     ]
 )
+print(response["choices"][0]["text"])
 ```
 **Note**: Multi-modal models also support tool calling and JSON mode.
@@ -705,18 +715,18 @@ The entire low-level API can be found in [llama_cpp/llama_cpp.py](https://github
 Below is a short example demonstrating how to use the low-level API to tokenize a prompt:
 ```python
->>> import llama_cpp
->>> import ctypes
->>> llama_cpp.llama_backend_init(False) # Must be called once at the start of each program
->>> params = llama_cpp.llama_context_default_params()
+import llama_cpp
+import ctypes
+llama_cpp.llama_backend_init(False) # Must be called once at the start of each program
+params = llama_cpp.llama_context_default_params()
 # use bytes for char * params
->>> model = llama_cpp.llama_load_model_from_file(b"./models/7b/llama-model.gguf", params)
->>> ctx = llama_cpp.llama_new_context_with_model(model, params)
->>> max_tokens = params.n_ctx
+model = llama_cpp.llama_load_model_from_file(b"./models/7b/llama-model.gguf", params)
+ctx = llama_cpp.llama_new_context_with_model(model, params)
+max_tokens = params.n_ctx
 # use ctypes arrays for array params
->>> tokens = (llama_cpp.llama_token * int(max_tokens))()
->>> n_tokens = llama_cpp.llama_tokenize(ctx, b"Q: Name the planets in the solar system? A: ", tokens, max_tokens, llama_cpp.c_bool(True))
->>> llama_cpp.llama_free(ctx)
+tokens = (llama_cpp.llama_token * int(max_tokens))()
+n_tokens = llama_cpp.llama_tokenize(ctx, b"Q: Name the planets in the solar system? A: ", tokens, max_tokens, llama_cpp.c_bool(True))
+llama_cpp.llama_free(ctx)
 ```
 Check out the [examples folder](examples/low_level_api) for more examples of using the low-level API.

{llama_cpp_python-0.2.68 → llama_cpp_python-0.2.70}/llama_cpp/__init__.py RENAMED

@@ -1,4 +1,4 @@
 from .llama_cpp import *
 from .llama import *
-__version__ = "0.2.68"
+__version__ = "0.2.70"
