ovos-gguf-plugin 1.2.0a3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. ovos_gguf_plugin-1.2.0a3/LICENSE +21 -0
  2. ovos_gguf_plugin-1.2.0a3/PKG-INFO +226 -0
  3. ovos_gguf_plugin-1.2.0a3/README.md +200 -0
  4. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/__init__.py +5 -0
  5. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/chat.py +193 -0
  6. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/dialog_transformers.py +50 -0
  7. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/embeddings.py +101 -0
  8. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/locale/en-us/detect_system.prompt +1 -0
  9. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/locale/en-us/detect_user.prompt +1 -0
  10. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/locale/en-us/dialog_transform_system.prompt +1 -0
  11. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/locale/en-us/summarize_system.prompt +1 -0
  12. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/locale/en-us/summarize_user.prompt +5 -0
  13. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/locale/en-us/translate_no_source.prompt +3 -0
  14. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/locale/en-us/translate_system.prompt +1 -0
  15. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/locale/en-us/translate_with_source.prompt +3 -0
  16. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/prompts.py +58 -0
  17. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/summarizer.py +45 -0
  18. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/translate.py +153 -0
  19. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin/version.py +8 -0
  20. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin.egg-info/PKG-INFO +226 -0
  21. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin.egg-info/SOURCES.txt +29 -0
  22. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin.egg-info/dependency_links.txt +1 -0
  23. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin.egg-info/entry_points.txt +17 -0
  24. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin.egg-info/requires.txt +13 -0
  25. ovos_gguf_plugin-1.2.0a3/ovos_gguf_plugin.egg-info/top_level.txt +1 -0
  26. ovos_gguf_plugin-1.2.0a3/pyproject.toml +71 -0
  27. ovos_gguf_plugin-1.2.0a3/setup.cfg +4 -0
  28. ovos_gguf_plugin-1.2.0a3/test/test_e2e.py +80 -0
  29. ovos_gguf_plugin-1.2.0a3/test/test_e2e_persona_pipeline.py +168 -0
  30. ovos_gguf_plugin-1.2.0a3/test/test_embeddings.py +121 -0
  31. ovos_gguf_plugin-1.2.0a3/test/test_prompts.py +59 -0
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2024 OpenVoiceOS
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,226 @@
1
+ Metadata-Version: 2.4
2
+ Name: ovos-gguf-plugin
3
+ Version: 1.2.0a3
4
+ Summary: local LLM plugin for OpenVoiceOS persona framework
5
+ Author-email: jarbasai <jarbasai@mailfence.com>
6
+ License-Expression: MIT
7
+ Project-URL: Homepage, https://github.com/OpenVoiceOS/ovos-gguf-plugin
8
+ Project-URL: Repository, https://github.com/OpenVoiceOS/ovos-gguf-plugin
9
+ Keywords: OVOS,openvoiceos,plugin,utterance,fallback,query
10
+ Requires-Python: >=3.9
11
+ Description-Content-Type: text/markdown
12
+ License-File: LICENSE
13
+ Requires-Dist: ovos-plugin-manager<3.0.0,>=2.2.3a1
14
+ Requires-Dist: ovos-spec-tools>=0.8.0a2
15
+ Requires-Dist: huggingface-hub
16
+ Requires-Dist: llama-cpp-python
17
+ Requires-Dist: numpy
18
+ Requires-Dist: sentence_stream
19
+ Provides-Extra: test
20
+ Requires-Dist: pytest; extra == "test"
21
+ Requires-Dist: numpy; extra == "test"
22
+ Requires-Dist: ovoscope; extra == "test"
23
+ Requires-Dist: ovos-persona; extra == "test"
24
+ Requires-Dist: ovos-solver-failure-plugin; extra == "test"
25
+ Dynamic: license-file
26
+
27
+ # ovos-gguf-plugin
28
+
29
+ Unified GGUF wrapper for OpenVoiceOS — chat, summarization, dialog rewriting, translation, language detection, and text embeddings, all backed by quantized GGUF models via `llama-cpp-python`.
30
+
31
+ ## Install
32
+
33
+ ```bash
34
+ pip install ovos-gguf-plugin
35
+ ```
36
+
37
+ For GPU inference, rebuild `llama-cpp-python` with CUDA support first:
38
+
39
+ ```bash
40
+ CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --no-cache-dir
41
+ ```
42
+
43
+ ## Plugin entry points
44
+
45
+ | Entry-point group | Plugin name | Class | Role |
46
+ |---|---|---|---|
47
+ | `opm.agents.chat` | `ovos-chat-gguf-plugin` | `GGUFChatEngine` | conversational chat / question answering |
48
+ | `opm.agents.summarizer` | `ovos-summarizer-gguf-plugin` | `GGUFSummarizer` | text summarization |
49
+ | `opm.transformer.dialog` | `ovos-dialog-transformer-gguf-plugin` | `GGUFDialogTransformer` | dialog rewriting |
50
+ | `opm.lang.translate` | `ovos-translate-gguf-plugin` | `GGUFTextTranslator` | machine translation |
51
+ | `opm.lang.detect` | `ovos-lang-detect-gguf-plugin` | `GGUFTextLangDetector` | language detection |
52
+ | `opm.embeddings.text` | `ovos-gguf-embeddings-plugin` | `GGUFEmbeddings` | text embeddings |
53
+
54
+ ## Quickstart
55
+
56
+ ### Chat
57
+
58
+ ```python
59
+ from ovos_gguf_plugin.chat import GGUFChatEngine
60
+ from ovos_plugin_manager.templates.agents import AgentMessage, MessageRole
61
+
62
+ engine = GGUFChatEngine({
63
+ "model": "afrideva/Smol-Llama-101M-Chat-v1-GGUF",
64
+ "remote_filename": "*q2_k.gguf",
65
+ "max_tokens": 128,
66
+ })
67
+ msgs = [AgentMessage(role=MessageRole.USER, content="Tell me a joke.")]
68
+ # stream sentence-by-sentence (suitable for TTS)
69
+ for sentence in engine.stream_sentences(msgs):
70
+ print(sentence)
71
+ # or get the full response at once
72
+ reply = engine.continue_chat(msgs)
73
+ print(reply.content)
74
+ ```
75
+
76
+ ### Summarizer
77
+
78
+ ```python
79
+ from ovos_gguf_plugin.summarizer import GGUFSummarizer
80
+
81
+ s = GGUFSummarizer({
82
+ "model": "Qwen/Qwen2-0.5B-Instruct-GGUF",
83
+ "remote_filename": "*q8_0.gguf",
84
+ })
85
+ print(s.summarize("Long document text goes here ... " * 20))
86
+ ```
87
+
88
+ ### Dialog transformer
89
+
90
+ ```python
91
+ from ovos_gguf_plugin.dialog_transformers import GGUFDialogTransformer
92
+
93
+ dt = GGUFDialogTransformer({
94
+ "model": "Qwen/Qwen2-0.5B-Instruct-GGUF",
95
+ "remote_filename": "*q8_0.gguf",
96
+ })
97
+ print(dt.transform("gonna grab some food real quick"))
98
+ ```
99
+
100
+ ### Translation
101
+
102
+ ```python
103
+ from ovos_gguf_plugin.translate import GGUFTextTranslator
104
+
105
+ tx = GGUFTextTranslator({
106
+ "model": "TheBloke/TowerInstruct-7B-v0.1-GGUF",
107
+ "remote_filename": "*Q4_K_M.gguf",
108
+ })
109
+ print(tx.translate("the easiest way to contribute is to help with translations",
110
+ target="es-es"))
111
+ ```
112
+
113
+ ### Language detection
114
+
115
+ ```python
116
+ from ovos_gguf_plugin.translate import GGUFTextLangDetector
117
+
118
+ dt = GGUFTextLangDetector({
119
+ "model": "Qwen/Qwen2-0.5B-Instruct-GGUF",
120
+ "remote_filename": "*q8_0.gguf",
121
+ })
122
+ print(dt.detect("you can help without any programming knowledge")) # → en
123
+ ```
124
+
125
+ ### Text embeddings
126
+
127
+ ```python
128
+ from ovos_gguf_plugin.embeddings import GGUFEmbeddings
129
+
130
+ emb = GGUFEmbeddings({"model": "all-MiniLM-L6-v2"})
131
+ vector = emb.get_embeddings("hello world")
132
+ print(len(vector), "dims")
133
+ ```
134
+
135
+ `model` accepts a friendly name from `GGUFEmbeddings.DEFAULT_MODELS` (e.g. `labse`, `all-MiniLM-L6-v2`, `nomic-embed-text-v1.5`, `bge-large-en-v1.5`), a bare Hugging Face repo id (with `remote_filename`), or a local `.gguf` path. Default is `labse`.
136
+
137
+ As an OVOS text-embeddings plugin it is selected by name (`ovos-gguf-embeddings-plugin`), so it is a drop-in for anything that previously used the standalone embeddings plugin.
138
+
139
+ ## Configuration
140
+
141
+ All wrappers share the same config keys:
142
+
143
+ | Key | Default | Description |
144
+ |---|---|---|
145
+ | `model` | required | Local `.gguf` path, HuggingFace repo id, or friendly name (embeddings) |
146
+ | `remote_filename` | `*Q4_K_M.gguf` | Glob for selecting the file from a HF repo |
147
+ | `n_gpu_layers` | `0` | GPU layers to offload (`-1` = all) |
148
+ | `chat_format` | `None` | llama.cpp chat format (auto-detected for most models) |
149
+ | `verbose` | `True` | llama.cpp verbosity |
150
+ | `max_tokens` | `512` | Maximum tokens to generate |
151
+ | `system_prompt` | locale default | Override the system prompt |
152
+
153
+ See [`docs/configuration.md`](docs/configuration.md) for the full reference including per-wrapper options and GPU build instructions.
154
+
155
+ ## Localized prompts
156
+
157
+ System prompts and templates ship as `.prompt` resource files under `ovos_gguf_plugin/locale/<lang>/` and are loaded via [ovos-spec-tools](https://github.com/OpenVoiceOS/ovos-spec-tools) (OVOS-INTENT-2 §4.4). Drop translated `.prompt` files under a new `locale/<lang>/` to add a language; English `en-us` ships by default and is the fallback. A `system_prompt` in config overrides the locale file.
158
+
159
+ See [`docs/localization.md`](docs/localization.md) for the full guide.
160
+
161
+ ## OVOS Persona Framework
162
+
163
+ ```json
164
+ {
165
+ "name": "MyAssistant",
166
+ "solvers": ["ovos-solver-gguf-plugin"],
167
+ "ovos-solver-gguf-plugin": {
168
+ "model": "TheBloke/notus-7B-v1-GGUF",
169
+ "remote_filename": "*Q4_K_M.gguf",
170
+ "persona": "You are a helpful assistant.",
171
+ "verbose": false
172
+ }
173
+ }
174
+ ```
175
+
176
+ ```bash
177
+ ovos-persona-server --persona my_persona.json
178
+ ```
179
+
180
+ ## Documentation
181
+
182
+ - [`docs/configuration.md`](docs/configuration.md) — full config reference, GPU build, per-wrapper notes
183
+ - [`docs/localization.md`](docs/localization.md) — `.prompt` system, adding a language
184
+ - [`docs/models.md`](docs/models.md) — recommended models per wrapper (including tiny CI-friendly ones)
185
+
186
+ ## Examples
187
+
188
+ Runnable scripts under [`examples/`](examples/):
189
+
190
+ - [`chat_example.py`](examples/chat_example.py)
191
+ - [`embeddings_example.py`](examples/embeddings_example.py)
192
+ - [`translate_example.py`](examples/translate_example.py)
193
+ - [`lang_detect_example.py`](examples/lang_detect_example.py)
194
+ - [`summarize_example.py`](examples/summarize_example.py)
195
+
196
+ ## Testing
197
+
198
+ ```bash
199
+ pip install "ovos-gguf-plugin[test]"
200
+ python -m pytest test/ -v
201
+ ```
202
+
203
+ The test suite contains:
204
+
205
+ - `test/test_embeddings.py` — hermetic unit tests (mocked llama.cpp, no downloads)
206
+ - `test/test_prompts.py` — hermetic unit tests for localized prompt loading
207
+ - `test/test_e2e.py` — real-model end-to-end tests (downloads tiny GGUFs once, ~70 MB total):
208
+ - chat: `afrideva/Smol-Llama-101M-Chat-v1-GGUF` q2_k (~45 MB)
209
+ - embeddings: `leliuga/all-MiniLM-L6-v2-GGUF` Q4_K_M (~23 MB)
210
+
211
+ ## Credits
212
+
213
+ Originally developed by [TigreGótico](https://tigregotico.pt) for [OpenVoiceOS](https://openvoiceos.org),
214
+ sponsored by VisioLab. Modernized under the [NGI0 Commons Fund](https://nlnet.nl/commonsfund) / [NLnet](https://nlnet.nl).
215
+
216
+ <img src="https://github.com/user-attachments/assets/809588a2-32a2-406c-98c0-f88bf7753cb4" width="220" alt="VisioLab"/>
217
+
218
+ > This work was sponsored by VisioLab. VisioLab, part of [Royal Dutch Visio](https://visio.org/), is the test, education, and research center in the field of (innovative) assistive technology for blind and visually impaired people and professionals. We explore (new) technological developments such as Voice, VR and AI and make the knowledge and expertise we gain available to everyone.
219
+
220
+ [![NGI0 Commons Fund](./ngi.png)](https://nlnet.nl/project/OpenVoiceOS)
221
+
222
+ This project was funded through the [NGI0 Commons Fund](https://nlnet.nl/commonsfund),
223
+ a fund established by [NLnet](https://nlnet.nl) with financial support from the
224
+ European Commission's [Next Generation Internet](https://ngi.eu) programme, under
225
+ the aegis of [DG Communications Networks, Content and Technology](https://commission.europa.eu/about-european-commission/departments-and-executive-agencies/communications-networks-content-and-technology_en)
226
+ under grant agreement No [101135429](https://cordis.europa.eu/project/id/101135429).
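Returning to the `remote_filename` key from the Configuration table: it is a glob that picks one file out of a Hugging Face repo. The sketch below only illustrates how such a glob narrows a file listing, using the standard library's case-sensitive matcher. The helper name `select_files` and the file names are hypothetical, and how the real loader resolves the pattern is not shown in this diff.

```python
from fnmatch import fnmatchcase
from typing import List


def select_files(repo_files: List[str], pattern: str = "*Q4_K_M.gguf") -> List[str]:
    """Return the files whose names match the glob, comparing case-sensitively."""
    return [name for name in repo_files if fnmatchcase(name, pattern)]


# Hypothetical repo listing, for illustration only.
repo_files = ["model.Q2_K.gguf", "model.Q4_K_M.gguf", "model.Q8_0.gguf", "README.md"]

default_match = select_files(repo_files)                 # uses the documented default glob
lowercase_match = select_files(repo_files, "*q8_0.gguf")  # lowercase pattern, uppercase file name
print(default_match, lowercase_match)
```

With case-sensitive matching the pattern has to follow the casing used by the repo's file names, which is one plausible reason the README examples use `*q8_0.gguf` for one repo and `*Q4_K_M.gguf` for another.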
@@ -0,0 +1,200 @@
1
+ # ovos-gguf-plugin
2
+
3
+ Unified GGUF wrapper for OpenVoiceOS — chat, summarization, dialog rewriting, translation, language detection, and text embeddings, all backed by quantized GGUF models via `llama-cpp-python`.
4
+
5
+ ## Install
6
+
7
+ ```bash
8
+ pip install ovos-gguf-plugin
9
+ ```
10
+
11
+ For GPU inference, rebuild `llama-cpp-python` with CUDA support first:
12
+
13
+ ```bash
14
+ CMAKE_ARGS="-DGGML_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --no-cache-dir
15
+ ```
16
+
17
+ ## Plugin entry points
18
+
19
+ | Entry-point group | Plugin name | Class | Role |
20
+ |---|---|---|---|
21
+ | `opm.agents.chat` | `ovos-chat-gguf-plugin` | `GGUFChatEngine` | conversational chat / question answering |
22
+ | `opm.agents.summarizer` | `ovos-summarizer-gguf-plugin` | `GGUFSummarizer` | text summarization |
23
+ | `opm.transformer.dialog` | `ovos-dialog-transformer-gguf-plugin` | `GGUFDialogTransformer` | dialog rewriting |
24
+ | `opm.lang.translate` | `ovos-translate-gguf-plugin` | `GGUFTextTranslator` | machine translation |
25
+ | `opm.lang.detect` | `ovos-lang-detect-gguf-plugin` | `GGUFTextLangDetector` | language detection |
26
+ | `opm.embeddings.text` | `ovos-gguf-embeddings-plugin` | `GGUFEmbeddings` | text embeddings |
27
+
28
+ ## Quickstart
29
+
30
+ ### Chat
31
+
32
+ ```python
33
+ from ovos_gguf_plugin.chat import GGUFChatEngine
34
+ from ovos_plugin_manager.templates.agents import AgentMessage, MessageRole
35
+
36
+ engine = GGUFChatEngine({
37
+ "model": "afrideva/Smol-Llama-101M-Chat-v1-GGUF",
38
+ "remote_filename": "*q2_k.gguf",
39
+ "max_tokens": 128,
40
+ })
41
+ msgs = [AgentMessage(role=MessageRole.USER, content="Tell me a joke.")]
42
+ # stream sentence-by-sentence (suitable for TTS)
43
+ for sentence in engine.stream_sentences(msgs):
44
+ print(sentence)
45
+ # or get the full response at once
46
+ reply = engine.continue_chat(msgs)
47
+ print(reply.content)
48
+ ```
49
+
50
+ ### Summarizer
51
+
52
+ ```python
53
+ from ovos_gguf_plugin.summarizer import GGUFSummarizer
54
+
55
+ s = GGUFSummarizer({
56
+ "model": "Qwen/Qwen2-0.5B-Instruct-GGUF",
57
+ "remote_filename": "*q8_0.gguf",
58
+ })
59
+ print(s.summarize("Long document text goes here ... " * 20))
60
+ ```
61
+
62
+ ### Dialog transformer
63
+
64
+ ```python
65
+ from ovos_gguf_plugin.dialog_transformers import GGUFDialogTransformer
66
+
67
+ dt = GGUFDialogTransformer({
68
+ "model": "Qwen/Qwen2-0.5B-Instruct-GGUF",
69
+ "remote_filename": "*q8_0.gguf",
70
+ })
71
+ print(dt.transform("gonna grab some food real quick"))
72
+ ```
73
+
74
+ ### Translation
75
+
76
+ ```python
77
+ from ovos_gguf_plugin.translate import GGUFTextTranslator
78
+
79
+ tx = GGUFTextTranslator({
80
+ "model": "TheBloke/TowerInstruct-7B-v0.1-GGUF",
81
+ "remote_filename": "*Q4_K_M.gguf",
82
+ })
83
+ print(tx.translate("the easiest way to contribute is to help with translations",
84
+ target="es-es"))
85
+ ```
86
+
87
+ ### Language detection
88
+
89
+ ```python
90
+ from ovos_gguf_plugin.translate import GGUFTextLangDetector
91
+
92
+ dt = GGUFTextLangDetector({
93
+ "model": "Qwen/Qwen2-0.5B-Instruct-GGUF",
94
+ "remote_filename": "*q8_0.gguf",
95
+ })
96
+ print(dt.detect("you can help without any programming knowledge")) # → en
97
+ ```
98
+
99
+ ### Text embeddings
100
+
101
+ ```python
102
+ from ovos_gguf_plugin.embeddings import GGUFEmbeddings
103
+
104
+ emb = GGUFEmbeddings({"model": "all-MiniLM-L6-v2"})
105
+ vector = emb.get_embeddings("hello world")
106
+ print(len(vector), "dims")
107
+ ```
108
+
109
+ `model` accepts a friendly name from `GGUFEmbeddings.DEFAULT_MODELS` (e.g. `labse`, `all-MiniLM-L6-v2`, `nomic-embed-text-v1.5`, `bge-large-en-v1.5`), a bare Hugging Face repo id (with `remote_filename`), or a local `.gguf` path. Default is `labse`.
110
+
111
+ As an OVOS text-embeddings plugin it is selected by name (`ovos-gguf-embeddings-plugin`), so it is a drop-in for anything that previously used the standalone embeddings plugin.
112
+
113
+ ## Configuration
114
+
115
+ All wrappers share the same config keys:
116
+
117
+ | Key | Default | Description |
118
+ |---|---|---|
119
+ | `model` | required | Local `.gguf` path, HuggingFace repo id, or friendly name (embeddings) |
120
+ | `remote_filename` | `*Q4_K_M.gguf` | Glob for selecting the file from a HF repo |
121
+ | `n_gpu_layers` | `0` | GPU layers to offload (`-1` = all) |
122
+ | `chat_format` | `None` | llama.cpp chat format (auto-detected for most models) |
123
+ | `verbose` | `True` | llama.cpp verbosity |
124
+ | `max_tokens` | `512` | Maximum tokens to generate |
125
+ | `system_prompt` | locale default | Override the system prompt |
126
+
127
+ See [`docs/configuration.md`](docs/configuration.md) for the full reference including per-wrapper options and GPU build instructions.
128
+
129
+ ## Localized prompts
130
+
131
+ System prompts and templates ship as `.prompt` resource files under `ovos_gguf_plugin/locale/<lang>/` and are loaded via [ovos-spec-tools](https://github.com/OpenVoiceOS/ovos-spec-tools) (OVOS-INTENT-2 §4.4). Drop translated `.prompt` files under a new `locale/<lang>/` to add a language; English `en-us` ships by default and is the fallback. A `system_prompt` in config overrides the locale file.
132
+
133
+ See [`docs/localization.md`](docs/localization.md) for the full guide.
134
+
135
+ ## OVOS Persona Framework
136
+
137
+ ```json
138
+ {
139
+ "name": "MyAssistant",
140
+ "solvers": ["ovos-solver-gguf-plugin"],
141
+ "ovos-solver-gguf-plugin": {
142
+ "model": "TheBloke/notus-7B-v1-GGUF",
143
+ "remote_filename": "*Q4_K_M.gguf",
144
+ "persona": "You are a helpful assistant.",
145
+ "verbose": false
146
+ }
147
+ }
148
+ ```
149
+
150
+ ```bash
151
+ ovos-persona-server --persona my_persona.json
152
+ ```
153
+
154
+ ## Documentation
155
+
156
+ - [`docs/configuration.md`](docs/configuration.md) — full config reference, GPU build, per-wrapper notes
157
+ - [`docs/localization.md`](docs/localization.md) — `.prompt` system, adding a language
158
+ - [`docs/models.md`](docs/models.md) — recommended models per wrapper (including tiny CI-friendly ones)
159
+
160
+ ## Examples
161
+
162
+ Runnable scripts under [`examples/`](examples/):
163
+
164
+ - [`chat_example.py`](examples/chat_example.py)
165
+ - [`embeddings_example.py`](examples/embeddings_example.py)
166
+ - [`translate_example.py`](examples/translate_example.py)
167
+ - [`lang_detect_example.py`](examples/lang_detect_example.py)
168
+ - [`summarize_example.py`](examples/summarize_example.py)
169
+
170
+ ## Testing
171
+
172
+ ```bash
173
+ pip install "ovos-gguf-plugin[test]"
174
+ python -m pytest test/ -v
175
+ ```
176
+
177
+ The test suite contains:
178
+
179
+ - `test/test_embeddings.py` — hermetic unit tests (mocked llama.cpp, no downloads)
180
+ - `test/test_prompts.py` — hermetic unit tests for localized prompt loading
181
+ - `test/test_e2e.py` — real-model end-to-end tests (downloads tiny GGUFs once, ~70 MB total):
182
+ - chat: `afrideva/Smol-Llama-101M-Chat-v1-GGUF` q2_k (~45 MB)
183
+ - embeddings: `leliuga/all-MiniLM-L6-v2-GGUF` Q4_K_M (~23 MB)
184
+
185
+ ## Credits
186
+
187
+ Originally developed by [TigreGótico](https://tigregotico.pt) for [OpenVoiceOS](https://openvoiceos.org),
188
+ sponsored by VisioLab. Modernized under the [NGI0 Commons Fund](https://nlnet.nl/commonsfund) / [NLnet](https://nlnet.nl).
189
+
190
+ <img src="https://github.com/user-attachments/assets/809588a2-32a2-406c-98c0-f88bf7753cb4" width="220" alt="VisioLab"/>
191
+
192
+ > This work was sponsored by VisioLab. VisioLab, part of [Royal Dutch Visio](https://visio.org/), is the test, education, and research center in the field of (innovative) assistive technology for blind and visually impaired people and professionals. We explore (new) technological developments such as Voice, VR and AI and make the knowledge and expertise we gain available to everyone.
193
+
194
+ [![NGI0 Commons Fund](./ngi.png)](https://nlnet.nl/project/OpenVoiceOS)
195
+
196
+ This project was funded through the [NGI0 Commons Fund](https://nlnet.nl/commonsfund),
197
+ a fund established by [NLnet](https://nlnet.nl) with financial support from the
198
+ European Commission's [Next Generation Internet](https://ngi.eu) programme, under
199
+ the aegis of [DG Communications Networks, Content and Technology](https://commission.europa.eu/about-european-commission/departments-and-executive-agencies/communications-networks-content-and-technology_en)
200
+ under grant agreement No [101135429](https://cordis.europa.eu/project/id/101135429).
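The Configuration table in the README above can be read as a set of defaults that a user-supplied dict overrides. The following is a minimal sketch of that reading, not the plugin's own code: `README_DEFAULTS` and `resolve_config` are hypothetical names, the default values are copied from the table, and the error message mirrors the check in `chat.py`.

```python
from typing import Any, Dict

# Defaults as listed in the README's Configuration table (assumed to apply to every wrapper).
README_DEFAULTS: Dict[str, Any] = {
    "remote_filename": "*Q4_K_M.gguf",
    "n_gpu_layers": 0,
    "chat_format": None,
    "verbose": True,
    "max_tokens": 512,
}


def resolve_config(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay user settings on the documented defaults; 'model' has no default."""
    if "model" not in user_config:
        raise ValueError("no 'model' set in config")
    return {**README_DEFAULTS, **user_config}


cfg = resolve_config({
    "model": "Qwen/Qwen2-0.5B-Instruct-GGUF",
    "remote_filename": "*q8_0.gguf",
})
print(cfg["remote_filename"], cfg["n_gpu_layers"], cfg["max_tokens"])
```

Keys given by the user win over the defaults, and a missing `model` fails early instead of surfacing later as a download error.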
@@ -0,0 +1,5 @@
1
+ from ovos_gguf_plugin.chat import GGUFChatEngine
2
+ from ovos_gguf_plugin.summarizer import GGUFSummarizer
3
+ from ovos_gguf_plugin.dialog_transformers import GGUFDialogTransformer
4
+ from ovos_gguf_plugin.translate import GGUFTextLangDetector, GGUFTextTranslator
5
+ from ovos_gguf_plugin.embeddings import GGUFEmbeddings
@@ -0,0 +1,193 @@
1
+ import os
2
+ from typing import Dict, Optional, List, Any, Iterable
3
+ from sentence_stream import SentenceBoundaryDetector
4
+
5
+ from llama_cpp import Llama
6
+ from ovos_plugin_manager.templates.agents import ChatEngine, AgentMessage, MessageRole
7
+ from ovos_utils.log import LOG
8
+
9
+
10
+ class GGUFChatEngine(ChatEngine):
11
+ def __init__(self, config: Optional[Dict[str, Any]] = None,
12
+ gguf_engine: Optional[Llama] = None):
13
+ config = config or {}
14
+ super().__init__(config)
15
+ if gguf_engine:
16
+ self.model = gguf_engine
17
+ else:
18
+ if "model" not in self.config:
19
+ raise ValueError("no 'model' set in config")
20
+ model = self.config["model"]
21
+ if os.path.isfile(model): # local path
22
+ LOG.info(f"Loading GGUF model: {model}")
23
+ self.model = Llama(
24
+ model_path=model,
25
+ n_gpu_layers=self.config.get("n_gpu_layers", 0),
26
+ chat_format=self.config.get("chat_format"),
27
+ verbose=self.config.get("verbose", True))
28
+ else:
29
+ fname = self.config.get("remote_filename", "*Q4_K_M.gguf")
30
+ LOG.info(f"Loading GGUF model from hub: {model} from file: {fname}")
31
+ self.model = Llama.from_pretrained(
32
+ repo_id=model,
33
+ filename=fname,
34
+ n_gpu_layers=self.config.get("n_gpu_layers", 0),
35
+ chat_format=self.config.get("chat_format"),
36
+ verbose=self.config.get("verbose", True)
37
+ )
38
+ LOG.info("GGUF model loaded!")
39
+ self.system_prompt = self.config.get("system_prompt")
40
+ self.allow_system = self.config.get("allow_system_prompts") or False
41
+
42
+ def validate_messages(self, messages: List[AgentMessage]) -> List[AgentMessage]:
43
+ """
44
+ Prepares the message list by enforcing system prompt rules.
45
+
46
+ This method:
47
+ 1. Strips existing system messages if `allow_system` is False.
48
+ 2. Injects the configured `system_prompt` if it exists.
49
+ 3. Merges the configured system prompt with an existing one if
50
+ `allow_system` is True.
51
+
52
+ Args:
53
+ messages (List[AgentMessage]): The raw input history of messages.
54
+
55
+ Returns:
56
+ List[AgentMessage]: The processed list of messages ready for the API.
57
+ """
58
+ if not self.allow_system:
59
+ messages = [m for m in messages if m.role != MessageRole.SYSTEM]
60
+
61
+ if not messages:
62
+ if self.system_prompt:
63
+ return [AgentMessage(role=MessageRole.SYSTEM, content=self.system_prompt)]
64
+ return []
65
+
66
+ if self.system_prompt:
67
+ sysm = AgentMessage(role=MessageRole.SYSTEM, content=self.system_prompt)
68
+ if messages and messages[0].role == MessageRole.SYSTEM:
69
+ if self.allow_system: # merge system prompts
70
+ sysm = AgentMessage(role=MessageRole.SYSTEM,
71
+ content=self.system_prompt + "\n" + messages[0].content)
72
+ # replace system prompt
73
+ messages[0] = sysm
74
+ else:
75
+ messages.insert(0, sysm)
76
+ return messages
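To make the three rules in `validate_messages` concrete, here is a self-contained sketch that restates them with stand-in types. `Role`, `Msg` and `apply_system_rules` are hypothetical names used only for illustration; they are not the `ovos_plugin_manager` classes. Unlike the method above, this version always works on a copy of the input list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Role(str, Enum):  # stand-in for MessageRole
    SYSTEM = "system"
    USER = "user"


@dataclass
class Msg:  # stand-in for AgentMessage
    role: Role
    content: str


def apply_system_rules(messages: List[Msg], system_prompt: Optional[str],
                       allow_system: bool) -> List[Msg]:
    # Rule 1: drop caller-supplied system messages unless they are allowed.
    msgs = [m for m in messages if allow_system or m.role != Role.SYSTEM]
    if not system_prompt:
        return msgs
    if msgs and msgs[0].role == Role.SYSTEM:
        # Rule 3: only reachable when allow_system is True, so merge both prompts.
        msgs[0] = Msg(Role.SYSTEM, system_prompt + "\n" + msgs[0].content)
    else:
        # Rule 2: inject the configured prompt at the front.
        msgs.insert(0, Msg(Role.SYSTEM, system_prompt))
    return msgs


history = [Msg(Role.SYSTEM, "Answer in French."), Msg(Role.USER, "Hi")]
strict = apply_system_rules(history, "Be brief.", allow_system=False)
merged = apply_system_rules(history, "Be brief.", allow_system=True)
print([m.content for m in strict])
print([m.content for m in merged])
```

In the strict case the caller's system message is discarded and replaced by the configured one; in the merged case both survive, with the configured prompt first.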
77
+
78
+ ###########################################################
79
+ # abstract methods to be implemented by individual plugins
80
+ ###########################################################
81
+ def continue_chat(self, messages: List[AgentMessage],
82
+ session_id: str = "default",
83
+ lang: Optional[str] = None,
84
+ units: Optional[str] = None) -> AgentMessage:
85
+ """
86
+ Generate a response message based on the provided chat history.
87
+
88
+ Args:
89
+ messages (List[AgentMessage]): Full list of messages in the conversation.
90
+ session_id (str): Identifier for the session.
91
+ lang (str, optional): BCP-47 language code.
92
+ units (str, optional): Preferred unit system (e.g., "metric", "imperial").
93
+
94
+ Returns:
95
+ AgentMessage: The generated response message from the assistant.
96
+ """
97
+ ans = self.model.create_chat_completion(
98
+ messages=[
99
+ {"role": m.role, "content": m.content}
100
+ for m in self.validate_messages(messages)
101
+ ],
102
+ max_tokens=self.config.get("max_tokens"),
103
+ stream=False
104
+ )["choices"][0]["message"]
105
+ return AgentMessage(role=MessageRole.ASSISTANT, content=ans["content"])
106
+
107
+ def stream_tokens(self, messages: List[AgentMessage],
108
+ session_id: str = "default",
109
+ lang: Optional[str] = None,
110
+ units: Optional[str] = None) -> Iterable[str]:
111
+ """
112
+ Stream back response tokens as they are generated.
113
+
114
+ Returns partial sentences and is not suitable for direct TTS.
115
+
116
+ Once merged, the output corresponds to the content of an AgentMessage with MessageRole.ASSISTANT.
117
+
118
+ Note:
119
+ This override streams chunks from llama.cpp as they are generated,
120
+ using `create_chat_completion(stream=True)`.
121
+
122
+ Args:
123
+ messages (List[AgentMessage]): Full list of messages.
124
+ session_id (str): Identifier for the session.
125
+ lang (str, optional): Language code.
126
+ units (str, optional): Unit system.
127
+
128
+ Returns:
129
+ Iterable[str]: A stream of tokens/partial text.
130
+ """
131
+ # With stream=True, the output is of type `Iterator[CompletionChunk]`.
132
+ ans = self.model.create_chat_completion(
133
+ messages=[
134
+ {"role": m.role, "content": m.content}
135
+ for m in self.validate_messages(messages)
136
+ ],
137
+ max_tokens=self.config.get("max_tokens"),
138
+ stream=True
139
+ )
140
+ for item in ans:
141
+ chunk = item['choices'][0]["delta"].get("content")
142
+ if chunk:
143
+ yield chunk
144
+
145
+ def stream_sentences(self, messages: List[AgentMessage],
146
+ session_id: str = "default",
147
+ lang: Optional[str] = None,
148
+ units: Optional[str] = None) -> Iterable[str]:
149
+ """
150
+ Stream back response sentences as they are generated.
151
+
152
+ Returns full sentences only, suitable for direct TTS.
153
+
154
+ Once merged, the output corresponds to the content of an AgentMessage with MessageRole.ASSISTANT.
155
+
156
+ Note:
157
+ This override buffers the output of stream_tokens in a
158
+ SentenceBoundaryDetector and yields each sentence once it is complete.
159
+
160
+ Args:
161
+ messages (List[AgentMessage]): Full list of messages.
162
+ session_id (str): Identifier for the session.
163
+ lang (str, optional): Language code.
164
+ units (str, optional): Unit system.
165
+
166
+ Returns:
167
+ Iterable[str]: A stream of complete sentences.
168
+ """
169
+ boundary_detector = SentenceBoundaryDetector()
170
+ for tok in self.stream_tokens(messages):
171
+ yield from boundary_detector.add_chunk(tok)
172
+ final_text = boundary_detector.finish()
173
+ if final_text and not self.config.get("drop_incomplete_sentences", True):
174
+ yield final_text
175
+
176
+
177
+ if __name__ == "__main__":
178
+ LOG.set_level("DEBUG")
179
+
180
+ cfg = {
181
+ "model": "Qwen/Qwen2-0.5B-Instruct-GGUF",
182
+ "remote_filename": "*q8_0.gguf"
183
+ }
184
+
185
+ query = """The possibility of alien life in the solar system has been a topic of interest for scientists and astronomers for many years. The search for extraterrestrial life has been a major focus of space exploration, with numerous missions and discoveries made in recent years. While there is still no concrete evidence of life beyond Earth, the search for alien life continues to be a fascinating and exciting endeavor.
186
+ One of the most promising areas for the search for alien life is the moons of Jupiter and Saturn. These moons, such as Europa and Enceladus, are believed to have subsurface oceans that could potentially harbor life. The presence of water, a key ingredient for life as we know it, has been detected on these moons, and there are also indications of other necessary elements such as carbon, nitrogen, and oxygen.
187
+ Another area of interest for the search for alien life is the asteroid belt between Mars and Jupiter. This region is home to millions of asteroids, some of which may have the right conditions for life to exist. For example, some asteroids have been found to have water and organic compounds, which are essential for life.
188
+ In addition to the moons and asteroids of the solar system, there are also other potential locations for the search for alien life. For example, there are exoplanets, or planets outside of our solar system, that have been discovered in recent years. Some of these exoplanets are believed to be in the habitable zone, which means they are located in the right distance from their star to potentially have liquid water on their surface.
189
+ Despite the potential for alien life in the solar system, there are still many uncertainties and unknowns. The search for extraterrestrial life is a complex and multifaceted endeavor that requires a combination of scientific research, technological advancements, and exploration. While there is still no concrete evidence of life beyond Earth, the search for alien life continues to be a fascinating and exciting endeavor that holds the potential for groundbreaking discoveries in the future."""
190
+ s = GGUFChatEngine(cfg)
191
+ messages = [AgentMessage(role=MessageRole.USER, content=query)]
192
+ for sent in s.stream_sentences(messages):
193
+ print(sent)
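The sentence streaming in `stream_sentences` can be pictured as a buffer that accumulates token chunks and releases text only at a sentence boundary. The sketch below is a simplified stand-in, not the `sentence_stream` library: `sentences_from_chunks` is a hypothetical helper, and the boundary rule (terminal punctuation followed by whitespace) is an assumption made for illustration.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: whitespace that directly follows '.', '!' or '?'.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_chunks(chunks: Iterable[str],
                          drop_incomplete: bool = True) -> Iterator[str]:
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is closed by punctuation plus whitespace.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Whatever is left has no confirmed boundary after it.
    tail = buffer.strip()
    if tail and not drop_incomplete:
        yield tail


chunks = ["Hel", "lo there. How", " are you? I am", " fi"]
print(list(sentences_from_chunks(chunks)))
print(list(sentences_from_chunks(chunks, drop_incomplete=False)))
```

In this stand-in, a final sentence with no whitespace after its punctuation stays in the buffer and is treated as the tail, so it is emitted only when `drop_incomplete` is false. Whether the real detector behaves the same way is not shown in this diff, so it may be worth checking how the `drop_incomplete_sentences` default interacts with the last sentence of a reply.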