lemonade-python-sdk 1.0.2__tar.gz → 1.0.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (23)
  1. lemonade_python_sdk-1.0.3/PKG-INFO +368 -0
  2. lemonade_python_sdk-1.0.3/README.md +327 -0
  3. lemonade_python_sdk-1.0.3/lemonade_python_sdk.egg-info/PKG-INFO +368 -0
  4. {lemonade_python_sdk-1.0.2 → lemonade_python_sdk-1.0.3}/lemonade_python_sdk.egg-info/SOURCES.txt +1 -0
  5. lemonade_python_sdk-1.0.3/lemonade_sdk/__init__.py +17 -0
  6. lemonade_python_sdk-1.0.3/lemonade_sdk/audio_stream.py +403 -0
  7. lemonade_python_sdk-1.0.3/lemonade_sdk/client.py +454 -0
  8. lemonade_python_sdk-1.0.3/lemonade_sdk/request_builder.py +325 -0
  9. {lemonade_python_sdk-1.0.2 → lemonade_python_sdk-1.0.3}/setup.py +1 -1
  10. lemonade_python_sdk-1.0.2/PKG-INFO +0 -180
  11. lemonade_python_sdk-1.0.2/README.md +0 -139
  12. lemonade_python_sdk-1.0.2/lemonade_python_sdk.egg-info/PKG-INFO +0 -180
  13. lemonade_python_sdk-1.0.2/lemonade_sdk/__init__.py +0 -6
  14. lemonade_python_sdk-1.0.2/lemonade_sdk/client.py +0 -177
  15. lemonade_python_sdk-1.0.2/lemonade_sdk/request_builder.py +0 -137
  16. {lemonade_python_sdk-1.0.2 → lemonade_python_sdk-1.0.3}/LICENSE +0 -0
  17. {lemonade_python_sdk-1.0.2 → lemonade_python_sdk-1.0.3}/lemonade_python_sdk.egg-info/dependency_links.txt +0 -0
  18. {lemonade_python_sdk-1.0.2 → lemonade_python_sdk-1.0.3}/lemonade_python_sdk.egg-info/requires.txt +0 -0
  19. {lemonade_python_sdk-1.0.2 → lemonade_python_sdk-1.0.3}/lemonade_python_sdk.egg-info/top_level.txt +0 -0
  20. {lemonade_python_sdk-1.0.2 → lemonade_python_sdk-1.0.3}/lemonade_sdk/model_discovery.py +0 -0
  21. {lemonade_python_sdk-1.0.2 → lemonade_python_sdk-1.0.3}/lemonade_sdk/port_scanner.py +0 -0
  22. {lemonade_python_sdk-1.0.2 → lemonade_python_sdk-1.0.3}/lemonade_sdk/utils.py +0 -0
  23. {lemonade_python_sdk-1.0.2 → lemonade_python_sdk-1.0.3}/setup.cfg +0 -0
@@ -0,0 +1,368 @@
Metadata-Version: 2.4
Name: lemonade-python-sdk
Version: 1.0.3
Summary: A clean interface for interacting with the Lemonade LLM server
Home-page: https://github.com/Tetramatrix/lemonade-python-sdk
Author: Your Name
Author-email: your.email@example.com
Project-URL: Bug Reports, https://github.com/Tetramatrix/lemonade-python-sdk/issues
Project-URL: Source, https://github.com/Tetramatrix/lemonade-python-sdk
Keywords: llm,ai,lemonade,sdk,api
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.25.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

42
# 🍋 Lemonade Python SDK

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)

A robust, production-grade Python wrapper for the **Lemonade C++ Backend**.

This SDK provides a clean, Pythonic interface for interacting with local LLMs running on Lemonade. It was built to power **Sorana** (a visual workspace for AI), extracting the core integration logic into a standalone, open-source library for the developer community.

## 🚀 Key Features

* **Auto-Discovery:** Automatically scans multiple ports and hosts to find active Lemonade instances.
* **Low-Overhead Architecture:** Designed as a thin, efficient wrapper to leverage Lemonade's C++ performance with minimal Python latency.
* **Health Checks & Recovery:** Built-in utilities to verify server status and handle connection drops.
* **Type-Safe Client:** Full Python type hinting for a better developer experience (IDE autocompletion).
* **Model Management:** Simple API to load, unload, and list models dynamically.
* **Embeddings API:** Generate text embeddings for semantic search, RAG, and clustering (FLM & llamacpp backends).
* **Audio API:** Whisper speech-to-text and Kokoro text-to-speech.
* **Reranking API:** Reorder documents by relevance for better RAG results.
* **Image Generation:** Create images from text prompts using Stable Diffusion.
* **WebSocket Streaming:** Real-time audio transcription with VAD.

## 📦 Installation

```bash
pip install .
```

Alternatively, you can install it directly from GitHub:

```bash
pip install git+https://github.com/Tetramatrix/lemonade-python-sdk.git
```

## ⚡ Quick Start

### 1. Connecting to Lemonade

The SDK handles port discovery automatically, so you don't need to hardcode `localhost:8000`.

```python
from lemonade_sdk import LemonadeClient, find_available_lemonade_port

# Auto-discover a running instance
port = find_available_lemonade_port()
if port:
    client = LemonadeClient(base_url=f"http://localhost:{port}")
    if client.health_check():
        print(f"Connected to Lemonade on port {port}")
else:
    print("No Lemonade instance found.")
```
94

### 2. Chat Completion

```python
response = client.chat_completion(
    model="Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Hello World in C++"}
    ],
    temperature=0.7
)

print(response['choices'][0]['message']['content'])
```

### 3. Model Management

```python
# List all available models
models = client.list_models()
for m in models:
    print(f"Found model: {m['id']}")

# Load a specific model into memory
client.load_model("Mistral-7B-v0.1")
```

### 4. Embeddings (NEW)

Generate text embeddings for semantic search, RAG pipelines, and clustering.

```python
# List available embedding models (filtered by the 'embeddings' label)
embedding_models = client.list_embedding_models()
for model in embedding_models:
    print(f"Embedding model: {model['id']}")

# Generate embeddings for a single text
response = client.embeddings(
    input="Hello, world!",
    model="nomic-embed-text-v1-GGUF"
)

embedding_vector = response["data"][0]["embedding"]
print(f"Vector length: {len(embedding_vector)}")

# Generate embeddings for multiple texts
texts = ["Text 1", "Text 2", "Text 3"]
response = client.embeddings(
    input=texts,
    model="nomic-embed-text-v1-GGUF"
)

for item in response["data"]:
    print(f"Text {item['index']}: {len(item['embedding'])} dimensions")
```

**Supported Backends (Lemonade):**
- ✅ **FLM (FastFlowLM)** - NPU-accelerated on Windows
- ✅ **llamacpp** (.GGUF models) - CPU/GPU
- ❌ ONNX/OGA - Not supported
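
The vectors returned by the Embeddings API can be compared with cosine similarity for semantic search. A minimal, dependency-free sketch (the vectors below are short stand-ins for real `response["data"][i]["embedding"]` output):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-in vectors; in practice take response["data"][i]["embedding"]
query_vec = [0.1, 0.3, 0.5]
doc_vecs = {
    "doc_a": [0.1, 0.29, 0.51],  # nearly parallel to the query
    "doc_b": [0.9, -0.2, 0.1],   # pointing elsewhere
}

ranked = sorted(
    doc_vecs,
    key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
    reverse=True,
)
print(ranked)  # doc_a ranks first: it is closest to the query
```

The same ranking loop works unchanged on real embeddings; only the vector length grows.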
156

### 5. Audio Transcription (Whisper) - NEW

Transcribe audio files to text using Whisper.

```python
# List available audio models (Whisper + Kokoro)
audio_models = client.list_audio_models()
for model in audio_models:
    print(f"Audio model: {model['id']}")

# Transcribe an audio file
result = client.transcribe_audio(
    file_path="meeting.wav",
    model="Whisper-Tiny",
    language="en",           # Optional: None for auto-detection
    response_format="json"   # Options: "json", "text", "verbose_json"
)

if "error" not in result:
    print(f"Transcription: {result['text']}")
    # Verbose format also includes: duration, language, segments
```

**Supported Models:**
- `Whisper-Tiny` (~39M parameters)
- `Whisper-Base` (~74M parameters)
- `Whisper-Small` (~244M parameters)

**Supported Formats:** WAV, MP3, FLAC, OGG, WebM

**Backend:** whisper.cpp (NPU-accelerated on Windows)

### 6. Text-to-Speech (Kokoro) - NEW

Generate speech from text using Kokoro TTS.

```python
# Generate speech and save it to a file
client.text_to_speech(
    input_text="Hello, Lemonade can now speak!",
    model="kokoro-v1",
    voice="shimmer",          # Options: shimmer, corey, af_bella, am_adam, etc.
    speed=1.0,                # 0.5 - 2.0
    response_format="mp3",    # Options: mp3, wav, opus, pcm, aac, flac
    output_file="speech.mp3"  # Saves directly to file
)

# Or get the audio bytes directly
audio_bytes = client.text_to_speech(
    input_text="Short test!",
    model="kokoro-v1",
    voice="corey",
    response_format="mp3"
)

with open("speech.mp3", "wb") as f:
    f.write(audio_bytes)
```
215

**Supported Models:**
- `kokoro-v1` (~82M parameters)

**Available Voices:**

| Voice ID | Language | Gender |
|----------|----------|--------|
| `shimmer` | EN | Female |
| `corey` | EN | Male |
| `af_bella`, `af_nicole` | EN-US | Female |
| `am_adam`, `am_michael` | EN-US | Male |
| `bf_emma`, `bf_isabella` | EN-GB | Female |
| `bm_george`, `bm_lewis` | EN-GB | Male |

**Audio Formats:** MP3, WAV, OPUS, PCM, AAC, FLAC

**Backend:** Kokoros (.onnx, CPU)

### 7. Reranking (NEW)

Rerank documents by relevance to a query.

```python
result = client.rerank(
    query="What is the capital of France?",
    documents=[
        "Berlin is the capital of Germany.",
        "Paris is the capital of France.",
        "London is the capital of the UK."
    ],
    model="bge-reranker-v2-m3-GGUF"
)

# Results are sorted by relevance score
for r in result["results"]:
    print(f"Rank {r['index']}: Score={r['relevance_score']:.2f}")
```

**Supported Models:**
- `bge-reranker-v2-m3-GGUF`
- Other BGE reranker models

**Backend:** llamacpp (.GGUF only; not available for FLM or OGA)
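
Each entry in `results` refers back to the input documents by `index`, so recovering the documents themselves in ranked order is a short list comprehension. A sketch using a mock response shaped like the one above (no live server assumed here):

```python
documents = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "London is the capital of the UK.",
]

# Mock response in the shape client.rerank() returns
result = {
    "results": [
        {"index": 1, "relevance_score": 0.98},
        {"index": 0, "relevance_score": 0.12},
        {"index": 2, "relevance_score": 0.09},
    ]
}

# Map ranked entries back to the original document texts
ranked_docs = [documents[r["index"]] for r in result["results"]]
print(ranked_docs[0])  # the most relevant document
```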
259

### 8. Image Generation (NEW)

Generate images from text prompts using Stable Diffusion.

```python
# Generate and save to a file
client.generate_image(
    prompt="A sunset over mountains with lake reflection",
    model="SD-Turbo",
    size="512x512",
    steps=4,          # SD-Turbo needs only 4 steps
    cfg_scale=1.0,
    output_file="sunset.png"
)

# Or get the image bytes
image_bytes = client.generate_image(
    prompt="A cute cat",
    model="SD-Turbo"
)
```

**Supported Models:**
- `SD-Turbo` (fast, 4 steps)
- `SDXL-Turbo` (fast, 4 steps)
- `SD-1.5` (standard, 20 steps)
- `SDXL-Base-1.0` (high quality, 20 steps)

**Image Sizes:** 512x512, 1024x1024, or custom

**Backend:** stable-diffusion.cpp
291

### 9. WebSocket Streaming (NEW)

Real-time audio transcription with Voice Activity Detection (VAD).

```python
from lemonade_sdk import WhisperWebSocketClient

# Create a streaming client
stream = client.create_whisper_stream(model="Whisper-Tiny")
stream.connect()

# Set a callback for transcriptions
def on_transcript(text):
    print(f"Heard: '{text}'")

stream.on_transcription(on_transcript)

# Stream an audio file (PCM16, 16kHz, mono)
for text in stream.stream("audio.pcm"):
    pass  # Callback handles output

# Or stream from the microphone (requires pyaudio)
# for text in stream.stream_microphone():
#     print(f"Heard: {text}")

stream.disconnect()
```

**Audio Format:** 16kHz, mono, PCM16 (16-bit)
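
To see concretely what that wire format means, you can synthesize a valid buffer with only the standard library. A sketch (the sine tone is just placeholder audio):

```python
import math
import struct

SAMPLE_RATE = 16000  # 16 kHz mono, as the streaming endpoint expects

def make_pcm16_tone(freq_hz=440.0, seconds=0.5, amplitude=0.3):
    # Little-endian signed 16-bit samples (PCM16), one channel.
    n = int(SAMPLE_RATE * seconds)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE))
        for t in range(n)
    )
    return struct.pack(f"<{n}h", *samples)

pcm = make_pcm16_tone()
print(len(pcm))  # 2 bytes per sample: 16000 Hz * 0.5 s * 2 = 16000
```

Written to disk, such a raw buffer is the kind of input `stream.stream(...)` above consumes.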
321

**Features:**
- Voice Activity Detection (VAD)
- Real-time streaming
- Microphone support (with pyaudio)
- Configurable sensitivity

**Backend:** whisper.cpp (NPU-accelerated on Windows)
329

## 📚 Documentation

* **[Embeddings API](docs/embeddings_api.md)** - Complete guide to using embeddings
* **[Audio API](docs/audio_api.md)** - Whisper transcription and Kokoro TTS
* **[Implementation Plan](docs/AUDIO_IMPLEMENTATION.md)** - Audio API implementation roadmap
* [Lemonade Server Docs](https://lemonade-server.ai/docs/server/server_spec/) - Official Lemonade documentation

### 🖼️ Production Showcase

This SDK powers three real-world production applications:

[Sorana](https://tetramatrix.github.io/Sorana/) — AI Visual Workspace
* The SDK drives semantic AI grouping of files and folders onto a spatial 2D canvas
* The SDK handles auto-discovery of and connection to local Lemonade instances (zero config)

[Aicono](https://tetramatrix.github.io/Aicono/) — AI Desktop Icon Organizer *(Featured in [CHIP Magazine](https://www.chip.de/downloads/Aicono_186527264.html) 🇩🇪)*
* The SDK drives AI inference for grouping and categorizing desktop icons
* Reached millions of readers via [CHIP](https://www.chip.de/downloads/Aicono_186527264.html), one of Germany's largest IT publications

[TabNeuron](https://tetramatrix.github.io/TabNeuron/) — AI-Powered Tab Organizer
* The SDK enables local AI inference for grouping and categorizing browser tabs
* Desktop companion app + browser extension, demonstrating the SDK's viability in lightweight client architectures

## 🛠️ Project Structure

* **client.py:** Main entry point for API interactions (chat, embeddings, audio, reranking, images, model management).
* **port_scanner.py:** Utilities for detecting Lemonade instances across ports (8000-9000).
* **model_discovery.py:** Logic for fetching and parsing model metadata.
* **request_builder.py:** Helper functions to construct compliant payloads (chat, embeddings, audio, reranking, images).
* **audio_stream.py:** WebSocket client for real-time audio transcription with VAD.
* **utils.py:** Additional utility functions.
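
As an illustration of the kind of payload `request_builder` assembles, here is a hypothetical helper (the name `build_chat_payload` is not part of the SDK) that builds an OpenAI-compatible chat body like the one used in the Quick Start:

```python
import json

def build_chat_payload(model, messages, temperature=0.7, stream=False):
    # Hypothetical sketch of a request_builder-style helper:
    # assemble an OpenAI-compatible chat completion body.
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "stream": stream,
    }

payload = build_chat_payload(
    "Llama-3-8B-Instruct",
    [{"role": "user", "content": "Write a Hello World in C++"}],
)
print(json.dumps(payload, indent=2))
```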
361

## 🤝 Contributing

Contributions are welcome! This project is intended to help the AMD Ryzen AI and Lemonade community build downstream applications faster.

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.