deepslate-pipecat 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,25 @@
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# NodeJS stuff
node_modules/

# Virtual environments
.venv

# IDE files
.idea/

# Development files
.env.local
.no-update

# Local settings
.claude/
CLAUDE.local.md

@@ -0,0 +1,550 @@
Metadata-Version: 2.4
Name: deepslate-pipecat
Version: 0.1.0
Summary: Pipecat plugin for deepslate.eu
Project-URL: Documentation, https://docs.deepslate.eu/
Project-URL: Website, https://deepslate.eu/
Project-URL: Source, https://github.com/deepslate-labs/deepslate-sdks
Keywords: ai,audio,deepslate,pipecat,realtime,voice
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: deepslate-core==0.1.0
Requires-Dist: loguru>=0.7.2
Requires-Dist: pipecat-ai>=0.0.40
Requires-Dist: websockets>=16.0
Description-Content-Type: text/markdown

# deepslate-pipecat

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Documentation](https://img.shields.io/badge/docs-deepslate.eu-green)](https://docs.deepslate.eu/)
[![Python](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)

Pipecat plugin for [Deepslate's](https://deepslate.eu/) realtime speech-to-speech AI API.

`deepslate-pipecat` provides a `DeepslateRealtimeLLMService` implementation for the [Pipecat](https://github.com/pipecat-ai/pipecat) framework, connecting your Pipecat pipelines to Deepslate's unified voice AI infrastructure. The plugin handles bidirectional audio streaming, frame translation, WebSocket connection management, server-side VAD, and optional ElevenLabs TTS — all transparently, through a Pipecat-native interface.

---

## Features

- **Realtime Audio Streaming** — Low-latency bidirectional PCM audio over WebSockets
- **Server-side VAD** — Voice Activity Detection handled by Deepslate with configurable sensitivity
- **Function Calling** — Full tool/function calling support via Pipecat's `register_function` API
- **Flexible TTS** — Choose server-side ElevenLabs TTS (via Deepslate) or any downstream Pipecat TTS service
- **Automatic Interruption Handling** — Native support for interruptions with buffer clearing
- **Dynamic Context Injection** — Append user or system messages to an active session mid-conversation via `LLMMessagesAppendFrame`
- **Frame-based Architecture** — Seamless integration with Pipecat's pipeline model
- **Dynamic Audio Configuration** — Automatically adapts to audio format changes at runtime

---

## Installation

```bash
pip install deepslate-pipecat
```

### Requirements

- Python 3.11 or higher

### Dependencies (installed automatically)

- `deepslate-core` — Shared Deepslate models and base client
- `pipecat-ai>=0.0.40` — Core Pipecat framework
- `loguru>=0.7.2` — Structured logging
- `websockets>=16.0` — WebSocket client

---

## Prerequisites

### Deepslate Account

Sign up at [deepslate.eu](https://deepslate.eu) and set the following environment variables:

```bash
DEEPSLATE_VENDOR_ID=your_vendor_id
DEEPSLATE_ORGANIZATION_ID=your_organization_id
DEEPSLATE_API_KEY=your_api_key
```

### ElevenLabs TTS (Optional)

For server-side TTS with automatic interruption handling:

```bash
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_voice_id    # e.g., '21m00Tcm4TlvDq8ikWAM' for Rachel
ELEVENLABS_MODEL_ID=eleven_turbo_v2  # optional
```

> **Note:** Without `ElevenLabsTtsConfig`, the service emits `TTSTextFrame` objects for downstream Pipecat TTS services (Cartesia, Azure TTS, etc.). Context truncation on interruption requires server-side TTS.

---

## Quick Start

A complete voice bot using Daily.co WebRTC transport, ElevenLabs TTS, and function calling:

```python
import asyncio
import os
import random
import sys

import aiohttp
from dotenv import load_dotenv
from loguru import logger

from pipecat.frames.frames import LLMSetToolsFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.services.llm_service import FunctionCallParams
from pipecat.transports.services.daily import DailyParams, DailyTransport

from deepslate_pipecat import DeepslateOptions, DeepslateRealtimeLLMService, ElevenLabsTtsConfig

load_dotenv(override=True)

logger.remove()
logger.add(sys.stderr, level="DEBUG")

# Tool definitions (OpenAI function-calling JSON schema format)
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "lookup_weather",
            "description": "Get the current weather for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "The city to look up."}
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_location",
            "description": "Get the user's current location.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]


async def lookup_weather(params: FunctionCallParams):
    result = {
        "location": params.arguments.get("location", "unknown"),
        "temperature_celsius": random.randint(10, 35),
        "precipitation": random.choice(["none", "light", "moderate", "heavy"]),
        "air_pressure_hpa": random.randint(900, 1100),
    }
    await params.result_callback(result)


async def get_current_location(params: FunctionCallParams):
    await params.result_callback({"location": "Berlin"})


async def main():
    daily_api_key = os.getenv("DAILY_API_KEY")
    daily_room_url = os.getenv("DAILY_ROOM_URL")

    async with aiohttp.ClientSession() as session:
        headers = {"Authorization": f"Bearer {daily_api_key}"}
        room_name = daily_room_url.split("/")[-1]
        async with session.post(
            "https://api.daily.co/v1/meeting-tokens",
            headers=headers,
            json={"properties": {"room_name": room_name}},
        ) as r:
            token = (await r.json())["token"]

    transport = DailyTransport(
        room_url=daily_room_url,
        token=token,
        bot_name="Deepslate Bot",
        params=DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            camera_out_enabled=False,
            vad_enabled=False,  # Deepslate handles VAD server-side
        ),
    )

    opts = DeepslateOptions.from_env(
        system_prompt="You are a friendly and helpful AI assistant. Keep your answers concise."
    )
    tts = ElevenLabsTtsConfig.from_env()
    llm = DeepslateRealtimeLLMService(options=opts, tts_config=tts)

    llm.register_function("lookup_weather", lookup_weather)
    llm.register_function("get_current_location", get_current_location)

    pipeline = Pipeline([transport.input(), llm, transport.output()])
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))

    await task.queue_frame(LLMSetToolsFrame(tools=TOOLS))

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        logger.info(f"Participant {participant['id']} joined.")

    @transport.event_handler("on_participant_left")
    async def on_participant_left(transport, participant, reason):
        await task.cancel()

    runner = PipelineRunner()
    await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())
```

---

## Configuration

### `DeepslateOptions`

| Parameter | Type | Default | Description |
|-------------------|-----------------|----------------------------------|----------------------------------------------------------------|
| `vendor_id` | `str` | env: `DEEPSLATE_VENDOR_ID` | Deepslate vendor ID |
| `organization_id` | `str` | env: `DEEPSLATE_ORGANIZATION_ID` | Deepslate organization ID |
| `api_key` | `str` | env: `DEEPSLATE_API_KEY` | Deepslate API key |
| `base_url` | `str` | `"https://app.deepslate.eu"` | Base URL for Deepslate API |
| `system_prompt` | `str` | `"You are a helpful assistant."` | System prompt for the AI assistant |
| `ws_url` | `Optional[str]` | `None` | Direct WebSocket URL (overrides `base_url`; for local dev/testing) |
| `max_retries` | `int` | `3` | Maximum reconnection attempts before giving up |

Use `DeepslateOptions.from_env()` to load credentials from environment variables:

```python
from deepslate_pipecat import DeepslateOptions

opts = DeepslateOptions.from_env(
    system_prompt="You are a customer service agent. Be professional and helpful.",
    max_retries=5,
)
```

### VAD Configuration

Pass a `VadConfig` (also aliased as `DeepslateVadConfig` for backwards compatibility) to tune server-side voice activity detection:

```python
from deepslate_pipecat import DeepslateRealtimeLLMService, VadConfig

llm = DeepslateRealtimeLLMService(
    options=opts,
    vad_config=VadConfig(
        confidence_threshold=0.3,  # Lower = more sensitive
        min_volume=0.005,
        start_duration_ms=100,
        stop_duration_ms=300,
        backbuffer_duration_ms=500,
    ),
)
```

| Parameter | Type | Default | Description |
|--------------------------|---------|---------|-------------------------------------------------------------------|
| `confidence_threshold` | `float` | `0.5` | Minimum confidence required to classify audio as speech (0.0–1.0) |
| `min_volume` | `float` | `0.01` | Minimum volume level to classify audio as speech (0.0–1.0) |
| `start_duration_ms` | `int` | `200` | Duration of speech (ms) required to trigger speech start |
| `stop_duration_ms` | `int` | `500` | Duration of silence (ms) required to trigger speech end |
| `backbuffer_duration_ms` | `int` | `1000` | Audio (ms) buffered before speech detection triggers |

**Tuning tips:**
- **Noisy environments:** Increase `confidence_threshold` (0.6–0.8) and `min_volume` (0.02–0.05)
- **Lower latency:** Decrease `start_duration_ms` (100–150) and `stop_duration_ms` (200–300)
- **Natural conversations:** Slightly increase `stop_duration_ms` (600–800)
- **Capture sentence starts:** Increase `backbuffer_duration_ms` (1500–2000)

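To build intuition for how `start_duration_ms` and `stop_duration_ms` debounce raw per-frame speech decisions into start/stop events, here is a small self-contained sketch. It is purely illustrative — a hypothetical model of the server-side behaviour, not Deepslate's actual implementation — and it ignores `backbuffer_duration_ms`:

```python
def vad_events(frames, frame_ms=20, confidence_threshold=0.5, min_volume=0.01,
               start_duration_ms=200, stop_duration_ms=500):
    """Debounce per-frame (confidence, volume) pairs into speech events.

    Hypothetical model for illustration only. Returns (event, time_ms) tuples,
    where time_ms is the start time of the frame that triggered the event.
    """
    events, speaking, run_ms = [], False, 0
    for i, (confidence, volume) in enumerate(frames):
        is_speech = confidence >= confidence_threshold and volume >= min_volume
        if not speaking:
            # Require start_duration_ms of continuous speech before triggering.
            run_ms = run_ms + frame_ms if is_speech else 0
            if run_ms >= start_duration_ms:
                speaking, run_ms = True, 0
                events.append(("start", i * frame_ms))
        else:
            # Require stop_duration_ms of continuous silence before ending.
            run_ms = run_ms + frame_ms if not is_speech else 0
            if run_ms >= stop_duration_ms:
                speaking, run_ms = False, 0
                events.append(("stop", i * frame_ms))
    return events

# 400 ms of confident speech followed by 600 ms of silence:
frames = [(0.9, 0.1)] * 20 + [(0.0, 0.0)] * 30
print(vad_events(frames))  # [('start', 180), ('stop', 880)]
```

Under this model, lowering `stop_duration_ms` makes the bot respond sooner after the user stops talking, at the cost of splitting utterances across natural pauses.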
### `ElevenLabsTtsConfig`

| Parameter | Type | Default | Description |
|------------|----------------------|----------------------------|-----------------------------------------------------------------------|
| `api_key` | `str` | env: `ELEVENLABS_API_KEY` | ElevenLabs API key |
| `voice_id` | `str` | env: `ELEVENLABS_VOICE_ID` | Voice ID (e.g., `'21m00Tcm4TlvDq8ikWAM'` for Rachel) |
| `model_id` | `Optional[str]` | env: `ELEVENLABS_MODEL_ID` | Model ID (e.g., `'eleven_turbo_v2'`); uses ElevenLabs default if unset |
| `location` | `ElevenLabsLocation` | `ElevenLabsLocation.US` | Regional endpoint: US (all accounts), EU or INDIA (enterprise only) |

#### Server-side vs Client-side TTS

**Server-side TTS (recommended — best interruption handling):**

```python
from deepslate_pipecat import DeepslateRealtimeLLMService, ElevenLabsTtsConfig

tts_config = ElevenLabsTtsConfig.from_env()
llm = DeepslateRealtimeLLMService(options=opts, tts_config=tts_config)

pipeline = Pipeline([transport.input(), llm, transport.output()])
```

**Client-side TTS (e.g., Cartesia):**

```python
from pipecat.services.cartesia import CartesiaTTSService

llm = DeepslateRealtimeLLMService(options=opts)  # No tts_config — emits TTSTextFrame
tts = CartesiaTTSService(...)

pipeline = Pipeline([transport.input(), llm, tts, transport.output()])
```

> **Important:** Server-side TTS enables Deepslate to truncate the response context when a user interrupts, ensuring the model stays in sync with what was actually spoken. Client-side TTS does not support this.

---

## Function Calling

Define tools as OpenAI-style JSON schemas, register async handlers, and sync the definitions to Deepslate via `LLMSetToolsFrame`:

```python
from pipecat.frames.frames import LLMSetToolsFrame
from pipecat.services.llm_service import FunctionCallParams

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "lookup_weather",
            "description": "Get the current weather for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "The city to look up."}
                },
                "required": ["location"],
            },
        },
    },
]

async def lookup_weather(params: FunctionCallParams):
    await params.result_callback({"temperature_celsius": 22, "condition": "sunny"})

llm.register_function("lookup_weather", lookup_weather)

# Queue tool definitions — synced to Deepslate after the pipeline starts
await task.queue_frame(LLMSetToolsFrame(tools=TOOLS))
```

---

## Dynamic Context Injection

Inject messages into an active session mid-conversation using `LLMMessagesAppendFrame`.

| Role | Behaviour | Triggers inference? |
|-------------|-----------------------------------------------------------------|---------------------|
| `user` | Appended to conversation history as a silent user input | Only if `run_llm=True` |
| `system` | Forwarded as `extra_instructions` on the next inference turn | Only if `run_llm=True` |
| `assistant` | Not supported — logged as a warning | — |

> **Note:** `system` instructions via `LLMMessagesAppendFrame` are ephemeral — they affect only the triggered inference turn. To set a persistent system prompt, use `DeepslateOptions.system_prompt`.

**Silent context injection:**

```python
from pipecat.frames.frames import LLMMessagesAppendFrame

await task.queue_frame(
    LLMMessagesAppendFrame(
        messages=[{"role": "user", "content": "My name is Alice and I'm from Paris."}],
        run_llm=False,
    )
)
```

**Immediate inference with a system instruction:**

```python
await task.queue_frame(
    LLMMessagesAppendFrame(
        messages=[{
            "role": "system",
            "content": "You are now a professional chef assistant. Greet the user and ask how you can help with their cooking.",
        }],
        run_llm=True,
    )
)
```

---

## Transport Integration

### Daily.co (WebRTC)

```python
from pipecat.transports.services.daily import DailyTransport, DailyParams

transport = DailyTransport(
    room_url=daily_room_url,
    token=token,
    bot_name="My Voice Bot",
    params=DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_enabled=False,  # Deepslate handles VAD
    ),
)
```

### Twilio

```python
from pipecat.transports.services.twilio import TwilioTransport

transport = TwilioTransport(
    account_sid=twilio_account_sid,
    auth_token=twilio_auth_token,
    from_number=twilio_from_number,
)
```

### Generic WebSocket

```python
from pipecat.transports.network.websocket import WebsocketTransport, WebsocketParams

transport = WebsocketTransport(
    host="0.0.0.0",
    port=8765,
    params=WebsocketParams(audio_in_enabled=True, audio_out_enabled=True),
)
```

---

## Frame Reference

**Input frames consumed by `DeepslateRealtimeLLMService`:**

| Frame | Description |
|---|---|
| `AudioRawFrame` | PCM audio from user (forwarded to Deepslate for STT + inference) |
| `TextFrame` | Text input from user |
| `FunctionCallResultFrame` | Result of an executed function tool |
| `LLMMessagesAppendFrame` | Injects user/system messages mid-conversation |
| `LLMSetToolsFrame` | Updates active tool/function definitions |
| `StartFrame`, `EndFrame`, `CancelFrame` | Pipeline lifecycle management |

**Output frames emitted:**

| Frame | Description |
|---|---|
| `LLMFullResponseStartFrame` / `LLMFullResponseEndFrame` | Marks the start/end of an AI response |
| `LLMTextFrame` | Streaming text transcript of the AI response |
| `OutputAudioRawFrame` | PCM audio output (only with server-side TTS configured) |
| `InterruptionFrame` | User interrupted — signals buffer clearing |
| `FunctionCallRequestFrame` | Request to execute a function tool |
| `ErrorFrame` | An error occurred during processing |

---

## Troubleshooting

### Connection Failures

Verify `DEEPSLATE_VENDOR_ID`, `DEEPSLATE_ORGANIZATION_ID`, and `DEEPSLATE_API_KEY` are set. The plugin retries with exponential backoff (2 s → 4 s → 8 s, capped at 30 s). Increase the retry limit if needed:

```python
opts = DeepslateOptions.from_env(max_retries=5)
```

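For reference, the documented schedule doubles the delay on each attempt and caps it. A sketch of that arithmetic (the `backoff_delays` helper below is illustrative, not part of the package API):

```python
def backoff_delays(max_retries: int, base: float = 2.0, cap: float = 30.0) -> list[float]:
    """Delays (in seconds) for the documented schedule: 2 s, 4 s, 8 s, ..., capped at 30 s."""
    return [min(base * 2 ** attempt, cap) for attempt in range(max_retries)]

print(backoff_delays(5))  # [2.0, 4.0, 8.0, 16.0, 30.0]
```

With `max_retries=5`, the service waits at most 30 s between attempts rather than 32 s, because the cap kicks in on the fifth retry.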
### Audio Issues

Deepslate expects signed 16-bit PCM audio. Verify sample rate (common: 16000, 24000, 48000 Hz) and channel count (mono = 1) match between your transport and Deepslate. Enable `DEBUG` logging to inspect the detected audio configuration:

```python
import sys

from loguru import logger

logger.remove()
logger.add(sys.stderr, level="DEBUG")
```

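If your audio source produces floating-point samples, they need to be converted to signed 16-bit PCM before entering the pipeline. A minimal sketch (`float_to_pcm16` is a hypothetical helper for illustration, not part of this package):

```python
import struct

def float_to_pcm16(samples):
    """Pack float samples in [-1.0, 1.0] as signed 16-bit little-endian PCM bytes."""
    return b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))  # clip, then scale
        for s in samples
    )

print(float_to_pcm16([0.0, 1.0, -1.0]).hex())  # 0000ff7f0180
```

Each sample becomes 2 bytes, so 20 ms of 16 kHz mono audio (320 samples) is 640 bytes on the wire.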
### No LLM Response

- Check VAD settings — they may be too strict (lower `confidence_threshold` or `min_volume`)
- Ensure sufficient audio duration is being sent
- Check for `ErrorFrame` output in the pipeline

### Protobuf Version Conflicts

```bash
pip install --upgrade "protobuf>=5.26.0"
```

---

## Examples

The [`examples/`](examples/) directory contains a ready-to-run bot you can use as a starting point.

### `simple_bot.py` — Daily.co voice bot with function calling

A fully working Pipecat pipeline that demonstrates:
- Daily.co WebRTC transport (swap for Twilio, WebSocket, etc.)
- Server-side ElevenLabs TTS with interruption handling
- Two example function tools: `lookup_weather` and `get_current_location`

```
packages/pipecat/examples/
├── simple_bot.py   # The bot
└── .env.example    # Required environment variables
```

**Setup:**

```bash
# 1. Install dependencies
pip install deepslate-pipecat "pipecat-ai[daily]" aiohttp python-dotenv loguru

# 2. Configure credentials
cd packages/pipecat/examples
cp .env.example .env
# Edit .env and fill in your credentials

# 3. Run
python simple_bot.py
```

---

## Documentation

- [Deepslate Documentation](https://docs.deepslate.eu/)
- [Pipecat Documentation](https://docs.pipecat.ai/)
- [API Reference](https://docs.deepslate.eu/api-reference/)

---

## Support

- **Issues:** [GitHub Issues](https://github.com/deepslate-labs/deepslate-sdks/issues)
- **Documentation:** [docs.deepslate.eu](https://docs.deepslate.eu/)
- **Email:** info@deepslate.eu

---

## License

Apache License 2.0 — see [LICENSE](../../LICENSE) for details.