livellm 1.5.5__tar.gz → 1.7.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -3,4 +3,5 @@ __pycache__
  .pytest_cache

  .coverage
- test.py
+ test.py
+ test_*.py
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: livellm
- Version: 1.5.5
+ Version: 1.7.1
  Summary: Python client for the LiveLLM Server
  Project-URL: Homepage, https://github.com/qalby-tech/livellm-client-py
  Project-URL: Repository, https://github.com/qalby-tech/livellm-client-py
@@ -19,10 +19,6 @@ Requires-Dist: httpx>=0.27.0
  Requires-Dist: pydantic>=2.0.0
  Requires-Dist: sounddevice>=0.5.3
  Requires-Dist: websockets>=15.0.1
- Provides-Extra: testing
- Requires-Dist: pytest-asyncio>=0.21.0; extra == 'testing'
- Requires-Dist: pytest-cov>=4.1.0; extra == 'testing'
- Requires-Dist: pytest>=8.4.2; extra == 'testing'
  Description-Content-Type: text/markdown

  # LiveLLM Python Client
@@ -39,6 +35,9 @@ Python client library for the LiveLLM Server - a unified proxy for AI agent, aud
  - 🎯 **Multi-provider** - OpenAI, Google, Anthropic, Groq, ElevenLabs
  - 🔄 **Streaming** - Real-time streaming for agent and audio
  - 🛠️ **Flexible API** - Use request objects or keyword arguments
+ - 📋 **Structured Output** - Get validated JSON responses with schema support (Pydantic, OutputSchema, or dict)
+ - 📏 **Context Overflow Management** - Automatic handling of large texts with truncate/recycle strategies
+ - ⏱️ **Per-Request Timeout** - Override default timeout for individual requests
  - 🎙️ **Audio services** - Text-to-speech and transcription
  - 🎤 **Real-Time Transcription** - WebSocket-based live audio transcription with bidirectional streaming
  - ⚡ **Fallback strategies** - Sequential and parallel handling
@@ -98,10 +97,10 @@ from livellm.models import Settings, ProviderKind
  # Basic
  client = LivellmClient(base_url="http://localhost:8000")

- # With timeout and pre-configured providers
+ # With default timeout and pre-configured providers
  client = LivellmClient(
      base_url="http://localhost:8000",
-     timeout=30.0,
+     timeout=30.0,  # Default timeout for all requests
      configs=[
          Settings(
              uid="openai",
@@ -119,6 +118,50 @@ client = LivellmClient(
  )
  ```

+ ### Per-Request Timeout Override
+
+ The timeout provided in `__init__` is only the default; you can override it for individual requests:
+
+ ```python
+ # Client with a 30s default timeout
+ client = LivellmClient(base_url="http://localhost:8000", timeout=30.0)
+
+ # Uses the default 30s timeout
+ response = await client.agent_run(
+     provider_uid="openai",
+     model="gpt-4",
+     messages=[TextMessage(role="user", content="Hello")]
+ )
+
+ # Override with a 120s timeout for this specific request
+ response = await client.agent_run(
+     provider_uid="openai",
+     model="gpt-4",
+     messages=[TextMessage(role="user", content="Write a long essay...")],
+     timeout=120.0  # Override for this request only
+ )
+
+ # Works with streaming too
+ async for chunk in client.agent_run_stream(
+     provider_uid="openai",
+     model="gpt-4",
+     messages=[TextMessage(role="user", content="Tell me a story")],
+     timeout=300.0  # 5 minutes for streaming
+ ):
+     print(chunk.output, end="")
+
+ # Works with all methods: speak(), speak_stream(), transcribe(), etc.
+ audio = await client.speak(
+     provider_uid="openai",
+     model="tts-1",
+     text="Hello world",
+     voice="alloy",
+     mime_type=SpeakMimeType.MP3,
+     sample_rate=24000,
+     timeout=60.0
+ )
+ ```
+
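When a request exceeds its timeout, expect an exception rather than a partial result. A minimal handling sketch, assuming the httpx-based client lets `httpx.TimeoutException` propagate (the README does not specify the exact exception type):

```python
import httpx

try:
    response = await client.agent_run(
        provider_uid="openai",
        model="gpt-4",
        messages=[TextMessage(role="user", content="Hello")],
        timeout=5.0,  # deliberately tight per-request timeout
    )
except httpx.TimeoutException:
    # Assumption: timeouts from the underlying httpx transport propagate as-is
    print("Request timed out; retry with a larger timeout or a fallback")
```
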
  ### Supported Providers

  `OPENAI` • `GOOGLE` • `ANTHROPIC` • `GROQ` • `ELEVENLABS`
@@ -302,6 +345,213 @@ if response.history:
  - Auditing and logging complete conversations
  - Building conversational UIs with full context visibility

+ #### Agent with Structured Output
+
+ Get structured JSON responses from the agent by providing an output schema. The agent returns a JSON string matching your schema in the `output` field.
+
+ **Three ways to define a schema:**
+
+ **1. Using a Pydantic BaseModel (recommended)**
+ ```python
+ import json
+ from pydantic import BaseModel
+ from livellm.models import TextMessage
+
+ class Person(BaseModel):
+     name: str
+     age: int
+     occupation: str
+
+ response = await client.agent_run(
+     provider_uid="openai",
+     model="gpt-4",
+     messages=[TextMessage(role="user", content="Extract info: John is a 28-year-old engineer")],
+     output_schema=Person  # Pass the BaseModel class directly
+ )
+
+ # response.output is a JSON string: '{"name": "John", "age": 28, "occupation": "engineer"}'
+ print(type(response.output))  # <class 'str'>
+
+ # Parse the JSON string yourself if needed
+ data = json.loads(response.output)
+ print(f"Name: {data['name']}")
+ print(f"Age: {data['age']}")
+ print(f"Occupation: {data['occupation']}")
+
+ # Or validate with your Pydantic model
+ person = Person.model_validate_json(response.output)
+ print(f"Name: {person.name}")
+ ```
+
+ **2. Using OutputSchema**
+ ```python
+ from livellm.models import OutputSchema, PropertyDef, TextMessage
+
+ schema = OutputSchema(
+     title="Person",
+     description="A person's information",
+     properties={
+         "name": PropertyDef(type="string", description="The person's name"),
+         "age": PropertyDef(type="integer", minimum=0, maximum=150, description="Age in years"),
+         "email": PropertyDef(type="string", pattern="^[^@]+@[^@]+\\.[^@]+$", description="Email address"),
+     },
+     required=["name", "age", "email"]
+ )
+
+ response = await client.agent_run(
+     provider_uid="openai",
+     model="gpt-4",
+     messages=[TextMessage(role="user", content="Tell me about a person")],
+     output_schema=schema
+ )
+ ```
+
+ **3. Using a dictionary (JSON Schema)**
+ ```python
+ schema_dict = {
+     "title": "Person",
+     "type": "object",
+     "properties": {
+         "name": {"type": "string", "description": "The person's name"},
+         "age": {"type": "integer", "minimum": 0, "maximum": 150},
+         "email": {"type": "string", "pattern": "^[^@]+@[^@]+\\.[^@]+$"}
+     },
+     "required": ["name", "age", "email"]
+ }
+
+ response = await client.agent_run(
+     provider_uid="openai",
+     model="gpt-4",
+     messages=[TextMessage(role="user", content="Extract person info")],
+     output_schema=schema_dict
+ )
+ ```
+
+ **Complex nested schemas:**
+ ```python
+ from pydantic import BaseModel
+ from typing import List, Optional
+
+ class Address(BaseModel):
+     street: str
+     city: str
+     zip_code: str
+
+ class Person(BaseModel):
+     name: str
+     age: int
+     addresses: List[Address]
+     phone: Optional[str] = None
+
+ response = await client.agent_run(
+     provider_uid="openai",
+     model="gpt-4",
+     messages=[TextMessage(role="user", content="Extract person with addresses")],
+     output_schema=Person  # Nested models are automatically resolved
+ )
+ ```
+
+ **With streaming:**
+ ```python
+ import json
+ from typing import List
+ from pydantic import BaseModel
+
+ class Summary(BaseModel):
+     title: str
+     key_points: List[str]
+     word_count: int
+
+ stream = client.agent_run_stream(
+     provider_uid="openai",
+     model="gpt-4",
+     messages=[TextMessage(role="user", content="Summarize this article")],
+     output_schema=Summary
+ )
+
+ # Collect chunks as they arrive; the stream can only be consumed once
+ chunks = []
+ async for chunk in stream:
+     print(chunk.output, end="", flush=True)
+     chunks.append(chunk.output)
+
+ # After streaming completes, parse the full JSON output
+ full_output = "".join(chunks)
+ data = json.loads(full_output)
+ ```
+
+ **Response fields:**
+ - `output` - The JSON string response matching your schema
+
+ **Use cases:**
+ - Data extraction and parsing
+ - API response formatting
+ - Structured data generation
+ - Type-safe responses
+ - Integration with type-checked code
+
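Since `output` always arrives as a string, schema violations may only surface when you validate it. A small defensive-parsing sketch, reusing the `Person` model from above; the retry loop is illustrative and not part of the client API:

```python
from pydantic import ValidationError

async def extract_person(client, text: str, retries: int = 2) -> Person:
    """Re-ask the model when the returned JSON fails schema validation."""
    for attempt in range(retries + 1):
        response = await client.agent_run(
            provider_uid="openai",
            model="gpt-4",
            messages=[TextMessage(role="user", content=f"Extract info: {text}")],
            output_schema=Person,
        )
        try:
            # model_validate_json parses and validates in one step
            return Person.model_validate_json(response.output)
        except ValidationError as exc:
            if attempt == retries:
                raise
            print(f"Invalid structured output, retrying: {exc}")
```
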
+ #### Context Overflow Management
+
+ Handle large texts that exceed model context windows with automatic truncation or iterative processing:
+
+ ```python
+ from livellm.models import TextMessage, ContextOverflowStrategy, OutputSchema, PropertyDef
+
+ # TRUNCATE strategy (default): Preserves beginning, middle, and end
+ # Works with both streaming and non-streaming
+ response = await client.agent_run(
+     provider_uid="openai",
+     model="gpt-4",
+     messages=[
+         TextMessage(role="system", content="Summarize the document."),
+         TextMessage(role="user", content=very_long_document)
+     ],
+     context_limit=4000,  # Max tokens
+     context_overflow_strategy=ContextOverflowStrategy.TRUNCATE
+ )
+
+ # RECYCLE strategy: Iteratively processes chunks and merges results
+ # Useful for extraction tasks - processes the entire document
+ # Requires output_schema for JSON merging
+ output_schema = OutputSchema(
+     title="ExtractedInfo",
+     properties={
+         "topics": PropertyDef(type="array", items={"type": "string"}),
+         "key_figures": PropertyDef(type="array", items={"type": "string"})
+     },
+     required=["topics", "key_figures"]
+ )
+
+ response = await client.agent_run(
+     provider_uid="openai",
+     model="gpt-4",
+     messages=[
+         TextMessage(role="system", content="Extract all topics and key figures."),
+         TextMessage(role="user", content=very_long_document)
+     ],
+     context_limit=3000,
+     context_overflow_strategy=ContextOverflowStrategy.RECYCLE,
+     output_schema=output_schema
+ )
+
+ # Parse the merged results
+ import json
+ result = json.loads(response.output)
+ print(f"Topics: {result['topics']}")
+ print(f"Key figures: {result['key_figures']}")
+ ```
+
+ **Strategy comparison:**
+
+ | Strategy | How it works | Best for | Streaming |
+ |----------|--------------|----------|-----------|
+ | `TRUNCATE` | Takes beginning, middle, end portions | Summarization, Q&A | ✅ Yes |
+ | `RECYCLE` | Processes chunks iteratively, merges JSON | Full document extraction | ❌ No |
+
+ **Parameters:**
+ - `context_limit` (int, default: 0) - Maximum tokens. If ≤ 0, overflow handling is disabled
+ - `context_overflow_strategy` (ContextOverflowStrategy, default: TRUNCATE) - Strategy to use
+
+ **Notes:**
+ - System prompts are always preserved (never truncated)
+ - Token counting includes a 20% safety buffer
+ - RECYCLE requires `output_schema` for JSON merging
+
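For intuition, here is a standalone sketch of the TRUNCATE idea described above: keep the head, a middle slice, and the tail of the text within a budget. It is only an illustration; the character budget and the flat 20% buffer stand in for the server's token-based accounting, and none of these helpers come from the livellm API.

```python
def truncate_keep_ends(text: str, limit: int, buffer: float = 0.2) -> str:
    """Illustrative only: keep beginning, middle, and end within a budget.

    `limit` is a character budget here; the real strategy counts tokens
    and applies a ~20% safety buffer on top of its counts.
    """
    budget = int(limit * (1 - buffer))
    if len(text) <= budget:
        return text  # fits as-is, nothing to drop
    part = budget // 3
    head = text[:part]
    mid_start = (len(text) - part) // 2
    middle = text[mid_start:mid_start + part]
    tail = text[-part:]
    return f"{head}\n...\n{middle}\n...\n{tail}"
```
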
  ### Audio Services

  #### Text-to-Speech
@@ -411,11 +661,17 @@ async def transcribe_live_direct():
      )

      # Stream audio and receive transcriptions
-     async for response in client.start_session(init_request, audio_source()):
-         print(f"Transcription: {response.transcription}")
-         if response.is_end:
-             print("Transcription complete!")
-             break
+     # Each iteration yields a list of responses (oldest to newest)
+     async for responses in client.start_session(init_request, audio_source()):
+         # Get the latest transcription (last element)
+         latest = responses[-1]
+         print(f"Latest transcription: {latest.transcription}")
+
+         # Process all accumulated transcriptions if needed
+         if len(responses) > 1:
+             print(f"  (received {len(responses)} chunks)")
+             for resp in responses:
+                 print(f"  - {resp.transcription}")

  asyncio.run(transcribe_live_direct())
  ```
@@ -453,25 +709,25 @@ async def transcribe_and_chat():
          gen_config={},
      )

-     # Listen for transcriptions and, for each chunk, run an agent request
-     async for resp in t_client.start_session(init_request, audio_source()):
-         print("User said:", resp.transcription)
+     # Listen for transcriptions and, for each batch, run an agent request
+     # Each iteration yields a list of responses - newest is last
+     async for responses in t_client.start_session(init_request, audio_source()):
+         # Use the latest transcription for the agent
+         latest = responses[-1]
+         print("User said:", latest.transcription)

          # You can call agent_run (or speak, etc.) while the transcription stream is active
+         # Even if this is slow, transcriptions accumulate and won't stall the loop
          agent_response = await realtime.agent_run(
              provider_uid="openai",
              model="gpt-4",
              messages=[
-                 TextMessage(role="user", content=resp.transcription),
+                 TextMessage(role="user", content=latest.transcription),
              ],
              temperature=0.7,
          )
          print("Agent:", agent_response.output)

-         if resp.is_end:
-             print("Transcription session complete")
-             break
-
  asyncio.run(transcribe_and_chat())
  ```
@@ -568,25 +824,27 @@ response = await client.ping()

  ### Client Methods

+ All methods accept an optional `timeout` parameter to override the default client timeout.
+
  **Configuration**
- - `ping()` - Health check
- - `update_config(config)` / `update_configs(configs)` - Add/update providers
- - `get_configs()` - List all configurations
- - `delete_config(uid)` - Remove provider
+ - `ping(timeout?)` - Health check
+ - `update_config(config, timeout?)` / `update_configs(configs, timeout?)` - Add/update providers
+ - `get_configs(timeout?)` - List all configurations
+ - `delete_config(uid, timeout?)` - Remove provider

  **Agent**
- - `agent_run(request | **kwargs)` - Run agent (blocking)
- - `agent_run_stream(request | **kwargs)` - Run agent (streaming)
+ - `agent_run(request | **kwargs, timeout?)` - Run agent (blocking)
+ - `agent_run_stream(request | **kwargs, timeout?)` - Run agent (streaming)

  **Audio**
- - `speak(request | **kwargs)` - Text-to-speech (blocking)
- - `speak_stream(request | **kwargs)` - Text-to-speech (streaming)
- - `transcribe(request | **kwargs)` - Speech-to-text
+ - `speak(request | **kwargs, timeout?)` - Text-to-speech (blocking)
+ - `speak_stream(request | **kwargs, timeout?)` - Text-to-speech (streaming)
+ - `transcribe(request | **kwargs, timeout?)` - Speech-to-text

  **Real-Time Transcription (TranscriptionWsClient)**
  - `connect()` - Establish WebSocket connection
  - `disconnect()` - Close WebSocket connection
- - `start_session(init_request, audio_source)` - Start bidirectional streaming transcription
+ - `start_session(init_request, audio_source)` - Start bidirectional streaming transcription; yields `list[TranscriptionWsResponse]` (accumulated responses, newest last)
  - `async with client:` - Auto connection management (recommended)

  **Cleanup**
@@ -607,25 +865,33 @@ response = await client.ping()
  - `MessageRole` - `USER` | `MODEL` | `SYSTEM` | `TOOL_CALL` | `TOOL_RETURN` (or use strings)

  **Requests**
- - `AgentRequest(provider_uid, model, messages, tools?, gen_config?, include_history?)` - Set `include_history=True` to get the full conversation
+ - `AgentRequest(provider_uid, model, messages, tools?, gen_config?, include_history?, output_schema?, context_limit?, context_overflow_strategy?)` - Set `include_history=True` to get the full conversation. Set `output_schema` for structured JSON output. Set `context_limit` and `context_overflow_strategy` to handle large texts.
  - `SpeakRequest(provider_uid, model, text, voice, mime_type, sample_rate, gen_config?)`
  - `TranscribeRequest(provider_uid, file, model, language?, gen_config?)`
  - `TranscriptionInitWsRequest(provider_uid, model, language?, input_sample_rate?, input_audio_format?, gen_config?)`
  - `TranscriptionAudioChunkWsRequest(audio)` - Audio chunk for streaming

+ **Context Overflow**
+ - `ContextOverflowStrategy` - `TRUNCATE` | `RECYCLE`
+
  **Tools**
  - `WebSearchInput(kind=ToolKind.WEB_SEARCH, search_context_size)`
  - `MCPStreamableServerInput(kind=ToolKind.MCP_STREAMABLE_SERVER, url, prefix?, timeout?)`

+ **Structured Output**
+ - `OutputSchema(title, description?, properties, required?, additionalProperties?)` - JSON Schema for structured output
+ - `PropertyDef(type, description?, enum?, default?, minLength?, maxLength?, pattern?, minimum?, maximum?, items?, ...)` - Property definition with validation constraints
+ - `OutputSchema.from_pydantic(model)` - Convert a Pydantic BaseModel class to OutputSchema, as sketched below
+
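A short sketch of `OutputSchema.from_pydantic`, assuming the `Person` model from the structured-output examples above; the printed `title` mirroring the model name is an assumption:

```python
from pydantic import BaseModel
from livellm.models import OutputSchema

class Person(BaseModel):
    name: str
    age: int

# Build an explicit OutputSchema from the model instead of passing the class
# directly, e.g. to inspect or tweak the schema before sending it
schema = OutputSchema.from_pydantic(Person)
print(schema.title)  # assumption: "Person"
```
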
  **Fallback**
  - `AgentFallbackRequest(strategy, requests, timeout_per_request?)` - see the construction sketch after this list
  - `AudioFallbackRequest(strategy, requests, timeout_per_request?)`
  - `FallbackStrategy` - `SEQUENTIAL` | `PARALLEL`
890
 
625
891
  **Responses**
626
- - `AgentResponse(output, usage{input_tokens, output_tokens}, history?)` - `history` included when `include_history=True`
892
+ - `AgentResponse(output, usage{input_tokens, output_tokens}, history?)` - `history` included when `include_history=True`. `output` is a JSON string when `output_schema` is provided.
627
893
  - `TranscribeResponse(text, language)`
628
- - `TranscriptionWsResponse(transcription, is_end)` - Real-time transcription result
894
+ - `TranscriptionWsResponse(transcription, received_at)` - Real-time transcription result; yielded as `list[TranscriptionWsResponse]` with newest last
629
895
 
630
896
  ## Error Handling
631
897