flowra 0.0.25.dev35__tar.gz → 0.0.26.dev37__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- flowra-0.0.26.dev37/.claude/commands/update-pricing.md +74 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/CHANGELOG.md +17 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/CLAUDE.md +1 -1
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/PKG-INFO +1 -1
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/context7.json +2 -2
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/internal/lib.md +1 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/internal/llm.md +1 -0
- flowra-0.0.26.dev37/docs/research/llm_retry_backoff.md +244 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/research/pricing_complexity.md +5 -5
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/todo.md +13 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/ext/mlflow.py +4 -1
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/ext/otel.py +4 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/llm_call/agent.py +1 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/llm_config.py +1 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/tool_loop/agent.py +1 -0
- flowra-0.0.26.dev37/flowra/llm/pricing/__init__.py +40 -0
- flowra-0.0.26.dev37/flowra/llm/pricing/data/custom.json +52 -0
- flowra-0.0.26.dev37/flowra/llm/pricing/data/generated.json +791 -0
- flowra-0.0.26.dev37/flowra/llm/pricing/registry.py +186 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/providers/anthropic_vertex.py +6 -4
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/providers/google_vertex.py +10 -3
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/providers/openai.py +10 -3
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/request.py +1 -0
- flowra-0.0.26.dev37/flowra/version.py +2 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/ext/test_otel.py +8 -1
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/lib/test_llm_call_agent.py +2 -1
- flowra-0.0.26.dev37/tests/llm/pricing/test_registry.py +382 -0
- flowra-0.0.26.dev37/tools/sync_pricing.py +218 -0
- flowra-0.0.25.dev35/.claude/commands/update-pricing.md +0 -48
- flowra-0.0.25.dev35/flowra/llm/pricing/__init__.py +0 -3
- flowra-0.0.25.dev35/flowra/llm/pricing/anthropic.py +0 -68
- flowra-0.0.25.dev35/flowra/llm/pricing/google.py +0 -56
- flowra-0.0.25.dev35/flowra/llm/pricing/openai.py +0 -70
- flowra-0.0.25.dev35/flowra/version.py +0 -2
- flowra-0.0.25.dev35/tests/llm/pricing/test_anthropic.py +0 -71
- flowra-0.0.25.dev35/tests/llm/pricing/test_google.py +0 -43
- flowra-0.0.25.dev35/tests/llm/pricing/test_openai.py +0 -50
- flowra-0.0.25.dev35/tools/sync_pricing.py +0 -360
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/.env.example +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/.github/workflows/master.yml +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/.github/workflows/publish.yml +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/.github/workflows/pull_request.yml +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/.github/workflows/pull_request_e2e.yml +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/.gitignore +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/.python-version +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/LICENSE +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/Makefile +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/README.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/agents.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/getting-started.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/internal/agent.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/internal/architecture.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/internal/ext/mlflow.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/internal/ext/otel.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/internal/ext/tracing-guide.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/internal/ext.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/internal/lib/anthropic.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/internal/patterns.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/internal/tools.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/llm.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/observability.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/patterns.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/research/flowing_context.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/research/hooks_redesign.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/research/mlflow_context_migration.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/research/model_fallback.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/research/otel_integration.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/research/provider_extensions.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/research/spawn_strategies.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/research/strands_comparison.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/research/tool_error_signals.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/research/tool_search_tool.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/research/voice_stt.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/review_prompts/step1_structure.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/review_prompts/step2_code_style.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/review_prompts/step3_documentation.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/review_prompts/step4_doc_readability.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/review_prompts/step5_doc_audit.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/review_prompts/step6_tests.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/docs/tools.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/TRACING_COMBINATIONS.md +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/agent_as_tool.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/app_agent.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/console_chat.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/docs/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/docs/agents_custom.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/docs/agents_parallel.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/docs/getting_started_chat.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/docs/getting_started_streaming.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/docs/getting_started_tools.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/docs/llm_streaming.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/docs/llm_structured_output.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/docs/tools_service_injection.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/escalation.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/llm_logging.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/llm_routing.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/menu_agent.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/menu_agent_class.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/mlflow_demo.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/mlflow_dual_export_demo.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/mlflow_nested_demo.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/mlflow_otel_both_demo.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/mlflow_otel_nested_demo.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/mlflow_parallel_demo.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/model_registry.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/otel_demo.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/otel_jaeger_demo.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/otel_nested_demo.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/otel_visualize.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/race.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/span_crash_demo.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/span_demo.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/system_prompt.txt +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/tools/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/tools/calculator.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/tools/random_numbers.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/tools/switch_model.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/examples/tui_chat.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/_sentinel.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/agent.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/agent_arg.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/compile/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/compile/compiler.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/compile/contract.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/compile/init_params.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/compile/instance.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/compile/step_params.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/compile/step_validation.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/compile/steps.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/compile/type_helpers.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/compile/type_registry.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/model.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/registry.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/step.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/step_arg.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/definition/step_helpers.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/flow/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/flow/actions.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/flow/context.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/flow/flowing_registry.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/flow/hooks.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/flow/interrupt.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/flow/interrupt_helpers.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/flow/spawn.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/flow/timeout.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/runtime/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/runtime/engine.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/runtime/execution.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/runtime/instance_factory.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/runtime/runtime.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/runtime/scope.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/runtime/serialization.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/runtime/spans.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/runtime/spawn_tree.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/services.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/state/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/state/markers.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/state/store.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/state/values.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/storage/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/storage/file.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/storage/in_memory.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/agent/storage/session_storage.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/ext/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/anthropic/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/anthropic/cache.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/anthropic/presets.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/anthropic/tool_search.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/chat/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/chat/agent.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/chat/config.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/chat/hook_events.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/chat/spec.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/config_value.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/llm_call/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/llm_call/spec.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/observability/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/observability/llm_hooks.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/tool_loop/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/tool_loop/config.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/tool_loop/context.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/tool_loop/hook_events.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/tool_loop/spec.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/tool_loop/tool_call/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/tool_loop/tool_call/agent.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/tool_loop/tool_call/agent_tool.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/lib/tool_loop/tool_call/context.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/base.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/blocks.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/messages.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/provider.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/providers/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/response.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/schema_formatting.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/schema_validation.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/stream.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/llm/tools.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/py.typed +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/tools/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/tools/local_tool.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/tools/mcp_connection.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/tools/tool_arg.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/tools/tool_group.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/tools/tool_registry.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/flowra/tools/types.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/pyproject.toml +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/definition/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/definition/compile/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/definition/compile/test_compile.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/definition/compile/test_type_helpers.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/definition/test_agent.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/definition/test_registry.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/definition/test_step_helpers.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/flow/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/flow/test_agent_def.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/flow/test_context.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/flow/test_flowing_registry.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/flow/test_flowing_registry_tasks.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/flow/test_flowing_sync.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/flow/test_hooks.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/flow/test_interrupt.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/flow/test_spans.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/flow/test_timeout.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/flow/test_with_interrupt.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/runtime/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/runtime/test_engine.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/runtime/test_engine_spans.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/runtime/test_hook_context.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/runtime/test_persistence.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/runtime/test_runtime.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/runtime/test_scope.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/runtime/test_serialization.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/runtime/test_spec_in_constructor.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/state/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/state/test_values.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/storage/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/storage/test_file.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/storage/test_in_memory.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/agent/test_missing_scenarios.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/ext/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/ext/test_mlflow.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/lib/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/lib/anthropic/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/lib/anthropic/test_anthropic.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/lib/test_chat_agent.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/lib/test_config_value.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/lib/test_matches_tool_filter.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/lib/test_tool_call_agent.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/lib/test_tool_call_agent_call_agent.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/lib/test_tool_loop_agent.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/lib/tool_loop/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/pricing/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/providers/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/providers/test_anthropic_e2e.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/providers/test_anthropic_vertex.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/providers/test_google_vertex.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/providers/test_google_vertex_e2e.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/providers/test_openai_e2e.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/providers/test_openai_provider.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/test_cost_breakdown.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/test_metadata.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/test_response.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/test_schema_formatting.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/test_schema_validation.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/llm/test_stream.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/tools/__init__.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/tools/test_local_tool.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/tools/test_mcp_connection.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/tools/test_tool_group.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/tests/tools/test_tool_registry.py +0 -0
- {flowra-0.0.25.dev35 → flowra-0.0.26.dev37}/uv.lock +0 -0
.claude/commands/update-pricing.md (new file)
@@ -0,0 +1,74 @@
+# Update Pricing
+
+Update the LLM pricing data in `flowra/llm/pricing/data/` with current pricing from the web.
+
+## Instructions
+
+You are updating the pricing data for LLM models used by this project.
+
+### Project structure
+
+Pricing is stored in JSON files under `flowra/llm/pricing/data/`:
+
+- `generated.json` — auto-generated from litellm via `tools/sync_pricing.py`. **Do not edit manually.**
+- `custom.json` — manual overrides and additions for models/fields missing from litellm. Custom entries are **merged** with generated entries (custom fields override, other fields preserved from generated).
+
+The JSON format is `provider → model → pricing_fields`:
+
+```json
+{
+  "anthropic": {
+    "claude-sonnet-4-6": {
+      "input": 3.0,
+      "output": 15.0,
+      "cache_read": 0.3,
+      "cache_creation": 3.75,
+      "cache_creation_1h": 6.0,
+      "input_above_200k": 6.0,
+      "output_above_200k": 22.5
+    }
+  }
+}
+```
+
+Provider keys: `anthropic`, `openai`, `google` (future: `bedrock/us`, `azure/eu`, etc.)
+
+All prices in **$/1M tokens**. Omit fields that are zero.
+
+Available pricing fields:
+- `input`, `output` — base rates
+- `cache_read` — cache read cost
+- `cache_creation` — cache creation cost (5-minute ephemeral for Anthropic)
+- `cache_creation_1h` — Anthropic 1-hour cache creation cost
+- `reasoning_output` — separate reasoning token rate (if different from output)
+- `input_above_200k`, `output_above_200k`, `cache_read_above_200k`, `cache_creation_above_200k`, `cache_creation_1h_above_200k` — context tier rates for >200k tokens
+
+### Steps to follow
+
+1. **Read current pricing data**: Read `custom.json` and `generated.json` to see existing models and prices.
+
+2. **Determine which models to update**:
+   - If the user provided arguments (e.g., `/update-pricing all Gemini models`), search for pricing for those specific models.
+   - If no arguments were provided, refresh pricing for models in `custom.json`.
+
+3. **Web search for current pricing**: Use WebSearch to find the most up-to-date pricing:
+   - Search official pricing pages: Anthropic (anthropic.com/pricing), OpenAI (openai.com/api/pricing), Google Cloud (cloud.google.com/vertex-ai/generative-ai/pricing)
+   - Focus on models/fields that `generated.json` is missing (check litellm gaps)
+
+4. **Update `custom.json`**:
+   - Add/update entries for models or fields not covered by `generated.json`
+   - For partially missing fields (e.g., litellm has base rates but not `cache_creation_1h`), only include the missing fields — they will be merged with generated data
+   - For completely missing models, include all known pricing fields
+   - Remove entries from `custom.json` that are now fully covered by `generated.json`
+
+5. **Verify**: Run `make test name=pricing` to ensure nothing is broken.
+
+## Important notes
+
+- All prices are in dollars per 1 million tokens
+- `custom.json` entries **merge** with `generated.json` — you only need to specify fields that differ or are missing
+- Anthropic has separate 5-minute and 1-hour cache creation prices. 5m = 1.25x base input, 1h = 2x base input
+- OpenAI and Google cache creation is free — no `cache_creation` field needed
+- Model matching uses substring matching, so keys should be specific enough to avoid false matches
+- Always prioritize official pricing pages from model providers
+- To regenerate `generated.json` from litellm, run: `python tools/sync_pricing.py --apply`
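The merge rule the command describes ("custom fields override, other fields preserved from generated") amounts to a per-model dict union. A minimal sketch of those semantics follows — `merge_pricing` is a hypothetical illustration, not the actual loader in `flowra/llm/pricing/registry.py`:

```python
# Hypothetical sketch of the custom.json/generated.json merge semantics.
def merge_pricing(generated: dict, custom: dict) -> dict:
    merged: dict = {}
    for provider in generated.keys() | custom.keys():
        models = {m: dict(f) for m, f in generated.get(provider, {}).items()}
        for model, fields in custom.get(provider, {}).items():
            # Custom fields win; generated fields not mentioned in custom survive.
            models[model] = {**models.get(model, {}), **fields}
        merged[provider] = models
    return merged

generated = {"anthropic": {"claude-sonnet-4-6": {"input": 3.0, "output": 15.0}}}
custom = {"anthropic": {"claude-sonnet-4-6": {"cache_creation_1h": 6.0}}}
assert merge_pricing(generated, custom)["anthropic"]["claude-sonnet-4-6"] == {
    "input": 3.0,
    "output": 15.0,
    "cache_creation_1h": 6.0,
}
```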
CHANGELOG.md
@@ -7,6 +7,23 @@ and this project adheres to [Semantic Versioning](https://semver.org).
 
 ## [Unreleased]
 
+### Added
+- **`top_p`** parameter in `LLMRequest` and `LLMConfig` — nucleus sampling, supported by all providers.
+
+### Changed
+- **Universal pricing registry** — replaced three separate per-protocol pricing modules
+  (`anthropic.py`, `openai.py`, `google.py`) with a single JSON-backed `PricingRegistry`.
+  Pricing data now lives in `flowra/llm/pricing/data/generated.json` (auto-generated from
+  litellm) and `custom.json` (manual overrides). Supports context tiers (>200k tokens),
+  reasoning tokens, and cache creation TTL variants through a uniform `estimate_cost()` API.
+
+## [0.0.25] - 2026-03-24
+
+### Changed
+- **MLflow tool output** — JSON tool results are now parsed into structured dicts
+  for display in MLflow UI, instead of escaped strings.
+- **`SessionStorage`** and **`ChangeSet`** are now exported from `flowra.agent`.
+
 ## [0.0.24] - 2026-03-24
 
 ### Added
CLAUDE.md
@@ -51,7 +51,7 @@ Provider-agnostic interface for calling LLMs:
 
 Providers live in `flowra/llm/providers/`. Currently: `AnthropicVertexProvider`, `OpenAIProvider`, `GoogleVertexProvider`.
 
-Pricing
+Pricing lives in `flowra/llm/pricing/` — universal JSON-backed registry (`PricingRegistry`) with per-provider cost estimation. Data files in `data/generated.json` (auto-generated from litellm via `tools/sync_pricing.py`) and `data/custom.json` (manual overrides, merged on load). Supports context tiers (>200k tokens), reasoning tokens, cache creation TTL variants. Providers call `estimate_cost(model, provider=..., ...)` → `CostBreakdown`.
 
 ### Flowing context (`flowra/agent/flow/flowing_registry.py`)
 
PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: flowra
-Version: 0.0.25.dev35
+Version: 0.0.26.dev37
 Summary: Flowra — flow infrastructure for building stateful LLM agents
 Project-URL: Repository, https://github.com/anna-money/flowra
 Project-URL: Changelog, https://github.com/anna-money/flowra/blob/master/CHANGELOG.md
context7.json
@@ -14,7 +14,7 @@
   "LLM ABSTRACTION: LLMProvider is the core interface — two methods: async call(LLMRequest) -> LLMResponse and async stream(LLMRequest) -> AsyncIterator[StreamEvent]. Also an async context manager: supports aclose() and 'async with provider:' for resource cleanup",
   "stream() returns StreamEvent = TextDelta | ThinkingDelta | ContentComplete. TextDelta/ThinkingDelta carry incremental text; ContentComplete is always last and contains the full LLMResponse",
   "Default stream() implementation calls call() and yields a single ContentComplete — providers override for real-time streaming",
-  "LLMRequest contains: model, system (list[SystemMessage], default []), messages (list[UserMessage | AssistantMessage], default []), tools, json_schema, temperature, max_tokens, stop_sequences, additional_config, max_schema_retries. System messages are separate from conversation messages",
+  "LLMRequest contains: model, system (list[SystemMessage], default []), messages (list[UserMessage | AssistantMessage], default []), tools, json_schema, temperature, top_p, max_tokens, stop_sequences, additional_config, max_schema_retries. System messages are separate from conversation messages",
   "LLMResponse contains: message (AssistantMessage), stop_reason (StopReason), stop_sequence (str | None), usage (Usage | None), extra (dict[str, Any] — provider-specific data like provider_stop_reason, id)",
   "Usage contains: input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, cost_usd (total), cost (CostBreakdown with input/output/cache_read/cache_creation and total property). Token contract: input_tokens excludes cached tokens",
   "Messages: SystemMessage, UserMessage, AssistantMessage. System messages go in LLMRequest.system, conversation messages in LLMRequest.messages",
context7.json
@@ -107,7 +107,7 @@
   "transient hint: blocks/messages/tools with transient=True are (1) skipped by NonTransient caching bundles (which cache the non-transient prefix — stop at first transient) and (2) auto-filtered from ChatAgent session history",
   "Anthropic extra passthrough: AnthropicVertexProvider merges block.extra into output dicts (**block.extra), so cache_control and other Anthropic-specific fields pass through directly",
 
-  "CONFIG: LLMConfig(model, temperature, max_tokens, stop_sequences, additional_config) configures LLM calls",
+  "CONFIG: LLMConfig(model, temperature, top_p, max_tokens, stop_sequences, additional_config) configures LLM calls",
   "ConfigValue[T] wraps static or dynamic (callable) config values: ConfigValue[str] | ConfigValue[Callable[[], str]]",
 
   "QUICK START: Create provider -> create ToolRegistry -> create Config -> create AgentRuntime -> runtime.run()",
docs/internal/lib.md
@@ -362,6 +362,7 @@ Shared LLM configuration used by all lib agents:
 LLMConfig(
     model="claude-sonnet-4-5@20250929",
     temperature=0.7,          # optional
+    top_p=0.9,                # optional
     max_tokens=4096,          # optional
     stop_sequences=["END"],   # optional
     additional_config={},     # provider-specific
docs/internal/llm.md
@@ -321,6 +321,7 @@ LLMRequest(
 | `tools` | `list[Tool] \| None` | `None` | Available tools |
 | `json_schema` | `dict[str, Any] \| None` | `None` | JSON Schema for structured output |
 | `temperature` | `float \| None` | `None` | Generation temperature |
+| `top_p` | `float \| None` | `None` | Nucleus sampling threshold |
 | `max_tokens` | `int \| None` | `None` | Maximum tokens in response |
 | `stop_sequences` | `list[str] \| None` | `None` | Stop sequences |
 | `max_schema_retries` | `int` | `3` | Retries on schema validation failure |
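The `top_p` rows above document the new nucleus-sampling knob: the provider samples only from the smallest token set whose cumulative probability reaches `top_p`. A minimal usage sketch — the import path is assumed from the file list (`flowra/llm/request.py`), and the model string mirrors the `LLMConfig` example earlier:

```python
from flowra.llm.request import LLMRequest  # path assumed from the file list above

# top_p=0.9 keeps the smallest token set covering 90% of probability mass;
# like temperature, it is forwarded only when not None.
request = LLMRequest(
    model="claude-sonnet-4-5@20250929",
    temperature=0.7,
    top_p=0.9,
)
```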
docs/research/llm_retry_backoff.md (new file)
@@ -0,0 +1,244 @@
+# Built-in LLM Retry with Backoff in Tool Loop — Research (March 2026)
+
+Research date: 2026-03-24
+
+## Problem
+
+When LLM providers return transient errors (429 rate limit, 5xx server errors,
+network timeouts), the tool loop has no retry logic. The exception propagates
+through `ToolLoopAgent.call_llm()` → `Engine.advance()` → `AgentRuntime` and
+kills the agent. The user gets an unrecoverable crash on a transient error.
+
+This is a gap in tool loop as a "batteries included" building block. Users
+should not need to write a wrapper agent or custom provider to handle the most
+common LLM failure mode.
+
+Reference: Strands SDK implements this as `ModelRetryStrategy` — a hook-based
+plugin with exponential backoff on `ModelThrottledException`.
+
+## Current state
+
+### What happens on LLM error
+
+```
+ToolLoopAgent.call_llm()
+└─ self.__provider.call(request)          ← no try/except
+   └─ raises e.g. anthropic.RateLimitError
+      └─ propagates to Engine.advance()
+         └─ Engine closes all spans with error, re-raises
+            └─ AgentRuntime.__run_loop_inner() sees exception, crashes
+```
+
+**No catch, no retry, no backoff.** The agent is dead.
+
+### Provider exceptions
+
+| Provider | Throttling exception | Other transient |
+|---|---|---|
+| Anthropic (`anthropic` SDK) | `anthropic.RateLimitError` (subclass of `APIStatusError`, status 429) | `APIConnectionError`, `APITimeoutError`, `InternalServerError` (5xx) |
+| OpenAI (`openai` SDK) | `openai.RateLimitError` (status 429) | `APIConnectionError`, `APITimeoutError`, `InternalServerError` (5xx) |
+| Google (`google.genai`) | `google.api_core.exceptions.ResourceExhausted` (429) | `ServiceUnavailable` (503), `DeadlineExceeded`, transient gRPC errors |
+
+Note: both `anthropic` and `openai` SDKs have their own built-in retry logic
+(2 retries by default), so by the time the exception reaches us, the SDK has
+already retried. But SDK retries are short (seconds), while real throttling
+can last minutes. And SDK retries don't help with extended outages.
+
+### What exists
+
+- **JSON schema validation retry** — in `AnthropicVertexProvider` only, for
+  structured output. Not for API errors.
+- **`max_consecutive_errors`** — in `ToolLoopConfig`, but for tool execution
+  errors (wrong tool calls), not LLM API errors.
+- **Crash recovery** — `SessionStorage` can resume after crash, but the user
+  has to restart the agent manually. Not the same as automatic retry.
+
+## Where to add retry
+
+### Option A: Inside ToolLoopAgent.call_llm() (recommended)
+
+Wrap the LLM call in a retry loop directly in the tool loop step:
+
+```python
+@step("call_llm")
+async def call_llm(self) -> GotoStep | Spawn | ToolLoopResult:
+    ...
+    retry_config = self.__config.retry  # RetryConfig(max_attempts=5, initial_delay=4, max_delay=240)
+
+    for attempt in range(retry_config.max_attempts):
+        try:
+            async with self.__emitter.span(LLMCallSpan(request=request)) as llm_span:
+                response = await self.__provider.call(request)
+                llm_span.response = response
+            break
+        except RETRYABLE_EXCEPTIONS as exc:
+            if attempt == retry_config.max_attempts - 1:
+                raise
+            delay = min(retry_config.initial_delay * (2 ** attempt), retry_config.max_delay)
+            # fire event for observability
+            await self.__emitter.emit(LLMRetryEvent(attempt=attempt, delay=delay, error=exc))
+            await asyncio.sleep(delay)
+    ...
+```
+
+**Pros:**
+- Simple, self-contained, no new abstractions
+- Works for both `call()` and `stream()`
+- Retry is per-LLM-call, not per-agent — exactly the right scope
+- Composable with crash recovery (if all retries fail → crash → resume)
+- The tool loop is THE building block; it should own this
+
+**Cons:**
+- Hardcoded in tool loop — not configurable via hooks
+- But: this is infrastructure, not business logic. Like TCP retransmission.
+
+### Option B: LLMProvider wrapper (decorator)
+
+```python
+class RetryProvider(LLMProvider):
+    def __init__(self, inner: LLMProvider, config: RetryConfig): ...
+    async def call(self, request): ...  # retry loop around inner.call()
+```
+
+**Pros:**
+- Decoupled from tool loop — works outside agents too
+- Provider-agnostic
+
+**Cons:**
+- User must wrap every provider manually
+- Not "batteries included" — the opposite of what we want
+- Streaming retry is tricky (partially consumed stream)
+- Doesn't integrate with tool loop observability
+
+### Option C: Hook-based (like Strands)
+
+A hook subscribes to a new `AfterLLMCallEvent` with an `error` field and sets
+`event.retry = True`.
+
+**Pros:**
+- Pluggable, configurable
+- Follows our hook pattern
+
+**Cons:**
+- Requires mutable event + retry flag + loop in the tool loop
+- Overengineered for what is essentially "sleep and retry on 429"
+- Hook state management across retries is subtle
+- The Strands approach stores mutable state on the strategy object —
+  not great for parallel/concurrent agents sharing the same strategy
+
+### Recommendation: Option A
+
+Retry on transient LLM errors is infrastructure. It belongs in the tool loop
+with a simple config, not in a pluggable hook system. The hook system is for
+business logic (model fallback, guardrails, caching). Retrying a 429 is not
+business logic — it's plumbing.
+
+## Design details
+
+### RetryConfig
+
+```python
+@dataclass(frozen=True)
+class RetryConfig:
+    max_attempts: int = 5        # total attempts (1 = no retry)
+    initial_delay: float = 4.0   # seconds
+    max_delay: float = 240.0     # seconds
+    backoff_factor: float = 2.0  # exponential multiplier
+    retryable: Callable[[BaseException], bool] | None = None  # custom predicate
+```
+
+Default in `ToolLoopConfig`:
+```python
+retry: RetryConfig = RetryConfig()
+```
+
+### What to retry
+
+Default retryable predicate — check for known transient exceptions from all
+three provider SDKs. The user can override with a custom predicate.
+
+Key question: should we catch broad `Exception` and check status codes, or
+import provider-specific exceptions? Importing provider SDKs creates unwanted
+dependencies. Better approach: a generic check:
+
+```python
+def is_retryable(exc: BaseException) -> bool:
+    # Check for status code attribute (anthropic, openai)
+    status = getattr(exc, "status_code", None) or getattr(exc, "status", None)
+    if status in (429, 500, 502, 503, 529):
+        return True
+    # Check for connection/timeout errors by name pattern
+    type_name = type(exc).__name__
+    if any(s in type_name for s in ("Timeout", "Connection", "Unavailable")):
+        return True
+    return False
+```
+
+No provider SDK imports needed. Works with any provider.
+
+### Observability
+
+Fire an event/span for each retry so the user can see what's happening:
+- `LLMRetryEvent(attempt, delay, error)` — hook event
+- Or simply log + emit through existing span hooks
+
+The `LLMCallSpan` should capture the final successful call, not the failed
+attempts. Failed attempts can be logged as sub-events or separate lightweight
+spans.
+
+### Streaming
+
+For `stream()`, the retry wraps the entire stream creation. If the stream
+fails mid-way (connection drop), that's harder — the response is partially
+consumed. Options:
+
+1. **Retry only on initial connection errors** — if `stream()` raises before
+   yielding any events, retry. If it fails mid-stream, propagate the error.
+   This covers 429 (rejected before streaming starts).
+
+2. **Full stream retry** — buffer events and replay on retry. Complex, and
+   the user may have already processed some deltas.
+
+Start with (1). Mid-stream failures are rare and a different problem.
+
+### Interrupt integration
+
+The `asyncio.sleep(delay)` during retry backoff should respect the interrupt
+token. If the agent is interrupted during a retry wait, it should stop
+immediately:
+
+```python
+await with_interrupt(asyncio.sleep(delay), self.__interrupt)
+```
+
+This already exists in the codebase for stream interruption.
+
+## Relation to other features
+
+| Feature | Relationship |
+|---|---|
+| **Model fallback** (`model_fallback.md`) | Complementary. Retry handles transient errors (same model). Fallback handles persistent errors (switch model). Could chain: retry N times → fall back to stronger model. |
+| **Crash recovery** | Retry is the first line of defense. If all retries fail, crash recovery kicks in. |
+| **Hooks** | Retry emits events for observability but is not hook-driven. |
+| **`max_consecutive_errors`** | Different scope — tool execution errors, not LLM API errors. |
+
+## Open questions
+
+1. **Should retry config be a `ConfigValue` (callable)?** Probably not — retry
+   config rarely needs to change dynamically. Keep it simple.
+
+2. **Jitter?** Exponential backoff with jitter is best practice to avoid
+   thundering herd. Add random jitter (±25%) to the delay.
+
+3. **Retry-After header?** Some providers return a `Retry-After` header with
+   429 responses. Should we parse it? The SDKs might already handle this in
+   their built-in retries, so by the time it reaches us, there's no header.
+   Low priority.
+
+4. **Per-provider retry?** Some providers are more aggressive with throttling
+   (Google Vertex with short bursts). Should retry config be per-provider?
+   Probably not — keep one config, the user can tune it.
+
+5. **Default: on or off?** Should retry be enabled by default? Yes — the
+   default `RetryConfig()` should retry with sensible defaults. Users who want
+   fail-fast can set `max_attempts=1`.
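The research doc computes `delay = min(initial_delay * backoff_factor ** attempt, max_delay)` and leaves ±25% jitter as open question 2. Here is a standalone illustration of that schedule with jitter folded in — a sketch, not code from the package:

```python
import random

def backoff_delay(
    attempt: int,
    *,
    initial_delay: float = 4.0,   # defaults mirror the RetryConfig sketch above
    backoff_factor: float = 2.0,
    max_delay: float = 240.0,
    jitter: float = 0.25,         # open question 2: +/-25% jitter
) -> float:
    """Capped exponential backoff with symmetric jitter."""
    base = min(initial_delay * backoff_factor**attempt, max_delay)
    return base * random.uniform(1.0 - jitter, 1.0 + jitter)

# Attempts 0..4 back off around 4, 8, 16, 32, 64 seconds (before jitter).
print([round(backoff_delay(n), 1) for n in range(5)])
```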
docs/research/pricing_complexity.md
@@ -13,11 +13,11 @@ Real LLM pricing is significantly more complex.
 
 The same model costs differently depending on how you access it:
 
-| Model
-|
-| Claude Sonnet 4.6 | $3/$15
-| GPT-4o
-| Gemini 2.5 Pro
+| Model             | Direct API   | Vertex AI  | Bedrock   | Azure     |
+|-------------------|--------------|------------|-----------|-----------|
+| Claude Sonnet 4.6 | $3/$15       | different  | different | N/A       |
+| GPT-4o            | $2.50/$10    | N/A        | N/A       | different |
+| Gemini 2.5 Pro    | N/A          | $1.25/$10  | N/A       | N/A       |
 
 litellm tracks this — each model has separate entries per provider:
 - `claude-sonnet-4-6` (litellm_provider: "anthropic")
docs/todo.md
@@ -26,6 +26,19 @@
 
 
 
+## Built-in LLM retry with backoff
+
+- **Automatic retry with exponential backoff for transient LLM errors (429, 5xx, timeouts).**
+  The tool loop is a "batteries included" building block — retry on transient provider errors
+  should be built in, not require a wrapper agent. Add `RetryConfig` to `ToolLoopConfig`
+  (max_attempts, initial_delay, max_delay, backoff_factor, custom retryable predicate).
+  Wrap the LLM call in `ToolLoopAgent.call_llm()` with a retry loop. Use a generic
+  `is_retryable()` check (status codes + exception name patterns) to avoid importing
+  provider SDKs. Respect interrupt tokens during backoff sleep. Emit observability events
+  on retry. Start with retry on initial connection errors only (not mid-stream failures).
+  See `docs/research/llm_retry_backoff.md`.
+
+
 ## Documentation benchmark suite
 
 - A series of tasks given to a coding agent that has access only to documentation
flowra/ext/mlflow.py
@@ -65,8 +65,8 @@ def _resolve_experiment_id(experiment_name: str) -> str:
 class _MlflowTracing:
     __slots__ = ()
 
+    @staticmethod
     def install(
-        self,
         runtime: AgentRuntime,
         *,
         experiment_name: str | None = None,
flowra/ext/mlflow.py
@@ -347,8 +347,11 @@ def _format_chat_inputs(req: LLMRequest) -> dict[str, Any]:
     inputs: dict[str, Any] = {"model": req.model}
     if req.temperature is not None:
         inputs["temperature"] = req.temperature
+    if req.top_p is not None:
+        inputs["top_p"] = req.top_p
     if req.max_tokens is not None:
         inputs["max_tokens"] = req.max_tokens
+    inputs.update(req.additional_config)
 
     messages: list[dict[str, Any]] = []
     for msg in req.system:
flowra/ext/otel.py
@@ -200,8 +200,12 @@ def _make_llm_handler(
         }
         if span.request.temperature is not None:
             attrs["gen_ai.request.temperature"] = span.request.temperature
+        if span.request.top_p is not None:
+            attrs["gen_ai.request.top_p"] = span.request.top_p
         if span.request.max_tokens is not None:
             attrs["gen_ai.request.max_tokens"] = span.request.max_tokens
+        for key, val in span.request.additional_config.items():
+            attrs[f"gen_ai.request.{key}"] = val if isinstance(val, str | int | float | bool) else str(val)
 
         otel_span = _start_span(otel_parent, tracer, f"chat {model_name}", kind=SpanKind.CLIENT, attributes=attrs)
         otel_parent.set(otel_span)
flowra/lib/llm_call/agent.py
@@ -36,6 +36,7 @@ class LLMCallAgent(Agent[LLMCallSpec, LLMCallResult]):
             system=list(spec.system),
             messages=list(spec.messages),
             temperature=llm_config.temperature,
+            top_p=llm_config.top_p,
             max_tokens=llm_config.max_tokens,
             stop_sequences=llm_config.stop_sequences,
             additional_config=llm_config.additional_config,
flowra/lib/llm_config.py
@@ -8,6 +8,7 @@ __all__ = ["LLMConfig"]
 class LLMConfig:
     model: str
     temperature: float | None = None
+    top_p: float | None = None
     max_tokens: int | None = None
     stop_sequences: list[str] | None = None
    additional_config: dict[str, Any] = dataclasses.field(default_factory=dict)
flowra/lib/tool_loop/agent.py
@@ -160,6 +160,7 @@ class ToolLoopAgent(Agent[ToolLoopSpec, ToolLoopResult]):
             tools=tools,
             json_schema=json_schema,
             temperature=llm_config.temperature,
+            top_p=llm_config.top_p,
             max_tokens=llm_config.max_tokens,
             stop_sequences=llm_config.stop_sequences,
             additional_config=llm_config.additional_config,
flowra/llm/pricing/__init__.py (new file)
@@ -0,0 +1,40 @@
+"""Universal LLM pricing — JSON-backed registry with per-provider cost estimation."""
+
+from ..response import CostBreakdown
+from .registry import ModelPricing, PricingRegistry
+
+__all__ = ["CostBreakdown", "ModelPricing", "PricingRegistry", "estimate_cost", "get_registry"]
+
+_default: PricingRegistry | None = None
+
+
+def get_registry() -> PricingRegistry:
+    """Return the default :class:`PricingRegistry` (lazy-loaded singleton)."""
+    global _default
+    if _default is None:
+        _default = PricingRegistry.load_default()
+    return _default
+
+
+def estimate_cost(
+    model: str,
+    *,
+    provider: str,
+    input_tokens: int,
+    output_tokens: int,
+    cache_read_tokens: int = 0,
+    cache_creation_tokens: int = 0,
+    cache_creation_1h_tokens: int = 0,
+    reasoning_tokens: int = 0,
+) -> CostBreakdown | None:
+    """Convenience wrapper around :meth:`PricingRegistry.estimate_cost`."""
+    return get_registry().estimate_cost(
+        model,
+        provider=provider,
+        input_tokens=input_tokens,
+        output_tokens=output_tokens,
+        cache_read_tokens=cache_read_tokens,
+        cache_creation_tokens=cache_creation_tokens,
+        cache_creation_1h_tokens=cache_creation_1h_tokens,
+        reasoning_tokens=reasoning_tokens,
+    )
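A possible call site for the module-level `estimate_cost` wrapper added above. The keyword signature is taken from the hunk itself; the token counts are illustrative, and `cost.total` comes from the `CostBreakdown` description in the context7.json hunk earlier:

```python
from flowra.llm.pricing import estimate_cost

# Prices are $/1M tokens; returns None when the model is not in the registry.
cost = estimate_cost(
    "claude-sonnet-4-6",
    provider="anthropic",
    input_tokens=12_000,
    output_tokens=800,
    cache_read_tokens=50_000,
)
if cost is not None:
    print(f"total: ${cost.total:.4f}")
```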
flowra/llm/pricing/data/custom.json (new file)
@@ -0,0 +1,52 @@
+{
+  "anthropic": {
+    "claude-sonnet-4-6": {
+      "cache_creation_1h": 6.0
+    },
+    "claude-sonnet-4-5": {
+      "cache_creation_1h": 6.0
+    },
+    "claude-sonnet-4": {
+      "cache_creation_1h": 6.0
+    },
+    "claude-sonnet-3-7": {
+      "input": 3.0,
+      "output": 15.0,
+      "cache_read": 0.3,
+      "cache_creation": 3.75,
+      "cache_creation_1h": 6.0
+    },
+    "claude-haiku-3-5": {
+      "input": 0.8,
+      "output": 4.0,
+      "cache_read": 0.08,
+      "cache_creation": 1.0,
+      "cache_creation_1h": 1.6
+    },
+    "claude-haiku-3": {
+      "input": 0.25,
+      "output": 1.25,
+      "cache_read": 0.03,
+      "cache_creation": 0.3,
+      "cache_creation_1h": 0.5
+    },
+    "claude-opus-3": {
+      "input": 15.0,
+      "output": 75.0,
+      "cache_read": 1.5,
+      "cache_creation": 18.75,
+      "cache_creation_1h": 30.0
+    }
+  },
+  "openai": {
+    "mercury-coder": {
+      "input": 0.25,
+      "output": 1.0
+    },
+    "mercury-2": {
+      "input": 0.25,
+      "output": 0.75,
+      "cache_read": 0.025
+    }
+  }
+}
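Given these keys, lookups rely on the substring matching noted in the update-pricing command, so a versioned model id still resolves to its entry. A toy sketch of that rule — the longest-match tie-breaking is an assumption here; the real logic lives in `flowra/llm/pricing/registry.py`:

```python
# Toy substring lookup: "claude-sonnet-4-5@20250929" should hit "claude-sonnet-4-5".
def find_pricing_key(model: str, keys: list[str]) -> str | None:
    matches = [k for k in keys if k in model]
    # Assumed tie-break: prefer the longest key so "claude-sonnet-4-5"
    # beats the shorter "claude-sonnet-4".
    return max(matches, key=len) if matches else None

keys = ["claude-sonnet-4", "claude-sonnet-4-5", "claude-sonnet-4-6"]
assert find_pricing_key("claude-sonnet-4-5@20250929", keys) == "claude-sonnet-4-5"
```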