flowra 0.0.1.dev2__tar.gz → 0.0.2.dev5__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (165)
  1. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/.github/workflows/publish.yml +1 -1
  2. flowra-0.0.2.dev5/CHANGELOG.md +31 -0
  3. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/CLAUDE.md +8 -3
  4. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/PKG-INFO +7 -2
  5. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/README.md +6 -1
  6. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/context7.json +16 -7
  7. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/agent.md +22 -1
  8. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/architecture.md +3 -2
  9. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/lib.md +23 -11
  10. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/llm.md +99 -8
  11. flowra-0.0.2.dev5/docs/research/strands_comparison.md +46 -0
  12. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/console_chat.py +10 -1
  13. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/llm_routing.py +13 -3
  14. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/menu_agent.py +1 -1
  15. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/menu_agent_class.py +1 -1
  16. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/model_registry.py +5 -0
  17. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/tui_chat.py +130 -54
  18. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/agent/__init__.py +2 -0
  19. flowra-0.0.2.dev5/flowra/agent/interrupt_helpers.py +50 -0
  20. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/tool_loop/__init__.py +8 -0
  21. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/tool_loop/agent.py +32 -3
  22. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/tool_loop/context.py +1 -1
  23. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/tool_loop/hook_executor.py +32 -0
  24. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/tool_loop/hooks.py +47 -1
  25. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/__init__.py +5 -0
  26. flowra-0.0.2.dev5/flowra/llm/provider.py +19 -0
  27. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/providers/__init__.py +3 -2
  28. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/providers/anthropic_vertex.py +56 -12
  29. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/providers/google_vertex.py +93 -14
  30. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/providers/openai.py +19 -1
  31. flowra-0.0.2.dev5/flowra/llm/stream.py +28 -0
  32. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/runtime/engine.py +2 -2
  33. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/version.py +1 -1
  34. flowra-0.0.2.dev5/tests/agent/test_with_interrupt.py +170 -0
  35. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/lib/test_tool_loop_agent.py +303 -1
  36. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/providers/test_anthropic_e2e.py +74 -0
  37. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/providers/test_anthropic_vertex.py +83 -1
  38. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/providers/test_google_vertex.py +94 -8
  39. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/providers/test_google_vertex_e2e.py +22 -0
  40. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/providers/test_openai_e2e.py +26 -0
  41. flowra-0.0.2.dev5/tests/llm/test_stream.py +37 -0
  42. flowra-0.0.1.dev2/CHANGELOG.md +0 -21
  43. flowra-0.0.1.dev2/flowra/llm/provider.py +0 -13
  44. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/.claude/commands/update-pricing.md +0 -0
  45. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/.env.example +0 -0
  46. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/.github/workflows/master.yml +0 -0
  47. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/.github/workflows/pull_request.yml +0 -0
  48. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/.github/workflows/pull_request_e2e.yml +0 -0
  49. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/.gitignore +0 -0
  50. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/.python-version +0 -0
  51. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/LICENSE +0 -0
  52. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/Makefile +0 -0
  53. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/review_plan.md +0 -0
  54. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/review_prompts/step1_structure.md +0 -0
  55. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/review_prompts/step2_code_style.md +0 -0
  56. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/review_prompts/step3_documentation.md +0 -0
  57. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/review_prompts/step4_doc_readability.md +0 -0
  58. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/review_prompts/step5_doc_audit.md +0 -0
  59. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/review_prompts/step6_tests.md +0 -0
  60. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/runtime.md +0 -0
  61. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/todo.md +0 -0
  62. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/docs/tools.md +0 -0
  63. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/__init__.py +0 -0
  64. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/app_agent.py +0 -0
  65. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/llm_logging.py +0 -0
  66. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/system_prompt.txt +0 -0
  67. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/tools/__init__.py +0 -0
  68. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/tools/calculator.py +0 -0
  69. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/tools/random_numbers.py +0 -0
  70. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/examples/tools/switch_model.py +0 -0
  71. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/__init__.py +0 -0
  72. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/agent/agent.py +0 -0
  73. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/agent/agent_def.py +0 -0
  74. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/agent/agent_registry.py +0 -0
  75. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/agent/agent_store.py +0 -0
  76. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/agent/compile.py +0 -0
  77. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/agent/interrupt_token.py +0 -0
  78. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/agent/service_locator.py +0 -0
  79. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/agent/step_decorator.py +0 -0
  80. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/agent/stored_values.py +0 -0
  81. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/__init__.py +0 -0
  82. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/chat/__init__.py +0 -0
  83. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/chat/agent.py +4 -4
  84. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/chat/config.py +0 -0
  85. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/chat/hook_executor.py +0 -0
  86. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/chat/hooks.py +0 -0
  87. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/chat/spec.py +0 -0
  88. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/config_value.py +0 -0
  89. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/llm_config.py +0 -0
  90. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/tool_loop/_tool_call_agent.py +0 -0
  91. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/tool_loop/cache.py +0 -0
  92. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/tool_loop/config.py +0 -0
  93. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/lib/tool_loop/spec.py +0 -0
  94. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/_base.py +0 -0
  95. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/blocks.py +0 -0
  96. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/messages.py +0 -0
  97. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/pricing/__init__.py +0 -0
  98. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/pricing/anthropic.py +0 -0
  99. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/pricing/google.py +0 -0
  100. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/pricing/openai.py +0 -0
  101. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/request.py +0 -0
  102. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/response.py +0 -0
  103. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/schema_formatting.py +0 -0
  104. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/schema_validation.py +0 -0
  105. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/llm/tools.py +0 -0
  106. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/py.typed +0 -0
  107. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/runtime/__init__.py +0 -0
  108. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/runtime/_sealed_scope.py +0 -0
  109. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/runtime/execution.py +0 -0
  110. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/runtime/interrupt.py +0 -0
  111. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/runtime/runtime.py +0 -0
  112. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/runtime/runtime_scope.py +0 -0
  113. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/runtime/serialization.py +0 -0
  114. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/runtime/storage/__init__.py +0 -0
  115. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/runtime/storage/file.py +0 -0
  116. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/runtime/storage/in_memory.py +0 -0
  117. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/runtime/storage/session_storage.py +0 -0
  118. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/tools/__init__.py +0 -0
  119. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/tools/local_tool.py +0 -0
  120. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/tools/mcp_connection.py +0 -0
  121. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/tools/tool_group.py +0 -0
  122. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/tools/tool_registry.py +0 -0
  123. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/flowra/tools/types.py +0 -0
  124. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/pyproject.toml +0 -0
  125. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/__init__.py +0 -0
  126. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/agent/__init__.py +0 -0
  127. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/agent/test_agent.py +0 -0
  128. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/agent/test_agent_def.py +0 -0
  129. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/agent/test_agent_registry.py +0 -0
  130. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/agent/test_compile.py +0 -0
  131. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/agent/test_step_ref.py +0 -0
  132. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/agent/test_values.py +0 -0
  133. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/lib/__init__.py +0 -0
  134. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/lib/test_chat_agent.py +0 -0
  135. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/lib/test_config_value.py +0 -0
  136. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/lib/test_tool_call_agent.py +0 -0
  137. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/lib/tool_loop/__init__.py +0 -0
  138. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/lib/tool_loop/test_cache.py +0 -0
  139. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/__init__.py +0 -0
  140. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/pricing/__init__.py +0 -0
  141. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/pricing/test_anthropic.py +0 -0
  142. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/pricing/test_google.py +0 -0
  143. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/pricing/test_openai.py +0 -0
  144. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/providers/__init__.py +0 -0
  145. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/providers/test_openai_provider.py +0 -0
  146. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/test_metadata.py +0 -0
  147. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/test_response.py +0 -0
  148. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/test_schema_formatting.py +0 -0
  149. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/llm/test_schema_validation.py +0 -0
  150. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/runtime/__init__.py +0 -0
  151. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/runtime/storage/__init__.py +0 -0
  152. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/runtime/storage/test_file.py +0 -0
  153. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/runtime/storage/test_in_memory.py +0 -0
  154. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/runtime/test_engine.py +0 -0
  155. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/runtime/test_interrupt.py +0 -0
  156. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/runtime/test_persistence.py +0 -0
  157. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/runtime/test_runtime.py +0 -0
  158. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/runtime/test_scope.py +0 -0
  159. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/runtime/test_serialization.py +0 -0
  160. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/tools/__init__.py +0 -0
  161. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/tools/test_local_tool.py +0 -0
  162. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/tools/test_mcp_connection.py +0 -0
  163. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/tools/test_tool_group.py +0 -0
  164. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/tests/tools/test_tool_registry.py +0 -0
  165. {flowra-0.0.1.dev2 → flowra-0.0.2.dev5}/uv.lock +0 -0
@@ -56,7 +56,7 @@ jobs:
       - name: Checkout code
         uses: actions/checkout@v6
         with:
-          token: ${{ secrets.GITHUB_TOKEN }}
+          ssh-key: ${{ secrets.DEPLOY_KEY }}
       - name: Set up uv
         uses: astral-sh/setup-uv@v7
       - name: Install deps
@@ -0,0 +1,31 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com),
+and this project adheres to [Semantic Versioning](https://semver.org).
+
+## [Unreleased]
+
+### Added
+- **Streaming**: `LLMProvider.stream()` method returns `AsyncIterator[StreamEvent]` with `TextDelta`, `ThinkingDelta`, and `ContentComplete` events. All three built-in providers implement real-time streaming. Default fallback calls `call()` and yields `ContentComplete`.
+- **Anthropic thinking**: `AnthropicVertexAdditionalConfig` with `thinking_budget_tokens` enables extended thinking on Claude models. Thinking blocks are now parsed from Anthropic responses (`ThinkingBlock`).
+- **Streaming hooks**: `on_text_delta` and `on_thinking_delta` hooks in `ToolLoopHooks`. When set, the agent automatically uses `stream()` instead of `call()`. Streaming respects `InterruptToken` — exits immediately even if the LLM is blocked.
+- **`with_interrupt`**: Generic async iterator wrapper that races `__anext__()` against `InterruptToken.wait()` via `asyncio.wait(FIRST_COMPLETED)`. Used internally by `ToolLoopAgent` for streaming; available as `from flowra.agent import with_interrupt`.
+- **Thinking model entry**: `anthropic/sonnet-4-5-think` in example model registry with 4000 token thinking budget.
+- **Console/TUI streaming**: `--stream` flag for `console_chat.py` and `tui_chat.py` examples.
+
+## [0.0.1] - 2026-03-07
+
+Initial release.
+
+### Added
+- State machine agents with `@step` methods, `Goto`, `Spawn`, and stored values (`Scalar`, `AppendOnlyList`)
+- Provider-agnostic LLM abstraction (`LLMProvider`, `LLMRequest`, `LLMResponse`)
+- LLM providers: `AnthropicVertexProvider`, `GoogleVertexProvider`, `OpenAIProvider`
+- Tool integration: `@tool` decorator, MCP server support, DI into tool handlers
+- Execution engine with persistence, crash recovery, and cooperative interrupts
+- Pre-built agents: `ChatAgent` (multi-turn chat) and `ToolLoopAgent` (tool loop with hooks)
+- `ChatHooks` with `on_save_turn_messages` for transient message filtering
+- Optional provider dependencies via extras: `flowra[anthropic]`, `flowra[openai]`, `flowra[google]`, `flowra[all]`
+- Python 3.12, 3.13, 3.14 support
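
Taken together, the streaming and interrupt entries under [Unreleased] compose as follows. A hedged sketch, not code from the package: `provider`, `request`, and `token` are placeholders, and only the `with_interrupt` and stream-event import paths are stated in this changelog.

```python
# Sketch only: stream an LLM response, stopping early if the interrupt token fires.
from flowra.agent import with_interrupt
from flowra.llm import ContentComplete, TextDelta, ThinkingDelta


async def stream_answer(provider, request, token):
    async for event in with_interrupt(provider.stream(request), token):
        match event:
            case TextDelta(text=text):
                print(text, end="", flush=True)
            case ThinkingDelta(text=text):
                print(f"[thinking] {text}", end="", flush=True)
            case ContentComplete(response=response):
                return response  # the same LLMResponse that call() would return
    return None  # the token fired mid-stream; no final response arrived
```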
@@ -30,12 +30,13 @@ Python 3.12+ library. Package manager: **uv**. All config in `pyproject.toml`.

 Provider-agnostic interface for calling LLMs:

-- `LLMProvider` (abc) — single method `async call(LLMRequest) -> LLMResponse`
-- `LLMRequest` — model, messages, tools, json_schema, temperature, max_tokens, stop_sequences
+- `LLMProvider` (abc) — `async call(LLMRequest) -> LLMResponse` and `async stream(LLMRequest) -> AsyncIterator[StreamEvent]`
+- `LLMRequest` — model, messages, tools, json_schema, temperature, max_tokens, stop_sequences, additional_config, max_schema_retries
 - `LLMResponse` — message (AssistantMessage), stop_reason, usage
+- `StreamEvent` = `TextDelta | ThinkingDelta | ContentComplete` — stream events for real-time token delivery
 - `Usage` — input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, cost_usd. Token contract: `input_tokens` excludes cached tokens
 - Messages: `SystemMessage`, `UserMessage`, `AssistantMessage` — system messages must be at the beginning of the messages list
-- Blocks: `TextBlock` (with `cache: bool` for prompt caching), `ImageBlock`, `ToolUseBlock`, `ToolResultBlock`
+- Blocks: `TextBlock` (with `cache: bool` for prompt caching), `ImageBlock`, `ToolUseBlock`, `ToolResultBlock`, `ThinkingBlock`
 - `Tool` — name, description, input_schema, output_schema, cache

 Providers live in `flowra/llm/providers/`. Currently: `AnthropicVertexProvider`, `OpenAIProvider`, `GoogleVertexProvider`.
@@ -56,3 +57,7 @@ Review prompts live in `docs/review_prompts/`:
 ### Tests

 Test directory structure mirrors `flowra/`. E2E tests use `_e2e` suffix (e.g., `test_anthropic_e2e.py`). Environment variables loaded from `.env` via Makefile.
+
+## Maintenance
+
+- **`context7.json`** — project description for [Context7](https://context7.com). Must be updated when adding new features, changing public APIs, or modifying architecture. Keep rules in sync with actual capabilities.
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: flowra
-Version: 0.0.1.dev2
+Version: 0.0.2.dev5
 Summary: Flowra — flow infrastructure for building stateful LLM agents
 Project-URL: Repository, https://github.com/anna-money/flowra
 Project-URL: Changelog, https://github.com/anna-money/flowra/blob/master/CHANGELOG.md
@@ -33,6 +33,11 @@ Description-Content-Type: text/markdown

 # Flowra

+[![PyPI](https://img.shields.io/pypi/v/flowra)](https://pypi.org/project/flowra/)
+[![Python](https://img.shields.io/pypi/pyversions/flowra)](https://pypi.org/project/flowra/)
+[![License](https://img.shields.io/pypi/l/flowra)](https://github.com/anna-money/flowra/blob/master/LICENSE)
+[![CI](https://github.com/anna-money/flowra/actions/workflows/master.yml/badge.svg)](https://github.com/anna-money/flowra/actions/workflows/master.yml)
+
 **Flow infra** for building stateful, persistent LLM agents with tool use,
 parallel execution, and crash recovery. Requires Python 3.12+.
@@ -45,7 +50,7 @@ parallel execution, and crash recovery. Requires Python 3.12+.
 - **Tool integration** — `@tool` decorator for local functions, MCP server support,
   DI into tool handlers
 - **LLM abstraction** — provider-agnostic `LLMProvider` interface with immutable
-  message types (ships `AnthropicVertexProvider`, `GoogleVertexProvider`, `OpenAIProvider`)
+  message types and real-time streaming (ships `AnthropicVertexProvider`, `GoogleVertexProvider`, `OpenAIProvider`)
 - **Cooperative interrupts** — `InterruptToken` for graceful cancellation across
   the entire execution tree
 - **Pre-built agents** — `ChatAgent` (multi-turn chat with session history) and
@@ -1,5 +1,10 @@
 # Flowra

+[![PyPI](https://img.shields.io/pypi/v/flowra)](https://pypi.org/project/flowra/)
+[![Python](https://img.shields.io/pypi/pyversions/flowra)](https://pypi.org/project/flowra/)
+[![License](https://img.shields.io/pypi/l/flowra)](https://github.com/anna-money/flowra/blob/master/LICENSE)
+[![CI](https://github.com/anna-money/flowra/actions/workflows/master.yml/badge.svg)](https://github.com/anna-money/flowra/actions/workflows/master.yml)
+
 **Flow infra** for building stateful, persistent LLM agents with tool use,
 parallel execution, and crash recovery. Requires Python 3.12+.
@@ -12,7 +17,7 @@ parallel execution, and crash recovery. Requires Python 3.12+.
 - **Tool integration** — `@tool` decorator for local functions, MCP server support,
   DI into tool handlers
 - **LLM abstraction** — provider-agnostic `LLMProvider` interface with immutable
-  message types (ships `AnthropicVertexProvider`, `GoogleVertexProvider`, `OpenAIProvider`)
+  message types and real-time streaming (ships `AnthropicVertexProvider`, `GoogleVertexProvider`, `OpenAIProvider`)
 - **Cooperative interrupts** — `InterruptToken` for graceful cancellation across
   the entire execution tree
 - **Pre-built agents** — `ChatAgent` (multi-turn chat with session history) and
@@ -1,7 +1,7 @@
 {
   "$schema": "https://context7.com/schema/context7.json",
   "projectTitle": "flowra",
-  "description": "Flow infrastructure for building stateful, persistent LLM agents with tool use, parallel execution, and crash recovery. Requires Python 3.12+. Features state machine agents with @step methods, persistent state (Scalar, AppendOnlyList) with dirty tracking, tool integration (@tool decorator + MCP), provider-agnostic LLM abstraction, cooperative interrupts, and pre-built ChatAgent/ToolLoopAgent.",
+  "description": "Flow infrastructure for building stateful, persistent LLM agents with tool use, parallel execution, crash recovery, and real-time streaming. Requires Python 3.12+. Features state machine agents with @step methods, persistent state (Scalar, AppendOnlyList) with dirty tracking, tool integration (@tool decorator + MCP), provider-agnostic LLM abstraction with streaming support, extended thinking (Anthropic, Google), cooperative interrupts, and pre-built ChatAgent/ToolLoopAgent.",
   "folders": ["flowra", "examples"],
   "excludeFolders": ["tests", ".github", "logs", ".chat_sessions"],
   "excludeFiles": [],
@@ -11,15 +11,22 @@
     "PACKAGE STRUCTURE: flowra/llm/ (LLM abstraction), flowra/tools/ (tool system), flowra/agent/ (state machine framework), flowra/runtime/ (execution engine), flowra/lib/ (pre-built agents)",
     "Dependency graph: llm (no deps) -> tools (llm) -> agent (no deps) -> runtime (agent, llm) -> lib (agent, llm, tools, runtime). No circular dependencies",

-    "LLM ABSTRACTION: LLMProvider is the core interface — single method: async call(LLMRequest) -> LLMResponse",
-    "LLMRequest contains: model, messages, tools, json_schema, temperature, max_tokens, stop_sequences",
+    "LLM ABSTRACTION: LLMProvider is the core interface — two methods: async call(LLMRequest) -> LLMResponse and async stream(LLMRequest) -> AsyncIterator[StreamEvent]",
+    "stream() returns StreamEvent = TextDelta | ThinkingDelta | ContentComplete. TextDelta/ThinkingDelta carry incremental text; ContentComplete is always last and contains the full LLMResponse",
+    "Default stream() implementation calls call() and yields a single ContentComplete — providers override for real-time streaming",
+    "LLMRequest contains: model, messages, tools, json_schema, temperature, max_tokens, stop_sequences, additional_config, max_schema_retries",
     "LLMResponse contains: message (AssistantMessage), stop_reason (StopReason), usage (Usage)",
     "Usage contains: input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, cost_usd. Token contract: input_tokens excludes cached tokens",
     "Messages: SystemMessage, UserMessage, AssistantMessage. System messages must be at the beginning of the messages list",
     "Blocks: TextBlock (with cache: bool for prompt caching), ImageBlock, ToolUseBlock, ToolResultBlock, ThinkingBlock",
+    "ThinkingBlock holds reasoning/thinking text from models with extended thinking (Anthropic Claude, Google Gemini). Not sent back to the API",
     "Tool definition for LLM: Tool(name, description, input_schema, output_schema, cache)",
-    "Three built-in providers: AnthropicVertexProvider (Claude via Vertex AI), OpenAIProvider (OpenAI-compatible APIs), GoogleVertexProvider (Gemini via Vertex AI)",
+    "Three built-in providers: AnthropicVertexProvider (Claude via Vertex AI), OpenAIProvider (OpenAI-compatible APIs), GoogleVertexProvider (Gemini via Vertex AI). All three implement stream() with real-time deltas",
     "Import providers: from flowra.llm.providers.anthropic_vertex import AnthropicVertexProvider, from flowra.llm.providers.openai import OpenAIProvider, from flowra.llm.providers.google_vertex import GoogleVertexProvider",
+    "Import stream types: from flowra.llm import TextDelta, ThinkingDelta, ContentComplete",
+    "AnthropicVertexAdditionalConfig(thinking_budget_tokens: int) enables extended thinking on Claude. Pass via additional_config={'thinking_budget_tokens': 4000}",
+    "GoogleVertexAdditionalConfig(thinking_level: ThinkingLevel, thinking_budget: int) configures Gemini thinking. Pass via additional_config={'thinking_level': 'medium'} or {'thinking_budget': 4096}",
+    "AnthropicVertexProvider falls back to non-streaming when json_schema is set (retry loop requires full responses)",

     "TOOL SYSTEM: @tool decorator turns a Python function into a tool definition",
     "get_local_tool(func) wraps a @tool-decorated function into a LocalTool",
@@ -49,6 +56,7 @@

     "COOPERATIVE INTERRUPTS: InterruptToken for graceful cancellation across the entire execution tree",
     "InterruptTokenSource creates tokens: source = InterruptTokenSource(); token = source.token; source.interrupt()",
+    "with_interrupt(ait, token) wraps any AsyncIterator so it exits immediately when token fires — races __anext__() vs token.wait() via asyncio.wait. Import: from flowra.agent import with_interrupt",

     "PRE-BUILT AGENTS — ChatAgent: multi-turn chat with session history persistence",
     "ChatAgent usage: runtime.run(agent=ChatAgent, step=ChatAgent.process_message, spec=ChatSpec(user_message=text))",
@@ -61,19 +69,20 @@
     "PRE-BUILT AGENTS — ToolLoopAgent: single-turn LLM tool loop with hooks and caching",
     "ToolLoopAgent sends messages to LLM, executes tool calls, feeds results back, repeats until done",
     "ToolLoopConfig configures ToolLoopAgent: ToolLoopConfig(llm_config=LLMConfig(model='...'), cache_config=CacheConfig(...))",
-    "ToolLoopHooks provides lifecycle callbacks: on_user_message, on_start_iteration, on_before_llm_call, on_after_llm_call, on_before_tool_call, on_after_tool_call, on_result_message, on_thinking, on_text_reasoning, on_message_accepted",
+    "ToolLoopHooks provides lifecycle callbacks: on_user_message, on_message_accepted, on_start_iteration, on_before_llm_call, on_text_delta, on_thinking_delta, on_after_llm_call, on_text_reasoning, on_thinking, on_result_message, on_before_tool_call, on_after_tool_call",
+    "When on_text_delta or on_thinking_delta hooks are set, ToolLoopAgent automatically uses provider.stream() instead of provider.call()",

     "CACHING: CacheConfig(system_prompt, tools, messages) controls prompt caching strategies",
     "Predefined configs: CACHE_ALL, CACHE_SESSION, CACHE_MANUAL, NO_CACHE",
     "Cache strategies: cache_last_system_prompt, cache_last_tool, cache_last_message, cache_last_session_message, no_cache_*",

-    "CONFIG: LLMConfig(model, temperature, max_tokens, stop_sequences) configures LLM calls",
+    "CONFIG: LLMConfig(model, temperature, max_tokens, stop_sequences, additional_config) configures LLM calls",
     "ConfigValue[T] wraps static or dynamic (callable) config values: ConfigValue[str] | ConfigValue[Callable[[], str]]",

     "QUICK START: Create provider -> create ToolRegistry -> create Config -> create AgentRuntime -> runtime.run()",
     "Import ChatAgent: from flowra.lib.chat import ChatAgent, ChatConfig, ChatHooks, ChatResult, ChatSpec",
     "Import LLMConfig: from flowra.lib import LLMConfig",
-    "Import LLM types: from flowra.llm import LLMProvider, SystemMessage, TextBlock, Usage",
+    "Import LLM types: from flowra.llm import LLMProvider, SystemMessage, TextBlock, Usage, TextDelta, ThinkingDelta, ContentComplete",
     "Import runtime: from flowra.runtime import AgentRuntime, FileSessionStorage",
     "Import tools: from flowra.tools import ToolRegistry, get_local_tool, tool"
   ],
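
The CACHING and CONFIG rules above name the config shapes without assembling one. A minimal sketch under stated assumptions: `ToolLoopConfig` and `CACHE_ALL` are presumed exported from `flowra.lib.tool_loop` (only `LLMConfig`'s import path appears in the rules).

```python
# Sketch of the config shapes named in the rules; the ToolLoopConfig/CACHE_ALL
# import paths are assumed, not confirmed by this diff.
from flowra.lib import LLMConfig
from flowra.lib.tool_loop import CACHE_ALL, ToolLoopConfig

config = ToolLoopConfig(
    llm_config=LLMConfig(model="anthropic/sonnet-4-5", max_tokens=2048),
    cache_config=CACHE_ALL,  # predefined: cache system prompt, tools, and messages
)
```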
@@ -11,7 +11,8 @@ flowra/agent/
 ├── agent_def.py          # Control flow (Goto, Call, Spawn), type aliases, resolve functions
 ├── agent_store.py        # AgentStore (ABC) — flush interface
 ├── service_locator.py    # ServiceLocator (ABC) — service provision and access
-├── interrupt_token.py    # InterruptToken (ABC) — cooperative interrupt interface
+├── interrupt_token.py    # InterruptToken (ABC) — cooperative interrupt interface
+├── interrupt_helpers.py  # with_interrupt() — race async iterators against InterruptToken
 ├── agent_registry.py     # AgentRegistry — hierarchical agent name/type resolution
 ├── stored_values.py      # Scalar[T], AppendOnlyList[T], slot() — dirty-tracked state containers
 └── compile.py            # compile_agent() — introspection, slot discovery, type registry
@@ -543,6 +544,26 @@ class MyAgent(Agent):
         # ... continue processing ...
 ```

+### `with_interrupt` — racing async iterators
+
+`with_interrupt` wraps any `AsyncIterator[T]` so it exits immediately when
+the token fires — even if `__anext__()` is blocked on I/O:
+
+```python
+from flowra.agent import InterruptToken, with_interrupt
+
+async def consume(stream: AsyncIterator[str], token: InterruptToken) -> list[str]:
+    items = []
+    async for item in with_interrupt(stream, token):
+        items.append(item)
+    return items  # partial results if interrupted
+```
+
+On each iteration, `__anext__()` and `token.wait()` are raced via
+`asyncio.wait(FIRST_COMPLETED)`. If the token wins, the underlying iterator
+is closed (`aclose()`) and the wrapper ends. This is used internally by
+`ToolLoopAgent` to interrupt LLM streaming immediately.
+
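
The new `flowra/agent/interrupt_helpers.py` is not shown in this diff, so here is a minimal sketch of the racing pattern the section above describes. Assumptions: `token.wait()` returns an awaitable, and cancelling a pending `__anext__()` is safe for the wrapped iterator.

```python
# Sketch only, not the flowra implementation.
import asyncio
from collections.abc import AsyncIterator


async def with_interrupt_sketch[T](ait: AsyncIterator[T], token) -> AsyncIterator[T]:
    wait_task = asyncio.ensure_future(token.wait())  # assumed awaitable
    try:
        while True:
            next_task = asyncio.ensure_future(anext(ait))
            done, _ = await asyncio.wait(
                {next_task, wait_task}, return_when=asyncio.FIRST_COMPLETED
            )
            if next_task not in done:  # the token won the race
                next_task.cancel()
                return
            try:
                yield next_task.result()
            except StopAsyncIteration:
                return  # the wrapped iterator finished normally
    finally:
        wait_task.cancel()
        if (aclose := getattr(ait, "aclose", None)) is not None:
            await aclose()  # close the wrapped iterator, as the docs note
```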
 ## Dependency injection

 ### Constructor injection
@@ -20,7 +20,8 @@ No circular dependencies.
 ### `llm` — LLM abstraction

 Protocol layer between the SDK and any LLM. The core abstraction is `LLMProvider` —
-a single-method interface (`call(LLMRequest) → LLMResponse`). Request and response use
+`call(LLMRequest) → LLMResponse` for full responses, and `stream(LLMRequest) →
+AsyncIterator[StreamEvent]` for real-time text/thinking deltas. Request and response use
 a shared set of message and block types. Ships three providers:
 `AnthropicVertexProvider` (Claude via Vertex AI), `OpenAIProvider` (OpenAI-compatible APIs),
 `GoogleVertexProvider` (Gemini via Vertex AI). → [docs/llm.md](llm.md)
@@ -66,7 +67,7 @@ User message


 ChatAgent.process_message
-  Delegates to ToolLoopAgent via Call
+  Spawns ToolLoopAgent via Call inside Spawn

 ToolLoopAgent.start
   │ Saves user message to turn messages
@@ -305,6 +305,8 @@ runtime = AgentRuntime(
 | `on_before_llm_call`  | `OnBeforeLLMCall \| OnBeforeLLMCallAsync \| None`   | `None` |
 | `on_after_llm_call`   | `OnAfterLLMCall \| OnAfterLLMCallAsync \| None`     | `None` |
 | `on_result_message`   | `OnResultMessage \| OnResultMessageAsync \| None`   | `None` |
+| `on_text_delta`       | `OnTextDelta \| OnTextDeltaAsync \| None`           | `None` |
+| `on_thinking_delta`   | `OnThinkingDelta \| OnThinkingDeltaAsync \| None`   | `None` |
 | `on_text_reasoning`   | `OnTextReasoning \| OnTextReasoningAsync \| None`   | `None` |
 | `on_thinking`         | `OnThinking \| OnThinkingAsync \| None`             | `None` |
 | `on_before_tool_call` | `OnBeforeToolCall \| OnBeforeToolCallAsync \| None` | `None` |
@@ -328,30 +330,37 @@ Hooks fire in this order during a single tool loop iteration:
 4. **`on_before_llm_call`** — before each LLM request. Receives `LLMRequest` and context.
    Observational only (no return value).

-5. **`on_after_llm_call`** — after each LLM response. Receives `LLMRequest`, `LLMResponse`,
+5. **`on_text_delta`** / **`on_thinking_delta`** — when either hook is set, the agent
+   uses `provider.stream()` instead of `provider.call()`. `on_text_delta` fires for
+   each incremental text chunk; `on_thinking_delta` fires for each thinking chunk.
+   These fire **during** the LLM call, before `on_after_llm_call`. The stream is
+   wrapped with `with_interrupt`, so an `InterruptToken` signal exits the stream
+   immediately — even if the LLM is slow to produce the next token.
+
+6. **`on_after_llm_call`** — after each LLM response. Receives `LLMRequest`, `LLMResponse`,
    and context. Observational only.

-6. **`on_text_reasoning`** — for each `TextBlock` in the assistant response. Fires
-   regardless of stop reason — useful for streaming text output even when tool calls
+7. **`on_text_reasoning`** — for each `TextBlock` in the assistant response. Fires
+   regardless of stop reason — useful for observing text output even when tool calls
    are also present.

-7. **`on_thinking`** — for each `ThinkingBlock` in the assistant response. Fires
+8. **`on_thinking`** — for each `ThinkingBlock` in the assistant response. Fires
    for models with thinking/reasoning enabled (e.g. extended thinking).

 Then the flow branches based on stop reason:

 - **If `TOOL_USE`:**

-  8. **`on_before_tool_call`** — before each tool execution. Return `BeforeToolCallResult`
+  9. **`on_before_tool_call`** — before each tool execution. Return `BeforeToolCallResult`
      with `amended_tool_use` to modify tool parameters.

-  9. **`on_after_tool_call`** — after each tool execution. Return `AfterToolCallResult`
-     with `amended_result` and/or `additional_messages`.
+  10. **`on_after_tool_call`** — after each tool execution. Return `AfterToolCallResult`
+      with `amended_result` and/or `additional_messages`.

 - **If `END_TURN`:**

-  10. **`on_result_message`** — return `ResultMessageResult` with `continue_messages`
-      to force the loop to continue instead of finishing.
+  11. **`on_result_message`** — return `ResultMessageResult` with `continue_messages`
+      to force the loop to continue instead of finishing.

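An illustrative sketch of step 11. Hypothetical throughout: the hook signature and the `ResultMessageResult` import path are not shown in this diff; only the type name and its `continue_messages` field come from the text above.

```python
# Hypothetical sketch: signature and import path assumed.
from flowra.lib.tool_loop import ResultMessageResult  # assumed export
from flowra.llm import TextBlock, UserMessage


def on_result_message(message, context):
    text = "".join(b.text for b in message.blocks if isinstance(b, TextBlock))
    if "TODO" in text:  # the model stopped while work remains: force another turn
        nudge = UserMessage(blocks=[TextBlock(text="Finish the remaining TODOs.")])
        return ResultMessageResult(continue_messages=[nudge])
    return ResultMessageResult(continue_messages=[])  # accept the result, end the loop
```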
 ### Hook result types

@@ -515,6 +524,8 @@ OnMessageAccepted / OnMessageAcceptedAsync
 OnStartIteration / OnStartIterationAsync
 OnBeforeLLMCall / OnBeforeLLMCallAsync
 OnAfterLLMCall / OnAfterLLMCallAsync
+OnTextDelta / OnTextDeltaAsync
+OnThinkingDelta / OnThinkingDeltaAsync
 OnTextReasoning / OnTextReasoningAsync
 OnThinking / OnThinkingAsync
 OnResultMessage / OnResultMessageAsync
@@ -546,8 +557,9 @@ ChatAgent
 1. `start` — accepts user message, runs `on_user_message` hook, appends to
    `turn_messages`, flushes, fires `on_message_accepted`, then gotos `call_llm`.
 2. `call_llm` — checks interrupt/finish/max_iterations, runs `on_start_iteration`,
-   builds `LLMRequest`, runs `on_before_llm_call`, calls LLM, runs `on_after_llm_call`
-   and `on_text_reasoning`, then:
+   builds `LLMRequest`, runs `on_before_llm_call`, calls LLM (streaming deltas via
+   `on_text_delta`/`on_thinking_delta` if set), runs `on_after_llm_call`,
+   `on_text_reasoning`, and `on_thinking`, then:
    - `END_TURN` → runs `on_result_message`, returns `ToolLoopResult` (or continues
      if `continue_messages` is non-empty)
    - `TOOL_USE` → runs `on_before_tool_call` for each tool, spawns `ToolCallAgent`
@@ -11,7 +11,8 @@ flowra/llm/
 ├── tools.py              # Tool
 ├── request.py            # LLMRequest
 ├── response.py           # LLMResponse, StopReason, Usage
-├── provider.py           # LLMProvider (abc)
+├── provider.py           # LLMProvider (abc) — call() + stream()
+├── stream.py             # StreamEvent, TextDelta, ThinkingDelta, ContentComplete
 ├── schema_formatting.py  # JSON schema formatting for LLM prompts
 ├── schema_validation.py  # JSON schema validation and markdown stripping
 ├── pricing/
@@ -137,8 +138,8 @@ ToolResultBlock(tool_use_id="toolu_123", content="Division by zero", is_error=Tr
 ### `ThinkingBlock`

 Thinking/reasoning content from the LLM. Produced by models that support extended
-thinking (e.g. Google Gemini with thinking enabled). Not sent back to the API by
-any provider.
+thinking (e.g. Anthropic Claude with `thinking_budget_tokens`, Google Gemini with
+thinking enabled). Not sent back to the API by any provider.

 ```python
 # Usually not created manually — comes from AssistantMessage via LLM
@@ -346,12 +347,17 @@ if response.usage is not None:

 ### `LLMProvider`

-Abstract base class for calling an LLM. Defines a single `call()` method:
+Abstract base class for calling an LLM. Defines `call()` (abstract) and `stream()`
+(optional, with a default fallback):

 ```python
 class LLMProvider(abc.ABC):
     @abc.abstractmethod
     async def call(self, request: LLMRequest) -> LLMResponse: ...
+
+    async def stream(self, request: LLMRequest) -> AsyncIterator[StreamEvent]:
+        response = await self.call(request)
+        yield ContentComplete(response=response)
 ```

 The provider is responsible for converting `LLMRequest` into the target API's format,
@@ -359,6 +365,39 @@ calling the API, and converting the response back to `LLMResponse`. If `json_schema`
 is set, the provider should also handle validation and retries (see
 [Structured output](#structured-output-json-schema)).

+#### Streaming
+
+`stream()` returns an `AsyncIterator[StreamEvent]` that yields incremental events
+as the LLM generates its response. The default implementation calls `call()` and
+yields a single `ContentComplete` event — providers override this for real-time streaming.
+
+**Stream events:**
+
+| Event             | Fields                  | Description                            |
+|-------------------|-------------------------|----------------------------------------|
+| `TextDelta`       | `text: str`             | Incremental text content               |
+| `ThinkingDelta`   | `text: str`             | Incremental thinking/reasoning content |
+| `ContentComplete` | `response: LLMResponse` | Always last — full response            |
+
+`ContentComplete` is always the final event and contains the same `LLMResponse` you
+would get from `call()`.
+
+```python
+async for event in provider.stream(request):
+    match event:
+        case TextDelta(text=text):
+            print(text, end="", flush=True)
+        case ThinkingDelta(text=text):
+            print(f"[thinking] {text}", end="")
+        case ContentComplete(response=response):
+            print()  # newline after streaming
+            # response.message, response.usage, etc. are available here
+```
+
+All three built-in providers implement `stream()` with real-time deltas.
+`AnthropicVertexProvider` falls back to non-streaming when `json_schema` is set
+(the retry loop requires full responses).
+
 ### `AnthropicVertexProvider`

 Implementation for Claude via Vertex AI.
@@ -492,6 +531,42 @@ on failure. Used internally by `AnthropicVertexProvider` for structured output retries.
 Internally, `strip_markdown_code_block(text)` removes surrounding markdown code fences
 (`` ```...``` ``) before parsing. This is an implementation detail, not part of the public API.

+#### `AnthropicVertexAdditionalConfig`
+
+Provider-specific configuration passed via `LLMRequest.additional_config`:
+
+| Field                    | Type          | Default | Description                                                 |
+|--------------------------|---------------|---------|-------------------------------------------------------------|
+| `thinking_budget_tokens` | `int \| None` | `None`  | Token budget for extended thinking (enables thinking mode)  |
+
+When `thinking_budget_tokens` is set, the provider passes `thinking: {"type": "enabled",
+"budget_tokens": N}` to the API and forces `temperature=1` (Anthropic requirement for
+thinking mode). The response will contain `ThinkingBlock` blocks with the model's
+chain-of-thought reasoning.
+
+```python
+from flowra.llm.providers.anthropic_vertex import AnthropicVertexAdditionalConfig
+
+response = await provider.call(
+    LLMRequest(
+        model="claude-sonnet-4-5@20250929",
+        messages=[UserMessage(blocks=[TextBlock(text="Solve this step by step...")])],
+        max_tokens=8192,
+        additional_config={"thinking_budget_tokens": 4000},
+    )
+)
+
+# response.message.blocks may contain ThinkingBlock + TextBlock
+for block in response.message.blocks:
+    if isinstance(block, ThinkingBlock):
+        print(f"[thinking] {block.text}")
+    elif isinstance(block, TextBlock):
+        print(block.text)
+```
+
+**Note:** Streaming + `json_schema` is not supported with Anthropic — `stream()` falls
+back to non-streaming in that case.
+
 ### `OpenAIProvider`

 Implementation for OpenAI-compatible APIs (OpenAI, Inception AI, etc.).
497
572
  Implementation for OpenAI-compatible APIs (OpenAI, Inception AI, etc.).
@@ -573,21 +648,37 @@ removes `additionalProperties`, converts type arrays to `anyOf`).
573
648
 
574
649
  Provider-specific configuration passed via `LLMRequest.additional_config`:
575
650
 
576
- | Field | Type | Default | Description |
577
- |------------------|-------------------------------------|---------|--------------------------------------|
578
- | `thinking_level` | `genai_types.ThinkingLevel \| None` | `None` | Thinking level for extended thinking |
651
+ | Field | Type | Default | Description |
652
+ |--------------------|-------------------------------------|---------|----------------------------------------------------------------|
653
+ | `thinking_level` | `genai_types.ThinkingLevel \| None` | `None` | Thinking level (MINIMAL, LOW, MEDIUM, HIGH) — for Gemini 3 |
654
+ | `thinking_budget` | `int \| None` | `None` | Token budget for thinking — for Gemini 2.5 (min 128 for Pro) |
655
+
656
+ Either field (or both) enables thinking mode. `thinking_level` controls reasoning
657
+ depth for Gemini 3 models. `thinking_budget` sets a token budget for Gemini 2.5
658
+ models (setting to 0 disables thinking on Flash; minimum 128 on Pro).
579
659
 
580
660
  ```python
581
661
  from flowra.llm.providers.google_vertex import GoogleVertexAdditionalConfig
582
662
 
663
+ # Gemini 3 — thinking level
583
664
  response = await provider.call(
584
665
  LLMRequest(
585
- model="gemini-2.5-pro",
666
+ model="gemini-3-pro-preview",
586
667
  messages=[UserMessage(blocks=[TextBlock(text="Solve this step by step...")])],
587
668
  max_tokens=4096,
588
669
  additional_config={"thinking_level": "medium"},
589
670
  )
590
671
  )
672
+
673
+ # Gemini 2.5 — thinking budget
674
+ response = await provider.call(
675
+ LLMRequest(
676
+ model="gemini-2.5-pro",
677
+ messages=[UserMessage(blocks=[TextBlock(text="Solve this step by step...")])],
678
+ max_tokens=4096,
679
+ additional_config={"thinking_budget": 4096},
680
+ )
681
+ )
591
682
  ```
592
683
 
593
684
  ### Adding a new provider
@@ -0,0 +1,46 @@
+# Strands Agents SDK vs Flowra — Comparison (March 2026)
+
+Research date: 2026-03-08
+
+## What Strands has that Flowra doesn't
+
+| Capability | Strands | Flowra | Priority |
+|---|---|---|---|
+| **Streaming** | Full event streaming — each agent step streams to the client | `LLMProvider.stream()` with `TextDelta`/`ThinkingDelta`/`ContentComplete` events; `ToolLoopAgent` auto-switches when delta hooks are set | — (implemented) |
+| **Observability (OpenTelemetry)** | Built-in traces, metrics, export to X-Ray/CloudWatch/Jaeger | No — only manual hooks in examples (`on_before_llm_call` etc.) | **High** — necessary for production |
+| **Multi-agent patterns (Swarm, Graph, Workflow)** | Built-in coordinators: orchestrator-worker, peer swarm, DAG graph with auto-parallelization | `Spawn` (parallel children) and `Call` — but no ready-made abstractions like "agent graph" or "swarm" | Medium — our primitives allow building this, but nothing ready-made |
+| **More providers out of the box** | Bedrock, Anthropic, OpenAI, Gemini, Ollama, LiteLLM, llama.cpp | Anthropic Vertex, OpenAI, Google Vertex | Medium — a LiteLLM adapter would cover everything |
+| **Guardrails** | Integration with Amazon Bedrock Guardrails — content filtering, topic blocking, PII protection | None | Medium — depends on use case |
+| **A2A (Agent-to-Agent) protocol** | Agents communicate across processes/services via a standard protocol | No — agents only within a single runtime | Low for now — relevant for distributed systems |
+| **TypeScript SDK** | Yes (preview) | No | Low — we are a Python library |
+| **Session management with external stores** | DynamoDB, Bedrock AgentCore Memory, custom | InMemory, File, custom (but no ready-made cloud DB adapters) | Medium |
+| **"Agents as tools"** | An agent can be a tool of another agent directly | Requires manual wrapping | Low — `Call`/`Spawn` solve this differently |
+
+## What Flowra has that Strands doesn't (or does less well)
+
+| Capability | Flowra | Strands |
+|---|---|---|
+| **Crash recovery** | Full: persistence after each step, resume after a crash | Session persistence exists, but no step-level crash recovery |
+| **Incremental dirty tracking** | `Scalar[T]` and `AppendOnlyList[T]` save only changes | Saves state wholesale |
+| **State machine with compile-time checks** | `@step`, `Goto`, `Spawn`, `Call` — the compiler checks slots and types at class definition time | Model-driven loop — no explicit state machine |
+| **Cooperative interrupts** | `InterruptToken` propagates through the entire execution tree | No equivalent |
+| **DI into tool handlers** | `ToolService` marker — services injected into tool functions | Tools receive only the tool input |
+
+## Priority action items
+
+1. ~~**Streaming** — the most visible gap for user experience.~~ ✅ Implemented: `LLMProvider.stream()`, `on_text_delta`/`on_thinking_delta` hooks, TUI/console examples support streaming.
+2. **Observability** — at minimum, OpenTelemetry spans for LLM calls and tool execution. Our hooks are a good foundation.
+3. **More providers** — an Ollama/LiteLLM adapter would be useful for local development.
+4. **Ready-made multi-agent patterns** — Graph/Workflow on top of our primitives.
+
+## Sources
+
+- [Introducing Strands Agents (AWS Blog)](https://aws.amazon.com/blogs/opensource/introducing-strands-agents-an-open-source-ai-agents-sdk/)
+- [Strands Agents Documentation](https://strandsagents.com/latest/documentation/docs/)
+- [Technical Deep Dive (AWS Blog)](https://aws.amazon.com/blogs/machine-learning/strands-agents-sdk-a-technical-deep-dive-into-agent-architectures-and-observability/)
+- [Multi-Agent Patterns](https://dev.to/aws-builders/understanding-multi-agent-patterns-in-strands-agent-graph-swarm-and-workflow-4nb8)
+- [Session Management](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/agents/session-management/)
+- [A2A Protocol](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/multi-agent/agent-to-agent/)
+- [Guardrails](https://strandsagents.com/latest/documentation/docs/user-guide/safety-security/guardrails/)
+- [Streaming](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/streaming/)
+- [GitHub - sdk-python](https://github.com/strands-agents/sdk-python)
@@ -27,7 +27,7 @@ from examples.model_registry import DEFAULT_MODEL, create_router
 from examples.tools import calculate, random_numbers
 from examples.tools.switch_model import create_switch_model_tool
 from flowra.lib.chat import ChatResult, ChatSpec
-from flowra.lib.tool_loop import ToolLoopHooks
+from flowra.lib.tool_loop import ToolLoopAgentContext, ToolLoopHooks
 from flowra.llm import LLMProvider, SystemMessage, TextBlock, Usage
 from flowra.runtime import AgentRuntime, FileSessionStorage
 from flowra.tools import ToolRegistry, get_local_tool
@@ -128,6 +128,7 @@ async def main() -> None:
     parser.add_argument("--model", default=DEFAULT_MODEL, help="Model key (e.g. anthropic/sonnet)")
     parser.add_argument("--resume", metavar="SESSION_ID", help="Resume a session ('last' for most recent)")
     parser.add_argument("--input", metavar="MESSAGE", help="Send a single message and exit (batch mode)")
+    parser.add_argument("--stream", action="store_true", help="Enable streaming (print text as it arrives)")
     args = parser.parse_args()

     session_id: str | None = None
@@ -160,9 +161,17 @@ async def main() -> None:
         ],
     )

+    def on_text_delta(text: str, context: ToolLoopAgentContext) -> None:
+        print(text, end="", flush=True)
+
+    def on_thinking_delta(text: str, context: ToolLoopAgentContext) -> None:
+        print(f"\033[2m{text}\033[0m", end="", flush=True)
+
     hooks = ToolLoopHooks(
         on_before_llm_call=log_before_llm_call,
         on_after_llm_call=log_after_llm_call,
+        on_text_delta=on_text_delta if args.stream else None,
+        on_thinking_delta=on_thinking_delta if args.stream else None,
     )
     storage = FileSessionStorage(base_dir=_SESSION_DIR, session_id=session_id)
     runtime = AgentRuntime(
@@ -1,9 +1,10 @@
 """LLM router for chat examples — routes requests to providers by model key."""

 import dataclasses
+from collections.abc import AsyncIterator
 from typing import Any

-from flowra.llm import LLMProvider, LLMRequest, LLMResponse
+from flowra.llm import LLMProvider, LLMRequest, LLMResponse, StreamEvent

 __all__ = ["ChatLLMRouter", "ModelEntry"]
@@ -25,11 +26,20 @@ class ChatLLMRouter(LLMProvider):
     def available_models(self) -> list[str]:
         return sorted(self.__models)

-    async def call(self, request: LLMRequest) -> LLMResponse:
+    def __resolve(self, request: LLMRequest) -> tuple[LLMProvider, LLMRequest]:
         entry = self.__models[request.model]
         actual_request = dataclasses.replace(
             request,
             model=entry.model_id,
             additional_config={**entry.additional_config, **request.additional_config},
         )
-        return await entry.provider.call(actual_request)
+        return entry.provider, actual_request
+
+    async def call(self, request: LLMRequest) -> LLMResponse:
+        provider, actual_request = self.__resolve(request)
+        return await provider.call(actual_request)
+
+    async def stream(self, request: LLMRequest) -> AsyncIterator[StreamEvent]:
+        provider, actual_request = self.__resolve(request)
+        async for event in provider.stream(actual_request):
+            yield event
@@ -5,7 +5,7 @@ Agents contain NO I/O — each step returns a prompt or result.
 The single input loop in main() handles all user interaction.

 Usage:
-    uv run python -m examples.menu_agent
+    uv run python examples/menu_agent.py
 """

 import asyncio
@@ -4,7 +4,7 @@ Same menu/calc/echo demo as menu_agent.py, but using the Agent base class
 with type-based references and direct Goto/Call/Spawn constructors.

 Usage:
-    uv run python -m examples.menu_agent_class
+    uv run python examples/menu_agent_class.py
 """

 import asyncio
@@ -56,6 +56,11 @@ def create_router() -> ChatLLMRouter:
     anthropic_provider = AnthropicVertexProvider(project=project, location=location, credentials=credentials_b64)

     models["anthropic/sonnet-4-5"] = ModelEntry(provider=anthropic_provider, model_id="claude-sonnet-4-5@20250929")
+    models["anthropic/sonnet-4-5-think"] = ModelEntry(
+        provider=anthropic_provider,
+        model_id="claude-sonnet-4-5@20250929",
+        additional_config={"thinking_budget_tokens": 4000},
+    )
     models["anthropic/haiku-4-5"] = ModelEntry(provider=anthropic_provider, model_id="claude-haiku-4-5@20251001")

     # --- OpenAI ---
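
To close the loop on the new registry entry, a hedged usage sketch: `create_router()` and the `anthropic/sonnet-4-5-think` key come from this hunk, the request shape follows the docs/llm.md examples earlier in the diff, and the async scaffolding is assumed.

```python
# Sketch only: the router swaps the registry key for the real model id and
# merges the entry's additional_config (the 4000-token thinking budget).
from examples.model_registry import create_router
from flowra.llm import LLMRequest, TextBlock, UserMessage


async def demo() -> None:
    router = create_router()
    response = await router.call(
        LLMRequest(
            model="anthropic/sonnet-4-5-think",
            messages=[UserMessage(blocks=[TextBlock(text="Why is the sky blue?")])],
            max_tokens=8192,
        )
    )
    print(response.message)  # may include ThinkingBlock content before the answer
```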