penguiflow 1.0.2__tar.gz → 2.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


Files changed (42)
  1. {penguiflow-1.0.2 → penguiflow-2.0.0}/PKG-INFO +181 -19
  2. {penguiflow-1.0.2 → penguiflow-2.0.0}/README.md +177 -17
  3. {penguiflow-1.0.2 → penguiflow-2.0.0}/penguiflow/__init__.py +27 -2
  4. penguiflow-2.0.0/penguiflow/core.py +1284 -0
  5. penguiflow-2.0.0/penguiflow/errors.py +113 -0
  6. penguiflow-2.0.0/penguiflow/metrics.py +105 -0
  7. penguiflow-2.0.0/penguiflow/middlewares.py +16 -0
  8. {penguiflow-1.0.2 → penguiflow-2.0.0}/penguiflow/patterns.py +47 -5
  9. penguiflow-2.0.0/penguiflow/policies.py +149 -0
  10. penguiflow-2.0.0/penguiflow/streaming.py +142 -0
  11. penguiflow-2.0.0/penguiflow/testkit.py +269 -0
  12. {penguiflow-1.0.2 → penguiflow-2.0.0}/penguiflow/types.py +15 -1
  13. penguiflow-2.0.0/penguiflow/viz.py +185 -0
  14. {penguiflow-1.0.2 → penguiflow-2.0.0}/penguiflow.egg-info/PKG-INFO +181 -19
  15. {penguiflow-1.0.2 → penguiflow-2.0.0}/penguiflow.egg-info/SOURCES.txt +17 -1
  16. {penguiflow-1.0.2 → penguiflow-2.0.0}/penguiflow.egg-info/requires.txt +2 -0
  17. {penguiflow-1.0.2 → penguiflow-2.0.0}/pyproject.toml +27 -4
  18. penguiflow-2.0.0/tests/test_budgets.py +149 -0
  19. penguiflow-2.0.0/tests/test_cancel.py +168 -0
  20. {penguiflow-1.0.2 → penguiflow-2.0.0}/tests/test_controller.py +38 -0
  21. {penguiflow-1.0.2 → penguiflow-2.0.0}/tests/test_core.py +217 -2
  22. penguiflow-2.0.0/tests/test_errors.py +106 -0
  23. penguiflow-2.0.0/tests/test_metadata.py +97 -0
  24. penguiflow-2.0.0/tests/test_metrics.py +41 -0
  25. penguiflow-2.0.0/tests/test_middlewares.py +187 -0
  26. penguiflow-2.0.0/tests/test_node.py +96 -0
  27. penguiflow-2.0.0/tests/test_routing_policy.py +128 -0
  28. penguiflow-2.0.0/tests/test_streaming.py +271 -0
  29. penguiflow-2.0.0/tests/test_testkit.py +92 -0
  30. penguiflow-2.0.0/tests/test_viz.py +43 -0
  31. penguiflow-1.0.2/penguiflow/core.py +0 -609
  32. penguiflow-1.0.2/penguiflow/middlewares.py +0 -17
  33. penguiflow-1.0.2/penguiflow/viz.py +0 -5
  34. {penguiflow-1.0.2 → penguiflow-2.0.0}/LICENSE +0 -0
  35. {penguiflow-1.0.2 → penguiflow-2.0.0}/penguiflow/node.py +0 -0
  36. {penguiflow-1.0.2 → penguiflow-2.0.0}/penguiflow/registry.py +0 -0
  37. {penguiflow-1.0.2 → penguiflow-2.0.0}/penguiflow.egg-info/dependency_links.txt +0 -0
  38. {penguiflow-1.0.2 → penguiflow-2.0.0}/penguiflow.egg-info/top_level.txt +0 -0
  39. {penguiflow-1.0.2 → penguiflow-2.0.0}/setup.cfg +0 -0
  40. {penguiflow-1.0.2 → penguiflow-2.0.0}/tests/test_patterns.py +0 -0
  41. {penguiflow-1.0.2 → penguiflow-2.0.0}/tests/test_registry.py +0 -0
  42. {penguiflow-1.0.2 → penguiflow-2.0.0}/tests/test_types.py +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: penguiflow
3
- Version: 1.0.2
3
+ Version: 2.0.0
4
4
  Summary: Async agent orchestration primitives.
5
5
  Author: PenguiFlow Team
6
6
  License: MIT License
@@ -26,7 +26,7 @@ License: MIT License
26
26
  SOFTWARE.
27
27
 
28
28
  Project-URL: Homepage, https://github.com/penguiflow/penguiflow
29
- Requires-Python: >=3.12
29
+ Requires-Python: >=3.11
30
30
  Description-Content-Type: text/markdown
31
31
  License-File: LICENSE
32
32
  Requires-Dist: pydantic>=2.6
@@ -34,6 +34,8 @@ Provides-Extra: dev
34
34
  Requires-Dist: mypy>=1.8; extra == "dev"
35
35
  Requires-Dist: pytest>=7.4; extra == "dev"
36
36
  Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
37
+ Requires-Dist: pytest-cov>=4.0; extra == "dev"
38
+ Requires-Dist: coverage[toml]>=7.0; extra == "dev"
37
39
  Requires-Dist: ruff>=0.2; extra == "dev"
38
40
  Dynamic: license-file
39
41
 
@@ -43,6 +45,21 @@ Dynamic: license-file
43
45
  <img src="asset/Penguiflow.png" alt="PenguiFlow logo" width="220">
44
46
  </p>
45
47
 
48
+ <p align="center">
49
+ <a href="https://github.com/penguiflow/penguiflow/actions/workflows/ci.yml">
50
+ <img src="https://github.com/penguiflow/penguiflow/actions/workflows/ci.yml/badge.svg" alt="CI Status">
51
+ </a>
52
+ <a href="https://github.com/penguiflow/penguiflow">
53
+ <img src="https://img.shields.io/badge/coverage-85%25-brightgreen" alt="Coverage">
54
+ </a>
55
+ <a href="https://pypi.org/project/penguiflow/">
56
+ <img src="https://img.shields.io/pypi/v/penguiflow.svg" alt="PyPI version">
57
+ </a>
58
+ <a href="https://github.com/penguiflow/penguiflow/blob/main/LICENSE">
59
+ <img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License">
60
+ </a>
61
+ </p>
62
+
46
63
  **Async-first orchestration library for multi-agent and data pipelines**
47
64
 
48
65
  PenguiFlow is a **lightweight Python library** to orchestrate agent flows.
@@ -52,8 +69,14 @@ It provides:
52
69
  * **Concurrent fan-out / fan-in patterns**
53
70
  * **Routing & decision points**
54
71
  * **Retries, timeouts, backpressure**
72
+ * **Streaming chunks** (LLM-style token emission with `Context.emit_chunk`)
55
73
  * **Dynamic loops** (controller nodes)
56
74
  * **Runtime playbooks** (callable subflows with shared metadata)
75
+ * **Per-trace cancellation** (`PenguiFlow.cancel` with `TraceCancelled` surfacing in nodes)
76
+ * **Deadlines & budgets** (`Message.deadline_s`, `WM.budget_hops`, and `WM.budget_tokens` guardrails that you can leave unset/`None`)
77
+ * **Observability hooks** (`FlowEvent` callbacks for logging, MLflow, or custom metrics sinks)
78
+ * **Policy-driven routing** (optional policies steer routers without breaking existing flows)
79
+ * **Traceable exceptions** (`FlowError` captures node/trace metadata and optionally emits to Rookery)
57
80
 
58
81
  Built on pure `asyncio` (no threads), PenguiFlow is small, predictable, and repo-agnostic.
59
82
  Product repos only define **their models + node functions** — the core stays dependency-light.
@@ -86,6 +109,7 @@ msg = Message(
86
109
  payload=QueryIn(text="unique reach last 30 days"),
87
110
  headers=Headers(tenant="acme")
88
111
  )
112
+ msg.meta["request_id"] = "abc123"
89
113
  ```
90
114
 
91
115
  ### Node
@@ -99,12 +123,18 @@ from penguiflow.node import Node
99
123
  class QueryOut(BaseModel):
100
124
  topic: str
101
125
 
102
- async def triage(m: QueryIn) -> QueryOut:
126
+ async def triage(msg: QueryIn, ctx) -> QueryOut:
103
127
  return QueryOut(topic="metrics")
104
128
 
105
129
  triage_node = Node(triage, name="triage")
106
130
  ```
107
131
 
132
+ Node functions must always accept **two positional parameters**: the incoming payload and
133
+ the `Context` object. If a node does not use the context, name it `_` or `_ctx`, but keep
134
+ the parameter so the runtime can still inject it. Registering the node with
135
+ `ModelRegistry` ensures the payload is validated/cast to the expected Pydantic model;
136
+ setting `NodePolicy(validate="none")` skips that validation for hot paths.
137
+
108
138
  ### Flow
109
139
 
110
140
  A flow wires nodes together in a directed graph.
@@ -178,13 +208,77 @@ await flow.stop()
178
208
  pip install -e ./penguiflow
179
209
  ```
180
210
 
181
- Requires **Python 3.12+**.
211
+ Requires **Python 3.11+**.
212
+
213
+ ---
214
+
215
+ ## 🛠️ Key capabilities
216
+
217
+ ### Streaming & incremental delivery
218
+
219
+ `Context.emit_chunk` (and `PenguiFlow.emit_chunk`) provide token-level streaming without
220
+ sacrificing backpressure or ordering guarantees. The helper wraps the payload in a
221
+ `StreamChunk`, mirrors routing metadata from the parent message, and automatically
222
+ increments per-stream sequence numbers. See `tests/test_streaming.py` and
223
+ `examples/streaming_llm/` for an end-to-end walk-through.
224
+
225
+ ### Reliability & guardrails
226
+
227
+ PenguiFlow enforces reliability boundaries out of the box:
228
+
229
+ * **Per-trace cancellation** (`PenguiFlow.cancel(trace_id)`) unwinds a single run while
230
+ other traces keep executing. Worker tasks observe `TraceCancelled` and clean up
231
+ resources; `tests/test_cancel.py` covers the behaviour.
232
+ * **Deadlines & budgets** let you keep loops honest. `Message.deadline_s` guards
233
+ wall-clock execution, while controller payloads (`WM`) track hop and token budgets.
234
+ Exhaustion short-circuits into terminal `FinalAnswer` messages as demonstrated in
235
+ `tests/test_budgets.py` and `examples/controller_multihop/`.
236
+ * **Retries & timeouts** live in `NodePolicy`. Exponential backoff, timeout enforcement,
237
+ and structured retry events are exercised heavily in the core test suite.
238
+
239
+ ### Metadata & observability
240
+
241
+ Every `Message` carries a mutable `meta` dictionary so nodes can propagate debugging
242
+ breadcrumbs, billing information, or routing hints without touching the payload. The
243
+ runtime clones metadata during streaming and playbook calls (`tests/test_metadata.py`).
244
+ Structured runtime events surface through `FlowEvent` objects; attach middlewares for
245
+ custom logging or metrics ingestion (`examples/mlflow_metrics/`).
246
+
247
+ ### Routing & dynamic policies
248
+
249
+ Branching flows stay flexible thanks to routers and optional policies. The
250
+ `predicate_router` and `union_router` helpers can consult a `RoutingPolicy` at runtime to
251
+ override or drop successors, while `DictRoutingPolicy` provides a config-driven
252
+ implementation ready for JSON/YAML/env inputs (`tests/test_routing_policy.py`,
253
+ `examples/routing_policy/`).
254
+
255
+ ### Traceable exceptions
256
+
257
+ When retries are exhausted or timeouts fire, PenguiFlow wraps the failure in a
258
+ `FlowError` that preserves the trace id, node metadata, and a stable error code.
259
+ Opt into `emit_errors_to_rookery=True` to receive these objects directly from
260
+ `flow.fetch()`—see `tests/test_errors.py` and `examples/traceable_errors/` for usage.
261
+
262
+ ### FlowTestKit
263
+
264
+ The new `penguiflow.testkit` module keeps unit tests tiny:
265
+
266
+ * `await testkit.run_one(flow, message)` boots a flow, emits a message, captures runtime
267
+ events, and returns the first Rookery payload.
268
+ * `testkit.assert_node_sequence(trace_id, [...])` asserts the order in which nodes ran.
269
+ * `testkit.simulate_error(...)` builds coroutine helpers that fail a configurable number
270
+ of times—perfect for retry scenarios.
271
+
272
+ The harness is covered by `tests/test_testkit.py` and demonstrated in
273
+ `examples/testkit_demo/`.
274
+
182
275
 
183
276
  ## 🧭 Repo Structure
184
277
 
185
278
  penguiflow/
186
279
  __init__.py
187
280
  core.py # runtime orchestrator, retries, controller helpers, playbooks
281
+ errors.py # FlowError / FlowErrorCode definitions
188
282
  node.py
189
283
  types.py
190
284
  registry.py
@@ -273,18 +367,46 @@ stitched directly into a flow adjacency list:
273
367
 
274
368
  - `map_concurrent(items, worker, max_concurrency=8)` — fan a single message out into
275
369
  many in-memory tasks (e.g., batch document enrichment) while respecting a semaphore.
276
- - `predicate_router(name, mapping)` — route messages to successor nodes based on simple
277
- boolean functions over payload or headers. Perfect for guardrails or conditional
278
- tool invocation without building a full controller.
370
+ - `predicate_router(name, predicate, policy=None)` — route messages to successor nodes
371
+ based on simple boolean functions over payload or headers, optionally consulting a
372
+ runtime `policy` to override or filter the computed targets. Perfect for guardrails or
373
+ conditional tool invocation without rebuilding the flow.
279
374
  - `union_router(name, discriminated_model)` — accept a Pydantic discriminated union and
280
375
  forward each variant to the matching typed successor node. Keeps type-safety even when
281
376
  multiple schema branches exist.
282
377
  - `join_k(name, k)` — aggregate `k` messages per `trace_id` before resuming downstream
283
378
  work. Useful for fan-out/fan-in batching, map-reduce style summarization, or consensus.
379
+ - `DictRoutingPolicy(mapping, key_getter=None)` — load routing overrides from
380
+ configuration and pair it with the router helpers via `policy=...` to switch routing at
381
+ runtime without modifying the flow graph.
284
382
 
285
383
  All helpers are regular `Node` instances under the hood, so they inherit retries,
286
384
  timeouts, and validation just like hand-written nodes.
287
385
 
386
+ ### Streaming Responses
387
+
388
+ PenguiFlow now supports **LLM-style streaming** with the `StreamChunk` model. Each
389
+ chunk carries `stream_id`, `seq`, `text`, optional `meta`, and a `done` flag. Use
390
+ `Context.emit_chunk(parent=message, text=..., done=...)` inside a node (or the
391
+ convenience wrapper `await flow.emit_chunk(...)` from outside a node) to push
392
+ chunks downstream without manually crafting `Message` envelopes:
393
+
394
+ ```python
395
+ await ctx.emit_chunk(parent=msg, text=token, done=done)
396
+ ```
397
+
398
+ - Sequence numbers auto-increment per `stream_id` (defaults to the parent trace).
399
+ - Backpressure is preserved; if the downstream queue is full the helper awaits just
400
+ like `Context.emit`.
401
+ - When `done=True`, the sequence counter resets so a new stream can reuse the same id.
402
+
403
+ Pair the producer with a sink node that consumes `StreamChunk` payloads and assembles
404
+ the final result when `done` is observed. See `examples/streaming_llm/` for a complete
405
+ mock LLM → SSE pipeline. For presentation layers, utilities like
406
+ `format_sse_event(chunk)` and `chunk_to_ws_json(chunk)` (both exported from the
407
+ package) will convert a `StreamChunk` into SSE-compatible text or WebSocket JSON payloads
408
+ without boilerplate.
409
+
288
410
  ### Dynamic Controller Loops
289
411
 
290
412
  Long-running agents often need to **think, plan, and act over multiple hops**. PenguiFlow
@@ -306,20 +428,21 @@ easy to surface guardrails to downstream consumers.
306
428
  ### Playbooks & Subflows
307
429
 
308
430
  Sometimes a controller or router needs to execute a **mini flow** — for example,
309
- retrieval → rerank → compress — without polluting the global topology. `call_playbook`
310
- spawns a brand-new `PenguiFlow` on demand and wires it into the parent message context:
431
+ retrieval → rerank → compress — without polluting the global topology.
432
+ `Context.call_playbook` spawns a brand-new `PenguiFlow` on demand and wires it into
433
+ the parent message context:
311
434
 
312
435
  - Trace IDs and headers are reused so observability stays intact.
313
- - The helper respects optional timeouts and always stops the subflow (even on cancel).
436
+ - The helper respects optional timeouts, mirrors cancellation to the subflow, and always
437
+ stops it (even on cancel).
314
438
  - The first payload emitted to the playbook's Rookery is returned to the caller,
315
439
  allowing you to treat subflows as normal async functions.
316
440
 
317
441
  ```python
318
- from penguiflow import call_playbook
319
442
  from penguiflow.types import Message
320
443
 
321
444
  async def controller(msg: Message, ctx) -> Message:
322
- playbook_result = await call_playbook(build_retrieval_playbook, msg)
445
+ playbook_result = await ctx.call_playbook(build_retrieval_playbook, msg)
323
446
  return msg.model_copy(update={"payload": playbook_result})
324
447
  ```
325
448
 
@@ -328,12 +451,36 @@ flow focused on high-level orchestration logic.
328
451
 
329
452
  ---
330
453
 
454
+ ### Visualization
455
+
456
+ Need a quick view of the flow topology? Call `flow_to_mermaid(flow)` to render the graph
457
+ as a Mermaid diagram ready for Markdown or docs tools, or `flow_to_dot(flow)` for a
458
+ Graphviz-friendly definition. Both outputs annotate controller loops and the synthetic
459
+ OpenSea/Rookery boundaries so you can spot ingress/egress paths at a glance:
460
+
461
+ ```python
462
+ from penguiflow import flow_to_dot, flow_to_mermaid
463
+
464
+ print(flow_to_mermaid(flow, direction="LR"))
465
+ print(flow_to_dot(flow, rankdir="LR"))
466
+ ```
467
+
468
+ See `examples/visualizer/` for a runnable script that exports Markdown and DOT files for
469
+ docs or diagramming pipelines.
470
+
471
+ ---
472
+
331
473
  ## 🛡️ Reliability & Observability
332
474
 
333
475
  * **NodePolicy**: set validation scope plus per-node timeout, retries, and backoff curves.
334
- * **Structured logs**: enrich every node event with `{ts, trace_id, node_name, event, latency_ms, q_depth_in, attempt}`.
335
- * **Middleware hooks**: subscribe observers (e.g., MLflow) to the structured event stream.
336
- * See `examples/reliability_middleware/` for a concrete timeout + retry walkthrough.
476
+ * **Per-trace metrics**: cancellation events include `trace_pending`, `trace_inflight`,
477
+ `q_depth_in`, `q_depth_out`, and node fan-out counts for richer observability.
478
+ * **Structured `FlowEvent`s**: every node event carries `{ts, trace_id, node_name, event,
479
+ latency_ms, q_depth_in, q_depth_out, attempt}` plus a mutable `extra` map for custom
480
+ annotations.
481
+ * **Middleware hooks**: subscribe observers (e.g., MLflow) to the structured `FlowEvent`
482
+ stream. See `examples/mlflow_metrics/` for an MLflow integration and
483
+ `examples/reliability_middleware/` for a concrete timeout + retry walkthrough.
337
484
 
338
485
  ---
339
486
 
@@ -341,15 +488,25 @@ flow focused on high-level orchestration logic.
341
488
 
342
489
  - **In-process runtime**: there is no built-in distribution layer yet. Long-running CPU work should be delegated to your own pools or services.
343
490
  - **Registry-driven typing**: nodes default to validation. Provide a `ModelRegistry` when calling `flow.run(...)` or set `validate="none"` explicitly for untyped hops.
344
- - **Observability**: structured logs + middleware hooks are available, but integrations with third-party stacks (OTel, Prometheus) are DIY for now.
345
- - **Roadmap**: v2 targets streaming, distributed backends, richer observability, and test harnesses. Contributions and proposals are welcome!
491
+ - **Observability**: structured `FlowEvent` callbacks power logs/metrics; integrations with
492
+ third-party stacks (OTel, Prometheus, Datadog) remain DIY. See the MLflow middleware
493
+ example for a lightweight pattern.
494
+ - **Roadmap**: follow-up releases focus on optional distributed backends, deeper observability integrations, and additional playbook patterns. Contributions and proposals are welcome!
495
+
496
+ ---
497
+
498
+ ## 📊 Benchmarks
499
+
500
+ Lightweight benchmarks live under `benchmarks/`. Run them via `uv run python benchmarks/<name>.py`
501
+ to capture baselines for fan-out throughput, retry/timeout overhead, and controller
502
+ playbook latency. Copy them into product repos to watch for regressions over time.
346
503
 
347
504
  ---
348
505
 
349
506
  ## 🔮 Roadmap
350
507
 
351
- * **v1 (current)**: safe core runtime, type-safety, retries, timeouts, routing, controller loops, playbooks via examples.
352
- * **v2 (future)**: streaming support, per-trace cancel, deadlines/budgets, observability hooks, visualizer, testing harness.
508
+ * **v2 (current)**: streaming, per-trace cancellation, deadlines/budgets, metadata propagation, observability hooks, visualizer, routing policies, traceable errors, and FlowTestKit.
509
+ * **Future**: optional distributed runners, richer third-party observability adapters, and opinionated playbook templates.
353
510
 
354
511
  ---
355
512
 
@@ -383,7 +540,12 @@ pytest -q
383
540
  * `examples/map_concurrent/`: bounded fan-out work inside a node.
384
541
  * `examples/controller_multihop/`: dynamic multi-hop agent loop.
385
542
  * `examples/reliability_middleware/`: retries, timeouts, and middleware hooks.
543
+ * `examples/mlflow_metrics/`: structured `FlowEvent` export to MLflow (stdout fallback).
386
544
  * `examples/playbook_retrieval/`: retrieval → rerank → compress playbook.
545
+ * `examples/trace_cancel/`: per-trace cancellation propagating into a playbook.
546
+ * `examples/streaming_llm/`: mock LLM emitting streaming chunks to an SSE sink.
547
+ * `examples/metadata_propagation/`: attaching and consuming `Message.meta` context.
548
+ * `examples/visualizer/`: exports Mermaid + DOT diagrams with loop/subflow annotations.
387
549
 
388
550
  ---
389
551
 
@@ -4,6 +4,21 @@
4
4
  <img src="asset/Penguiflow.png" alt="PenguiFlow logo" width="220">
5
5
  </p>
6
6
 
7
+ <p align="center">
8
+ <a href="https://github.com/penguiflow/penguiflow/actions/workflows/ci.yml">
9
+ <img src="https://github.com/penguiflow/penguiflow/actions/workflows/ci.yml/badge.svg" alt="CI Status">
10
+ </a>
11
+ <a href="https://github.com/penguiflow/penguiflow">
12
+ <img src="https://img.shields.io/badge/coverage-85%25-brightgreen" alt="Coverage">
13
+ </a>
14
+ <a href="https://pypi.org/project/penguiflow/">
15
+ <img src="https://img.shields.io/pypi/v/penguiflow.svg" alt="PyPI version">
16
+ </a>
17
+ <a href="https://github.com/penguiflow/penguiflow/blob/main/LICENSE">
18
+ <img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License">
19
+ </a>
20
+ </p>
21
+
7
22
  **Async-first orchestration library for multi-agent and data pipelines**
8
23
 
9
24
  PenguiFlow is a **lightweight Python library** to orchestrate agent flows.
@@ -13,8 +28,14 @@ It provides:
13
28
  * **Concurrent fan-out / fan-in patterns**
14
29
  * **Routing & decision points**
15
30
  * **Retries, timeouts, backpressure**
31
+ * **Streaming chunks** (LLM-style token emission with `Context.emit_chunk`)
16
32
  * **Dynamic loops** (controller nodes)
17
33
  * **Runtime playbooks** (callable subflows with shared metadata)
34
+ * **Per-trace cancellation** (`PenguiFlow.cancel` with `TraceCancelled` surfacing in nodes)
35
+ * **Deadlines & budgets** (`Message.deadline_s`, `WM.budget_hops`, and `WM.budget_tokens` guardrails that you can leave unset/`None`)
36
+ * **Observability hooks** (`FlowEvent` callbacks for logging, MLflow, or custom metrics sinks)
37
+ * **Policy-driven routing** (optional policies steer routers without breaking existing flows)
38
+ * **Traceable exceptions** (`FlowError` captures node/trace metadata and optionally emits to Rookery)
18
39
 
19
40
  Built on pure `asyncio` (no threads), PenguiFlow is small, predictable, and repo-agnostic.
20
41
  Product repos only define **their models + node functions** — the core stays dependency-light.
@@ -47,6 +68,7 @@ msg = Message(
47
68
  payload=QueryIn(text="unique reach last 30 days"),
48
69
  headers=Headers(tenant="acme")
49
70
  )
71
+ msg.meta["request_id"] = "abc123"
50
72
  ```
51
73
 
52
74
  ### Node
@@ -60,12 +82,18 @@ from penguiflow.node import Node
60
82
  class QueryOut(BaseModel):
61
83
  topic: str
62
84
 
63
- async def triage(m: QueryIn) -> QueryOut:
85
+ async def triage(msg: QueryIn, ctx) -> QueryOut:
64
86
  return QueryOut(topic="metrics")
65
87
 
66
88
  triage_node = Node(triage, name="triage")
67
89
  ```
68
90
 
91
+ Node functions must always accept **two positional parameters**: the incoming payload and
92
+ the `Context` object. If a node does not use the context, name it `_` or `_ctx`, but keep
93
+ the parameter so the runtime can still inject it. Registering the node with
94
+ `ModelRegistry` ensures the payload is validated/cast to the expected Pydantic model;
95
+ setting `NodePolicy(validate="none")` skips that validation for hot paths.
96
+
69
97
  ### Flow
70
98
 
71
99
  A flow wires nodes together in a directed graph.
@@ -139,13 +167,77 @@ await flow.stop()
139
167
  pip install -e ./penguiflow
140
168
  ```
141
169
 
142
- Requires **Python 3.12+**.
170
+ Requires **Python 3.11+**.
171
+
172
+ ---
173
+
174
+ ## 🛠️ Key capabilities
175
+
176
+ ### Streaming & incremental delivery
177
+
178
+ `Context.emit_chunk` (and `PenguiFlow.emit_chunk`) provide token-level streaming without
179
+ sacrificing backpressure or ordering guarantees. The helper wraps the payload in a
180
+ `StreamChunk`, mirrors routing metadata from the parent message, and automatically
181
+ increments per-stream sequence numbers. See `tests/test_streaming.py` and
182
+ `examples/streaming_llm/` for an end-to-end walk-through.
183
+
184
+ ### Reliability & guardrails
185
+
186
+ PenguiFlow enforces reliability boundaries out of the box:
187
+
188
+ * **Per-trace cancellation** (`PenguiFlow.cancel(trace_id)`) unwinds a single run while
189
+ other traces keep executing. Worker tasks observe `TraceCancelled` and clean up
190
+ resources; `tests/test_cancel.py` covers the behaviour.
191
+ * **Deadlines & budgets** let you keep loops honest. `Message.deadline_s` guards
192
+ wall-clock execution, while controller payloads (`WM`) track hop and token budgets.
193
+ Exhaustion short-circuits into terminal `FinalAnswer` messages as demonstrated in
194
+ `tests/test_budgets.py` and `examples/controller_multihop/`.
195
+ * **Retries & timeouts** live in `NodePolicy`. Exponential backoff, timeout enforcement,
196
+ and structured retry events are exercised heavily in the core test suite.
197
+
198
+ ### Metadata & observability
199
+
200
+ Every `Message` carries a mutable `meta` dictionary so nodes can propagate debugging
201
+ breadcrumbs, billing information, or routing hints without touching the payload. The
202
+ runtime clones metadata during streaming and playbook calls (`tests/test_metadata.py`).
203
+ Structured runtime events surface through `FlowEvent` objects; attach middlewares for
204
+ custom logging or metrics ingestion (`examples/mlflow_metrics/`).
205
+
206
+ ### Routing & dynamic policies
207
+
208
+ Branching flows stay flexible thanks to routers and optional policies. The
209
+ `predicate_router` and `union_router` helpers can consult a `RoutingPolicy` at runtime to
210
+ override or drop successors, while `DictRoutingPolicy` provides a config-driven
211
+ implementation ready for JSON/YAML/env inputs (`tests/test_routing_policy.py`,
212
+ `examples/routing_policy/`).
213
+
214
+ ### Traceable exceptions
215
+
216
+ When retries are exhausted or timeouts fire, PenguiFlow wraps the failure in a
217
+ `FlowError` that preserves the trace id, node metadata, and a stable error code.
218
+ Opt into `emit_errors_to_rookery=True` to receive these objects directly from
219
+ `flow.fetch()`—see `tests/test_errors.py` and `examples/traceable_errors/` for usage.
220
+
221
+ ### FlowTestKit
222
+
223
+ The new `penguiflow.testkit` module keeps unit tests tiny:
224
+
225
+ * `await testkit.run_one(flow, message)` boots a flow, emits a message, captures runtime
226
+ events, and returns the first Rookery payload.
227
+ * `testkit.assert_node_sequence(trace_id, [...])` asserts the order in which nodes ran.
228
+ * `testkit.simulate_error(...)` builds coroutine helpers that fail a configurable number
229
+ of times—perfect for retry scenarios.
230
+
231
+ The harness is covered by `tests/test_testkit.py` and demonstrated in
232
+ `examples/testkit_demo/`.
233
+
143
234
 
144
235
  ## 🧭 Repo Structure
145
236
 
146
237
  penguiflow/
147
238
  __init__.py
148
239
  core.py # runtime orchestrator, retries, controller helpers, playbooks
240
+ errors.py # FlowError / FlowErrorCode definitions
149
241
  node.py
150
242
  types.py
151
243
  registry.py
@@ -234,18 +326,46 @@ stitched directly into a flow adjacency list:
234
326
 
235
327
  - `map_concurrent(items, worker, max_concurrency=8)` — fan a single message out into
236
328
  many in-memory tasks (e.g., batch document enrichment) while respecting a semaphore.
237
- - `predicate_router(name, mapping)` — route messages to successor nodes based on simple
238
- boolean functions over payload or headers. Perfect for guardrails or conditional
239
- tool invocation without building a full controller.
329
+ - `predicate_router(name, predicate, policy=None)` — route messages to successor nodes
330
+ based on simple boolean functions over payload or headers, optionally consulting a
331
+ runtime `policy` to override or filter the computed targets. Perfect for guardrails or
332
+ conditional tool invocation without rebuilding the flow.
240
333
  - `union_router(name, discriminated_model)` — accept a Pydantic discriminated union and
241
334
  forward each variant to the matching typed successor node. Keeps type-safety even when
242
335
  multiple schema branches exist.
243
336
  - `join_k(name, k)` — aggregate `k` messages per `trace_id` before resuming downstream
244
337
  work. Useful for fan-out/fan-in batching, map-reduce style summarization, or consensus.
338
+ - `DictRoutingPolicy(mapping, key_getter=None)` — load routing overrides from
339
+ configuration and pair it with the router helpers via `policy=...` to switch routing at
340
+ runtime without modifying the flow graph.
245
341
 
246
342
  All helpers are regular `Node` instances under the hood, so they inherit retries,
247
343
  timeouts, and validation just like hand-written nodes.
248
344
 
345
+ ### Streaming Responses
346
+
347
+ PenguiFlow now supports **LLM-style streaming** with the `StreamChunk` model. Each
348
+ chunk carries `stream_id`, `seq`, `text`, optional `meta`, and a `done` flag. Use
349
+ `Context.emit_chunk(parent=message, text=..., done=...)` inside a node (or the
350
+ convenience wrapper `await flow.emit_chunk(...)` from outside a node) to push
351
+ chunks downstream without manually crafting `Message` envelopes:
352
+
353
+ ```python
354
+ await ctx.emit_chunk(parent=msg, text=token, done=done)
355
+ ```
356
+
357
+ - Sequence numbers auto-increment per `stream_id` (defaults to the parent trace).
358
+ - Backpressure is preserved; if the downstream queue is full the helper awaits just
359
+ like `Context.emit`.
360
+ - When `done=True`, the sequence counter resets so a new stream can reuse the same id.
361
+
362
+ Pair the producer with a sink node that consumes `StreamChunk` payloads and assembles
363
+ the final result when `done` is observed. See `examples/streaming_llm/` for a complete
364
+ mock LLM → SSE pipeline. For presentation layers, utilities like
365
+ `format_sse_event(chunk)` and `chunk_to_ws_json(chunk)` (both exported from the
366
+ package) will convert a `StreamChunk` into SSE-compatible text or WebSocket JSON payloads
367
+ without boilerplate.
368
+
249
369
  ### Dynamic Controller Loops
250
370
 
251
371
  Long-running agents often need to **think, plan, and act over multiple hops**. PenguiFlow
@@ -267,20 +387,21 @@ easy to surface guardrails to downstream consumers.
267
387
  ### Playbooks & Subflows
268
388
 
269
389
  Sometimes a controller or router needs to execute a **mini flow** — for example,
270
- retrieval → rerank → compress — without polluting the global topology. `call_playbook`
271
- spawns a brand-new `PenguiFlow` on demand and wires it into the parent message context:
390
+ retrieval → rerank → compress — without polluting the global topology.
391
+ `Context.call_playbook` spawns a brand-new `PenguiFlow` on demand and wires it into
392
+ the parent message context:
272
393
 
273
394
  - Trace IDs and headers are reused so observability stays intact.
274
- - The helper respects optional timeouts and always stops the subflow (even on cancel).
395
+ - The helper respects optional timeouts, mirrors cancellation to the subflow, and always
396
+ stops it (even on cancel).
275
397
  - The first payload emitted to the playbook's Rookery is returned to the caller,
276
398
  allowing you to treat subflows as normal async functions.
277
399
 
278
400
  ```python
279
- from penguiflow import call_playbook
280
401
  from penguiflow.types import Message
281
402
 
282
403
  async def controller(msg: Message, ctx) -> Message:
283
- playbook_result = await call_playbook(build_retrieval_playbook, msg)
404
+ playbook_result = await ctx.call_playbook(build_retrieval_playbook, msg)
284
405
  return msg.model_copy(update={"payload": playbook_result})
285
406
  ```
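The "optional timeout, always stop" behaviour listed in the bullets can be illustrated with plain `asyncio`. The sketch below assumes nothing about PenguiFlow internals: `MiniFlow` and `call_subflow` are hypothetical names that only mimic the shape of a subflow treated as an async function.

```python
import asyncio


class MiniFlow:
    """Hypothetical stand-in for a subflow; not a PenguiFlow class."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.stopped = False

    async def first_result(self) -> str:
        # Simulates waiting for the first payload to reach the egress.
        await asyncio.sleep(self.delay)
        return "compressed-docs"

    async def stop(self) -> None:
        self.stopped = True


async def call_subflow(flow: MiniFlow, timeout=None) -> str:
    """Await the first result, but stop the flow on every exit path."""
    try:
        return await asyncio.wait_for(flow.first_result(), timeout)
    finally:
        # Runs on success, timeout, and cancellation alike.
        await flow.stop()
```

The `try`/`finally` is the design point: the caller sees either a result or an exception, and the subflow is stopped in both cases.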
286
407
 
@@ -289,12 +410,36 @@ flow focused on high-level orchestration logic.
289
410
 
290
411
  ---
291
412
 
413
+ ### Visualization
414
+
415
+ Need a quick view of the flow topology? Call `flow_to_mermaid(flow)` to render the graph
416
+ as a Mermaid diagram ready for Markdown or docs tools, or `flow_to_dot(flow)` for a
417
+ Graphviz-friendly definition. Both outputs annotate controller loops and the synthetic
418
+ OpenSea/Rookery boundaries so you can spot ingress/egress paths at a glance:
419
+
420
+ ```python
421
+ from penguiflow import flow_to_dot, flow_to_mermaid
422
+
423
+ print(flow_to_mermaid(flow, direction="LR"))
424
+ print(flow_to_dot(flow, rankdir="LR"))
425
+ ```
426
+
427
+ See `examples/visualizer/` for a runnable script that exports Markdown and DOT files for
428
+ docs or diagramming pipelines.
429
+
430
+ ---
431
+
292
432
  ## 🛡️ Reliability & Observability
293
433
 
294
434
  * **NodePolicy**: set validation scope plus per-node timeout, retries, and backoff curves.
295
- * **Structured logs**: enrich every node event with `{ts, trace_id, node_name, event, latency_ms, q_depth_in, attempt}`.
296
- * **Middleware hooks**: subscribe observers (e.g., MLflow) to the structured event stream.
297
- * See `examples/reliability_middleware/` for a concrete timeout + retry walkthrough.
435
+ * **Per-trace metrics**: cancellation events include `trace_pending`, `trace_inflight`,
436
+ `q_depth_in`, `q_depth_out`, and node fan-out counts for richer observability.
437
+ * **Structured `FlowEvent`s**: every node event carries `{ts, trace_id, node_name, event,
438
+ latency_ms, q_depth_in, q_depth_out, attempt}` plus a mutable `extra` map for custom
439
+ annotations.
440
+ * **Middleware hooks**: subscribe observers (e.g., MLflow) to the structured `FlowEvent`
441
+ stream. See `examples/mlflow_metrics/` for an MLflow integration and
442
+ `examples/reliability_middleware/` for a concrete timeout + retry walkthrough.
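As a rough sketch of what an observer over the structured event stream might look like, the snippet below aggregates per-node latency. It is self-contained and does not import the library: `Event` is a hypothetical stand-in carrying a subset of the fields listed above, the event names `node_start` and `node_success` are assumptions, and how the observer gets registered with a flow is left to the linked examples.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """Hypothetical stand-in for FlowEvent with a subset of its fields."""

    ts: float
    trace_id: str
    node_name: str
    event: str
    latency_ms: Optional[float] = None
    extra: dict = field(default_factory=dict)


class LatencyRecorder:
    """Async callable observer that collects latency samples per node."""

    def __init__(self) -> None:
        self.by_node: dict = {}

    async def __call__(self, event: Event) -> None:
        if event.latency_ms is None:
            return  # e.g. start events carry no latency yet
        self.by_node.setdefault(event.node_name, []).append(event.latency_ms)

    def mean_ms(self, node_name: str) -> float:
        samples = self.by_node[node_name]
        return sum(samples) / len(samples)
```

An observer like this could forward the same samples to MLflow or another metrics backend instead of keeping them in memory.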
298
443
 
299
444
  ---
300
445
 
@@ -302,15 +447,25 @@ flow focused on high-level orchestration logic.
302
447
 
303
448
  - **In-process runtime**: there is no built-in distribution layer yet. Long-running CPU work should be delegated to your own pools or services.
304
449
  - **Registry-driven typing**: nodes default to validation. Provide a `ModelRegistry` when calling `flow.run(...)` or set `validate="none"` explicitly for untyped hops.
305
- - **Observability**: structured logs + middleware hooks are available, but integrations with third-party stacks (OTel, Prometheus) are DIY for now.
306
- - **Roadmap**: v2 targets streaming, distributed backends, richer observability, and test harnesses. Contributions and proposals are welcome!
450
+ - **Observability**: structured `FlowEvent` callbacks power logs/metrics; integrations with
451
+ third-party stacks (OTel, Prometheus, Datadog) remain DIY. See the MLflow middleware
452
+ example for a lightweight pattern.
453
+ - **Roadmap**: follow-up releases focus on optional distributed backends, deeper observability integrations, and additional playbook patterns. Contributions and proposals are welcome!
454
+
455
+ ---
456
+
457
+ ## 📊 Benchmarks
458
+
459
+ Lightweight benchmarks live under `benchmarks/`. Run them via `uv run python benchmarks/<name>.py`
460
+ to capture baselines for fan-out throughput, retry/timeout overhead, and controller
461
+ playbook latency. Copy them into product repos to watch for regressions over time.
307
462
 
308
463
  ---
309
464
 
310
465
  ## 🔮 Roadmap
311
466
 
312
- * **v1 (current)**: safe core runtime, type-safety, retries, timeouts, routing, controller loops, playbooks via examples.
313
- * **v2 (future)**: streaming support, per-trace cancel, deadlines/budgets, observability hooks, visualizer, testing harness.
467
+ * **v2 (current)**: streaming, per-trace cancellation, deadlines/budgets, metadata propagation, observability hooks, visualizer, routing policies, traceable errors, and FlowTestKit.
468
+ * **Future**: optional distributed runners, richer third-party observability adapters, and opinionated playbook templates.
314
469
 
315
470
  ---
316
471
 
@@ -344,7 +499,12 @@ pytest -q
344
499
  * `examples/map_concurrent/`: bounded fan-out work inside a node.
345
500
  * `examples/controller_multihop/`: dynamic multi-hop agent loop.
346
501
  * `examples/reliability_middleware/`: retries, timeouts, and middleware hooks.
502
+ * `examples/mlflow_metrics/`: structured `FlowEvent` export to MLflow (stdout fallback).
347
503
  * `examples/playbook_retrieval/`: retrieval → rerank → compress playbook.
504
+ * `examples/trace_cancel/`: per-trace cancellation propagating into a playbook.
505
+ * `examples/streaming_llm/`: mock LLM emitting streaming chunks to an SSE sink.
506
+ * `examples/metadata_propagation/`: attaching and consuming `Message.meta` context.
507
+ * `examples/visualizer/`: exports Mermaid + DOT diagrams with loop/subflow annotations.
348
508
 
349
509
  ---
350
510