PyPI - react-agent-harness - Versions diffs - 0.5.2__tar.gz → 0.6.0__tar.gz - Mend

react-agent-harness 0.5.2tar.gz → 0.6.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (83)

{react_agent_harness-0.5.2/react_agent_harness.egg-info → react_agent_harness-0.6.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: react-agent-harness
-Version: 0.5.2
+Version: 0.6.0
 Summary: Multi-agent LLM orchestration: hybrid DAG planning, two-tier memory, streaming
 Requires-Python: >=3.10
 License-File: LICENSE

{react_agent_harness-0.5.2 → react_agent_harness-0.6.0}/README.md RENAMED

@@ -36,6 +36,8 @@ events to stdout and prints elapsed time + cost at the end.
 harness/runtime.py          AgentRuntime — single entry point, wire once run anything
 harness/events.py           BusEvent + EventType — canonical event vocabulary
 harness/llm/openai.py       OpenAILLM — OpenAI adapter with usage + cost tracking
+harness/llm/fallback.py     FallbackLLM — transparent retry on transient upstream errors
+harness/llm/routing.py      RoutingLLM — dispatch calls to different adapters by a selector
 harness/annotation.py       Annotation store + AnnotationHook — RLHF trajectory capture
 harness/hitl.py             HITL approval gate — interactive CLI, session-allow list
 harness/tool_policy.py      Persistent tool policy — user-scoped allow rules, CLI management
@@ -188,6 +190,99 @@ llm = ClaudeCodeLLM(
 )
 ```
+### Cost shaping + reliability
+Three patterns, ordered by how production teams actually solve this:
+**1. Per-call-site LLM injection (the recommended pattern)**
+`AgentRuntime` exposes one slot per orchestrator call site. Each defaults to
+`llm` when unset, so existing code keeps working. The classifier and router
+both see only the goal + agent descriptions (~300 tokens) and emit a
+one-token decision — natural candidates for a cheaper model. The planner
+and synthesiser produce structured DAGs and final answers and usually want
+to stay on the main model.
+```python
+runtime = AgentRuntime(
+    agent_registry=agents,
+    tool_registry=tools,
+    memory=memory,
+    llm=premium,                 # default — agent ReAct loops use this
+    classifier_llm=cheap,        # simple vs complex dispatch decision
+    router_llm=cheap,            # single-agent picker
+    # planner_llm=...            # defaults to llm; override only if you want
+    # synthesizer_llm=...        # defaults to llm
+)
+```
+No guessing, no keyword matching, no fragility — you read the runtime
+construction and you know exactly which model serves which purpose. The
+budget guard is wired into every distinct LLM instance automatically
+(deduped by object identity, so injecting the same wrapper into multiple
+slots costs no extra calls).
+**2. `FallbackLLM` for resilience**
+Try each adapter in order; transparently switch to the next on rate
+limits, timeouts, or 5xx errors:
+```python
+from harness.llm.fallback import FallbackLLM
+llm = FallbackLLM([
+    AnthropicLLM(model="claude-sonnet-4-6"),   # primary
+    OpenAILLM(model="gpt-4o-mini"),            # backup
+])
+runtime = AgentRuntime(..., llm=llm)
+print(llm.last_route)   # 0 if primary worked, 1 if backup did
+```
+Permanent errors (auth, bad request) propagate immediately — only transient
+upstream errors trigger fallback. Customise with `transient_errors=...`.
+Streaming retries only fire before the first token; mid-stream failures
+propagate to preserve response integrity.
+**3. `RoutingLLM` for bring-your-own-selector cases**
+When you need runtime routing — capability gating (`vision` vs
+`long_context`), learned classifiers (RouteLLM-style), cascade
+routing (cheap-then-escalate-on-low-confidence) — wrap a routes dict
+with your own selector callable:
+```python
+from harness.llm.routing import RoutingLLM
+def by_capability(system, messages):
+    if _needs_vision(messages):
+        return "vision"
+    if _estimated_tokens(system, messages) > 100_000:
+        return "long_context"
+    return "default"
+llm = RoutingLLM(
+    routes={
+        "default":      OpenAILLM(model="gpt-4o-mini"),
+        "vision":       OpenAILLM(model="gpt-4o"),
+        "long_context": AnthropicLLM(model="claude-sonnet-4-6"),
+    },
+    selector=by_capability,
+    default_route="default",
+)
+```
+The harness intentionally does not ship default selectors. Naive selectors
+(keyword matching, fixed token thresholds) misroute in subtle ways and
+encourage the wrong mental model — if you're reaching for one, you almost
+certainly want per-call-site injection instead.
+Compose freely: `FallbackLLM([premium, backup])` injected into the
+`llm=` slot gives the agent loops resilience, with `classifier_llm=cheap`
+and `router_llm=cheap` shaping the cheap-call cost — all without a custom
+selector.
+---
 `ClaudeCodeLLM` reads a `claude-code` OAuth entry, refreshes it automatically
 when expired, and retries once after `401`/`403`. This mirrors Pi's Claude
 Pro/Max extension approach rather than shelling out to the Claude CLI. The
@@ -722,6 +817,49 @@ The OTEL hook is a side-channel on the existing `Tracer` — the in-memory trace
 is always available via `result["trace"]` regardless of whether OTEL is enabled.
 Zero overhead and zero imports when `enable_otel=False`.
+## Trace recorder + replay + local viewer
+For local debug and post-mortem inspection without an OTEL backend, the
+harness ships a JSONL trace recorder and a stdlib-only HTML viewer. Wrap
+any streaming call:
+```python
+from harness.trace import record_trace, replay
+async for event in record_trace(runtime.dispatch_stream(goal), "run.jsonl"):
+    ...  # your normal handling
+```
+Each `BusEvent` is flushed per-line, so a partial trace survives a crash.
+View the trace in your browser:
+```bash
+agent-harness trace view run.jsonl     # opens http://127.0.0.1:8765/
+```
+The viewer is a single embedded HTML page — vertical timeline, filter by
+agent / event type / text, expandable per-event JSON. No build step, no
+external services.
+Replay a trace through `ConsoleRenderer` (great for grepping or piping
+into another script):
+```bash
+agent-harness trace replay run.jsonl
+agent-harness trace replay run.jsonl --realtime --speed 2.0
+```
+Programmatic replay yields reconstructed `BusEvent` objects:
+```python
+async for event in replay("run.jsonl", realtime=False):
+    ...  # reuse the same loops you write for live streams
+```
+This is complementary to OTEL — OTEL is for production observability and
+long-term storage in Jaeger/Datadog; the JSONL recorder is for local
+debugging, sharing reproductions, and replaying past runs.
 ## Vision / multimodal agents
 `WorkingMemory` accepts `str | list` content so image blocks pass through to

{react_agent_harness-0.5.2 → react_agent_harness-0.6.0}/harness/cli.py RENAMED

@@ -46,6 +46,19 @@ def main() -> int:
     policy_clear = policy_sub.add_parser("clear", help="remove all policy rules")
     policy_clear.add_argument("--policy-file", default=str(default_policy_file()))
+    trace = sub.add_parser("trace", help="view or replay a recorded run trace")
+    trace_sub = trace.add_subparsers(dest="trace_command", required=True)
+    trace_view = trace_sub.add_parser("view", help="open a local web viewer for a trace")
+    trace_view.add_argument("path", help="path to a JSONL trace produced by record_trace")
+    trace_view.add_argument("--port", type=int, default=8765)
+    trace_view.add_argument("--no-open", action="store_true", help="don't auto-open the browser")
+    trace_replay = trace_sub.add_parser("replay", help="dump a trace to stdout via ConsoleRenderer")
+    trace_replay.add_argument("path", help="path to a JSONL trace produced by record_trace")
+    trace_replay.add_argument(
+        "--realtime", action="store_true", help="preserve recorded inter-event timing"
+    )
+    trace_replay.add_argument("--speed", type=float, default=1.0, help="realtime speed multiplier")
     args = parser.parse_args()
     try:
         if args.command == "login":
@@ -71,6 +84,16 @@ def main() -> int:
                 return _policy_revoke(path, args.rule_id)
             if args.policy_command == "clear":
                 return _policy_clear(path)
+        if args.command == "trace":
+            if args.trace_command == "view":
+                from harness.trace_viewer import serve
+                serve(args.path, port=args.port, open_browser=not args.no_open)
+                return 0
+            if args.trace_command == "replay":
+                return asyncio.run(
+                    _trace_replay(args.path, realtime=args.realtime, speed=args.speed)
+                )
     except Exception as e:
         print(f"agent-harness: {e}", file=sys.stderr)
         return 1
@@ -180,5 +203,16 @@ def _policy_clear(path: Path) -> int:
     return 0
+async def _trace_replay(path: str, *, realtime: bool, speed: float) -> int:
+    """Read a JSONL trace and render it via ConsoleRenderer."""
+    from harness.console import ConsoleRenderer
+    from harness.trace import replay
+    renderer = ConsoleRenderer()
+    async for event in replay(path, realtime=realtime, speed=speed):
+        renderer.render(event)
+    return 0
 if __name__ == "__main__":
     raise SystemExit(main())
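The `trace replay` subcommand above reads a JSONL trace written by `record_trace`, whose implementation (`harness/trace.py`) is not part of this excerpt. As a rough sketch of the record-then-replay idea described in the README, assuming events are plain dicts with a hypothetical `ts` timestamp field (the real `BusEvent` schema may differ), the pass-through recorder and the replay generator could look like this. `record_events`, `replay_events`, and `_fake_stream` are illustrative names, not the package's API.

```python
import asyncio
import json
import os
import tempfile
from collections.abc import AsyncIterator


async def record_events(events: AsyncIterator[dict], path: str) -> AsyncIterator[dict]:
    """Pass events through unchanged while appending each one to a JSONL file."""
    with open(path, "w", encoding="utf-8") as fh:
        async for event in events:
            fh.write(json.dumps(event) + "\n")
            fh.flush()  # one flushed line per event, so a crash keeps a partial trace
            yield event


async def replay_events(
    path: str, *, realtime: bool = False, speed: float = 1.0
) -> AsyncIterator[dict]:
    """Yield recorded events, optionally sleeping to mimic the recorded gaps."""
    previous_ts = None
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            event = json.loads(line)
            ts = event.get("ts")  # hypothetical timestamp field
            if realtime and previous_ts is not None and ts is not None:
                await asyncio.sleep(max(0.0, (ts - previous_ts) / speed))
            if ts is not None:
                previous_ts = ts
            yield event


async def _fake_stream() -> AsyncIterator[dict]:
    # Stand-in for a live event stream; the event names are made up.
    for i, kind in enumerate(["run_started", "tool_called", "run_finished"]):
        yield {"type": kind, "ts": float(i)}


async def demo() -> tuple[list[dict], list[dict]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "run.jsonl")
        live = [e async for e in record_events(_fake_stream(), path)]
        replayed = [e async for e in replay_events(path)]
    return live, replayed


live, replayed = asyncio.run(demo())
```

Because the recorder yields each event after writing it, the consuming loop is the same for a live stream and for a replayed file, which is the property the README relies on.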

react_agent_harness-0.6.0/harness/llm/fallback.py ADDED

@@ -0,0 +1,169 @@
+"""``FallbackLLM`` — try multiple LLM clients in order on transient failures.
+Wraps any number of LLM adapters that share the standard harness contract
+(``complete``, optionally ``stream_complete``, ``set_budget``, ``last_usage``).
+On a transient error (rate limit, timeout, 5xx) the next adapter in the list
+is tried. The first non-transient error — or exhausting the list — re-raises.
+Example::
+    from harness.llm.openai import OpenAILLM
+    from harness.llm.anthropic import AnthropicLLM
+    from harness.llm.fallback import FallbackLLM
+    primary = AnthropicLLM(model="claude-sonnet-4-6")
+    backup = OpenAILLM(model="gpt-4o-mini")
+    llm = FallbackLLM([primary, backup])
+    runtime = AgentRuntime(..., llm=llm)
+Set ``transient_errors`` to a callable that returns True when the exception
+should trigger the next fallback. The default heuristic catches rate-limit
+and 5xx-class errors from the OpenAI and Anthropic SDKs and any
+``asyncio.TimeoutError`` / ``ConnectionError`` / ``OSError`` raised by the
+transport.
+``last_route`` exposes the index of the adapter that actually answered the
+most recent call, so callers can see which one was hit::
+    await llm.complete(system, messages)
+    print(llm.last_route)   # 0 if primary worked, 1 if backup did, ...
+"""
+from __future__ import annotations
+import asyncio
+import logging
+from collections.abc import AsyncGenerator, Callable
+from typing import Any
+logger = logging.getLogger(__name__)
+def _default_is_transient(exc: BaseException) -> bool:
+    """Best-effort classifier for retryable upstream errors.
+    Detects without importing the SDKs (so the fallback adapter has no
+    optional-dep coupling):
+      - ``status_code`` attr in {408, 425, 429, 500, 502, 503, 504}
+      - class name suffixed with ``RateLimitError`` / ``ServiceUnavailableError``
+        / ``APITimeoutError`` / ``InternalServerError`` / ``OverloadedError``
+        / ``TimeoutError``
+      - ``asyncio.TimeoutError``, ``ConnectionError``, ``OSError``
+    """
+    if isinstance(exc, asyncio.TimeoutError | ConnectionError | OSError):
+        return True
+    status = getattr(exc, "status_code", None)
+    if isinstance(status, int) and status in {408, 425, 429, 500, 502, 503, 504}:
+        return True
+    name = type(exc).__name__
+    transient_suffixes = (
+        "RateLimitError",
+        "ServiceUnavailableError",
+        "APITimeoutError",
+        "InternalServerError",
+        "OverloadedError",
+        "TimeoutError",
+    )
+    return any(name.endswith(s) for s in transient_suffixes)
+class FallbackLLM:
+    def __init__(
+        self,
+        llms: list[Any],
+        *,
+        transient_errors: Callable[[BaseException], bool] | None = None,
+    ) -> None:
+        if not llms:
+            raise ValueError("FallbackLLM requires at least one inner LLM")
+        self._llms = list(llms)
+        self._is_transient = transient_errors or _default_is_transient
+        self.last_route: int = -1
+        self.last_usage: dict | None = None
+    def set_budget(self, guard: Any) -> None:
+        """Forward the budget guard to every inner LLM."""
+        for llm in self._llms:
+            if hasattr(llm, "set_budget"):
+                llm.set_budget(guard)
+    # ── Non-streaming ────────────────────────────────────────────────────────
+    async def complete(
+        self,
+        system: str | None,
+        messages: list[dict],
+        **kwargs: Any,
+    ) -> dict:
+        last_exc: BaseException | None = None
+        for i, llm in enumerate(self._llms):
+            try:
+                result = await llm.complete(system, messages, **kwargs)
+            except BaseException as exc:
+                if i == len(self._llms) - 1 or not self._is_transient(exc):
+                    raise
+                logger.warning(
+                    "FallbackLLM: adapter %d (%s) raised transient %s — trying next",
+                    i,
+                    type(llm).__name__,
+                    type(exc).__name__,
+                )
+                last_exc = exc
+                continue
+            self.last_route = i
+            self.last_usage = getattr(llm, "last_usage", None)
+            return result
+        # Unreachable in practice — the loop always returns or re-raises.
+        assert last_exc is not None
+        raise last_exc
+    # ── Streaming ────────────────────────────────────────────────────────────
+    async def stream_complete(
+        self,
+        system: str | None,
+        messages: list[dict],
+    ) -> AsyncGenerator[str, None]:
+        """Stream from the first adapter that doesn't fail before yielding.
+        We can only retry until the first token has been emitted — once the
+        caller has seen partial output, a switch mid-stream would corrupt the
+        response. The transient check therefore runs against errors raised
+        before the generator yields anything.
+        """
+        last_exc: BaseException | None = None
+        for i, llm in enumerate(self._llms):
+            if not hasattr(llm, "stream_complete"):
+                continue
+            try:
+                gen = llm.stream_complete(system, messages)
+                first = await _peek_first(gen)
+            except BaseException as exc:
+                if i == len(self._llms) - 1 or not self._is_transient(exc):
+                    raise
+                logger.warning(
+                    "FallbackLLM(stream): adapter %d (%s) raised transient %s "
+                    "before first token — trying next",
+                    i,
+                    type(llm).__name__,
+                    type(exc).__name__,
+                )
+                last_exc = exc
+                continue
+            self.last_route = i
+            if first is not None:
+                yield first
+            async for chunk in gen:
+                yield chunk
+            self.last_usage = getattr(llm, "last_usage", None)
+            return
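+        # Reached only when the trailing adapters lack ``stream_complete``.
+        if last_exc is None:
+            raise RuntimeError("FallbackLLM: no inner LLM implements stream_complete")
+        raise last_exc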
+async def _peek_first(gen: AsyncGenerator[str, None]) -> str | None:
+    """Pull the first item from an async generator, or None if exhausted."""
+    try:
+        return await gen.__anext__()
+    except StopAsyncIteration:
+        return None
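To make the transient-versus-permanent distinction concrete, here is a minimal, self-contained sketch of the same try-in-order loop with a custom predicate. It is a simplified stand-in rather than the `FallbackLLM` class itself: `StubLLM`, `RateLimitError`, `AuthError`, `is_transient`, and `complete_with_fallback` are hypothetical names introduced only for illustration, and the status-code set is an assumption.

```python
import asyncio


class RateLimitError(Exception):
    """Stub transient error carrying an HTTP-style status code."""
    status_code = 429


class AuthError(Exception):
    """Stub permanent error."""
    status_code = 401


class StubLLM:
    """Hypothetical adapter that either returns text or raises a preset error."""

    def __init__(self, name, error=None):
        self.name = name
        self.error = error
        self.calls = 0

    async def complete(self, system, messages):
        self.calls += 1
        if self.error is not None:
            raise self.error
        return {"text": f"answer from {self.name}"}


def is_transient(exc):
    # Custom predicate: retry only on an explicit set of status codes.
    return getattr(exc, "status_code", None) in {429, 500, 502, 503, 504}


async def complete_with_fallback(llms, system, messages, transient=is_transient):
    """Try adapters in order; return (index, result) from the first that answers."""
    for i, llm in enumerate(llms):
        try:
            return i, await llm.complete(system, messages)
        except Exception as exc:
            is_last = i == len(llms) - 1
            if is_last or not transient(exc):
                raise  # permanent error, or nothing left to try
    raise ValueError("at least one adapter is required")


async def demo():
    msgs = [{"role": "user", "content": "hi"}]

    # Transient failure on the primary: the backup answers.
    flaky, backup = StubLLM("primary", RateLimitError()), StubLLM("backup")
    route, result = await complete_with_fallback([flaky, backup], None, msgs)

    # Permanent failure on the primary: the backup is never called.
    denied, unused = StubLLM("primary", AuthError()), StubLLM("backup")
    try:
        await complete_with_fallback([denied, unused], None, msgs)
        permanent_propagated = False
    except AuthError:
        permanent_propagated = True
    return route, result, permanent_propagated, unused.calls


route, result, permanent_propagated, backup_calls = asyncio.run(demo())
```

A predicate with the same shape as `is_transient` is what the `transient_errors=` argument of the real class accepts, per the docstring above.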

react_agent_harness-0.6.0/harness/llm/routing.py ADDED

@@ -0,0 +1,139 @@
+"""``RoutingLLM`` — dispatch each LLM call to a different adapter by a selector.
+**For agent-harness's own call sites, prefer per-call-site injection** —
+``AgentRuntime`` exposes ``classifier_llm`` / ``router_llm`` and
+``Orchestrator`` exposes ``planner_llm`` / ``synthesizer_llm``. That's the
+production-style pattern: each call site is hard-wired to a model chosen
+for that workload's cost / quality / latency budget, no runtime guessing.
+``RoutingLLM`` is the **bring-your-own-selector primitive** for cases
+where per-call-site injection isn't enough:
+  - You're wrapping an existing harness instance you can't restructure.
+  - You're routing based on **capability** (does this query need
+    vision / function calling / >200K context?) — that's a real
+    production pattern, but the metadata is provider-specific so the
+    selector has to be yours.
+  - You're doing **learned routing** (RouteLLM-style classifier) where
+    the selector is a small ML model.
+  - You're doing **cascade routing** (cheap-then-escalate-on-low-confidence)
+    via a custom selector that inspects prior responses.
+Wire it up with your own selector callable that returns a key from the
+``routes`` dict::
+    from harness.llm.routing import RoutingLLM
+    def my_capability_selector(system, messages):
+        # Inspect the call's requirements and pick the cheapest viable model.
+        if _needs_vision(messages):
+            return "vision"
+        if _estimated_tokens(system, messages) > 100_000:
+            return "long_context"
+        return "default"
+    llm = RoutingLLM(
+        routes={
+            "default":      OpenAILLM(model="gpt-4o-mini"),
+            "vision":       OpenAILLM(model="gpt-4o"),
+            "long_context": AnthropicLLM(model="claude-sonnet-4-6"),
+        },
+        selector=my_capability_selector,
+        default_route="default",
+    )
+The harness does **not** ship default selectors. Naive selectors
+(keyword matching, fixed token thresholds) misroute in subtle ways and
+encourage the wrong mental model. If you find yourself reaching for one,
+the per-call-site injection path on ``AgentRuntime`` / ``Orchestrator``
+is almost certainly what you actually want.
+``last_route`` exposes the key of the route that handled the most recent
+call — handy for logging and tests.
+"""
+from __future__ import annotations
+import logging
+from collections.abc import AsyncGenerator, Callable, Mapping
+from typing import Any
+logger = logging.getLogger(__name__)
+Selector = Callable[[str | None, list[dict]], str]
+class RoutingLLM:
+    def __init__(
+        self,
+        routes: Mapping[str, Any],
+        *,
+        selector: Selector,
+        default_route: str,
+    ) -> None:
+        if not routes:
+            raise ValueError("RoutingLLM requires at least one route")
+        if default_route not in routes:
+            raise ValueError(f"default_route {default_route!r} is not in routes")
+        self._routes = dict(routes)
+        self._selector = selector
+        self._default_route = default_route
+        self.last_route: str = default_route
+        self.last_usage: dict | None = None
+    def set_budget(self, guard: Any) -> None:
+        """Forward the budget guard to every routed LLM."""
+        for llm in self._routes.values():
+            if hasattr(llm, "set_budget"):
+                llm.set_budget(guard)
+    def _pick(self, system: str | None, messages: list[dict]) -> tuple[str, Any]:
+        try:
+            key = self._selector(system, messages)
+        except Exception as e:  # noqa: BLE001 — fall back gracefully
+            logger.warning("RoutingLLM selector raised %s — using default route", e)
+            key = self._default_route
+        if key not in self._routes:
+            logger.warning(
+                "RoutingLLM selector returned unknown key %r — using default route %r",
+                key,
+                self._default_route,
+            )
+            key = self._default_route
+        return key, self._routes[key]
+    # ── Non-streaming ────────────────────────────────────────────────────────
+    async def complete(
+        self,
+        system: str | None,
+        messages: list[dict],
+        **kwargs: Any,
+    ) -> dict:
+        key, llm = self._pick(system, messages)
+        self.last_route = key
+        result = await llm.complete(system, messages, **kwargs)
+        self.last_usage = getattr(llm, "last_usage", None)
+        return result
+    # ── Streaming ────────────────────────────────────────────────────────────
+    async def stream_complete(
+        self,
+        system: str | None,
+        messages: list[dict],
+    ) -> AsyncGenerator[str, None]:
+        key, llm = self._pick(system, messages)
+        self.last_route = key
+        if not hasattr(llm, "stream_complete"):
+            # Fall back to non-streaming for routes that don't implement it.
+            result = await llm.complete(system, messages)
+            text = result.get("text", "") if isinstance(result, dict) else str(result)
+            if text:
+                yield text
+            self.last_usage = getattr(llm, "last_usage", None)
+            return
+        async for chunk in llm.stream_complete(system, messages):
+            yield chunk
+        self.last_usage = getattr(llm, "last_usage", None)
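The docstring example leaves `_needs_vision` and `_estimated_tokens` undefined. The sketch below fills them in with deliberately crude, hypothetical implementations (an `"image"` block type and a four-characters-per-token estimate are assumptions, not anything the package defines) and mirrors the fall-back-to-default behaviour of `_pick`. `EchoLLM` and `pick` are illustrative names only.

```python
import asyncio


class EchoLLM:
    """Hypothetical adapter that reports which model answered."""

    def __init__(self, model):
        self.model = model

    async def complete(self, system, messages):
        return {"text": f"{self.model} handled {len(messages)} message(s)"}


def estimated_tokens(system, messages):
    # Crude illustration only: roughly four characters per token.
    chars = len(system or "") + sum(len(str(m.get("content", ""))) for m in messages)
    return chars // 4


def needs_vision(messages):
    # Treat list-valued content containing an image block as a vision request.
    return any(
        isinstance(m.get("content"), list)
        and any(isinstance(b, dict) and b.get("type") == "image" for b in m["content"])
        for m in messages
    )


def by_capability(system, messages):
    if needs_vision(messages):
        return "vision"
    if estimated_tokens(system, messages) > 100_000:
        return "long_context"
    return "default"


def pick(routes, selector, default_route, system, messages):
    """Resolve a route key, falling back to the default on errors or unknown keys."""
    try:
        key = selector(system, messages)
    except Exception:
        key = default_route
    if key not in routes:
        key = default_route
    return key, routes[key]


routes = {
    "default": EchoLLM("small-model"),
    "vision": EchoLLM("vision-model"),
    "long_context": EchoLLM("long-context-model"),
}

text_call = [{"role": "user", "content": "summarise this"}]
image_call = [{"role": "user", "content": [{"type": "image", "source": "..."}]}]
long_call = [{"role": "user", "content": "x" * 500_000}]

text_key, _ = pick(routes, by_capability, "default", None, text_call)
image_key, _ = pick(routes, by_capability, "default", None, image_call)
long_key, _ = pick(routes, by_capability, "default", None, long_call)
# A selector that returns an unknown key degrades to the default route.
bad_key, llm = pick(routes, lambda s, m: "missing", "default", None, text_call)
reply = asyncio.run(llm.complete(None, text_call))
```

The fixed token threshold here is exactly the kind of heuristic the docstring warns about; it is kept only because the original example uses it.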

{react_agent_harness-0.5.2 → react_agent_harness-0.6.0}/harness/runtime.py RENAMED

@@ -286,11 +286,36 @@ class AgentRuntime:
         annotation_store: Any | None = None,  # InMemoryAnnotationStore or compatible
         checkpoint_store: Any | None = None,  # FileCheckpointStore / RedisCheckpointStore
         steering_source_factory: Any | None = None,  # passed to each spawned BaseAgent
+        # ── Optional per-call-site LLM overrides ──────────────────────────────
+        # Each defaults to ``llm`` when unset. The dispatch classifier and the
+        # single-agent router both see only the goal + agent descriptions
+        # (~300 tokens) and emit a one-token decision — they're the natural
+        # candidates for a cheaper model. The planner and synthesiser produce
+        # structured DAGs and final answers and should usually stay on the
+        # main model. See README "Cost shaping + reliability" for the pattern.
+        classifier_llm: Any | None = None,
+        router_llm: Any | None = None,
+        planner_llm: Any | None = None,
+        synthesizer_llm: Any | None = None,
     ) -> None:
         self._agent_registry = agent_registry
         self._tool_registry = tool_registry
         self._memory = memory
         self._llm = llm
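+        # Compare against None explicitly so a falsy-but-valid adapter is kept.
+        self._classifier_llm = classifier_llm if classifier_llm is not None else llm
+        self._router_llm = router_llm if router_llm is not None else llm
+        self._planner_llm = planner_llm if planner_llm is not None else llm
+        self._synthesizer_llm = synthesizer_llm if synthesizer_llm is not None else llm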
+        # ``set_budget`` should reach every distinct LLM instance — if the
+        # user injected the same wrapper into multiple slots, dedupe by
+        # object identity so we don't call it N times.
+        self._budget_targets: list[Any] = []
+        for candidate in (llm, classifier_llm, router_llm, planner_llm, synthesizer_llm):
+            if candidate is None:
+                continue
+            if any(candidate is existing for existing in self._budget_targets):
+                continue
+            self._budget_targets.append(candidate)
         self._guardrail_config = guardrail_config or GuardrailConfig()
         self._enable_otel = enable_otel
         self._annotation_store = annotation_store
@@ -307,6 +332,16 @@ class AgentRuntime:
             checkpoint_store = FileCheckpointStore()
         self._checkpoint_store = checkpoint_store
+    def _attach_budget(self, guard: Any) -> None:
+        """Wire the per-run budget guard into every distinct LLM instance.
+        Duck-typed: adapters that don't implement ``set_budget`` (e.g. a
+        bare custom client) are skipped silently.
+        """
+        for target in self._budget_targets:
+            if hasattr(target, "set_budget"):
+                target.set_budget(guard)
     def _steering_lifecycle(self):
         """Wrap the dispatch in the steering factory's lifecycle if it has one.
@@ -343,8 +378,7 @@ class AgentRuntime:
         from agents.base import BaseAgent
         guard = BudgetGuard(self._guardrail_config)
-        if hasattr(self._llm, "set_budget"):
-            self._llm.set_budget(guard)
+        self._attach_budget(guard)
         config = self._agent_registry.get(agent_id)
         agent = BaseAgent(
@@ -393,8 +427,7 @@ class AgentRuntime:
         config = self._agent_registry.get(checkpoint["agent_id"])
         guard = BudgetGuard(self._guardrail_config)
-        if hasattr(self._llm, "set_budget"):
-            self._llm.set_budget(guard)
+        self._attach_budget(guard)
         tracer = self._make_tracer()
         agent = BaseAgent(
@@ -501,8 +534,7 @@ class AgentRuntime:
             outer_run_id = checkpoint["run_id"]
             config = self._agent_registry.get(checkpoint["agent_id"])
             guard = BudgetGuard(self._guardrail_config)
-            if hasattr(self._llm, "set_budget"):
-                self._llm.set_budget(guard)
+            self._attach_budget(guard)
             tracer = self._make_tracer()
             agent = BaseAgent(
                 config=config,
@@ -557,8 +589,7 @@ class AgentRuntime:
         # Adapters that implement set_budget(guard) (e.g. OpenAILLM) get the
         # fresh per-run guard so they can call add_cost() on every completion.
         # Duck-typed so users can plug in any LLM client that doesn't.
-        if hasattr(self._llm, "set_budget"):
-            self._llm.set_budget(guard)
+        self._attach_budget(guard)
         # state lives in memory, not agents — instantiate fresh per run
         agents = {
@@ -583,6 +614,8 @@ class AgentRuntime:
             tracer=tracer,
             guard=guard,
             llm=self._llm,
+            planner_llm=self._planner_llm,
+            synthesizer_llm=self._synthesizer_llm,
             eval_config=EvalConfig(
                 confidence_threshold=self._guardrail_config.confidence_threshold,
                 max_replan_count=self._guardrail_config.max_replan_count,
@@ -605,7 +638,7 @@ class AgentRuntime:
             f"  {aid}: {self._agent_registry.get(aid).role}"
             for aid in self._agent_registry.all_ids()
         )
-        response = await self._llm.complete(
+        response = await self._classifier_llm.complete(
             system=_CLASSIFIER_SYSTEM.format(agent_descriptions=agent_descriptions),
             messages=[{"role": "user", "content": f"Goal: {goal}"}],
             response_format={"type": "json_object"},
@@ -708,7 +741,7 @@ class AgentRuntime:
         agent_descriptions = "\n".join(
             f"  {aid}: {self._agent_registry.get(aid).role}" for aid in all_ids
         )
-        response = await self._llm.complete(
+        response = await self._router_llm.complete(
             system=_ROUTER_SYSTEM.format(agent_descriptions=agent_descriptions),
             messages=[{"role": "user", "content": f"Goal: {goal}"}],
             response_format={"type": "json_object"},
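The budget wiring in this file dedupes the injected LLMs by object identity before calling `set_budget`. A standalone sketch of that idea follows, using hypothetical stand-ins (`CountingLLM`, `BareClient`, `distinct_by_identity`, `attach_budget`) rather than the real `AgentRuntime`; the guard is a placeholder object.

```python
class CountingLLM:
    """Hypothetical adapter that counts how often a budget guard is attached."""

    def __init__(self):
        self.attached = 0

    def set_budget(self, guard):
        self.attached += 1


class BareClient:
    """Adapter without set_budget; it should be skipped, not crash."""


def distinct_by_identity(candidates):
    """Keep the first occurrence of each object, comparing with `is`."""
    distinct = []
    for candidate in candidates:
        if candidate is None:
            continue
        if any(candidate is seen for seen in distinct):
            continue
        distinct.append(candidate)
    return distinct


def attach_budget(targets, guard):
    # Duck-typed: only adapters that expose set_budget receive the guard.
    for target in targets:
        if hasattr(target, "set_budget"):
            target.set_budget(guard)


premium, cheap, bare = CountingLLM(), CountingLLM(), BareClient()
# Slots in the order llm, classifier, router, planner, synthesizer.
slots = (premium, cheap, cheap, None, bare)
targets = distinct_by_identity(slots)
attach_budget(targets, guard=object())
```

Comparing with `is` instead of building a `set` avoids requiring adapters to be hashable and sidesteps any custom `__eq__` an adapter might define.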
