PyPI - react-agent-harness - Versions diffs - 0.5.2__tar.gz → 0.6.1__tar.gz - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (85)

{react_agent_harness-0.5.2/react_agent_harness.egg-info → react_agent_harness-0.6.1}/PKG-INFO RENAMED

@@ -1,7 +1,7 @@
 Metadata-Version: 2.4
 Name: react-agent-harness
-Version: 0.5.2
-Summary: Multi-agent LLM orchestration: hybrid DAG planning, two-tier memory, streaming
+Version: 0.6.1
+Summary: Multi-agent LLM orchestration: hybrid DAG planning, two-tier memory, streaming, cost/token budgets with per-call-site breakdown
 Requires-Python: >=3.10
 License-File: LICENSE
 Requires-Dist: prompt_toolkit>=3.0

{react_agent_harness-0.5.2 → react_agent_harness-0.6.1}/README.md RENAMED

@@ -1,7 +1,9 @@
 # react-agent-harness
 Bring-your-own-LLM multi-agent harness: hybrid DAG planning with replan-on-failure,
-two-tier memory (semantic KV + episodic vector), and a streaming-primary event model.
+two-tier memory (semantic KV + episodic vector), a streaming-primary event model,
+and cost/token budgets with per-call-site attribution (classifier, router,
+planner, synthesizer, agent).
 Config-driven — register tools and agents, run any goal. No subclassing.
@@ -33,13 +35,21 @@ events to stdout and prints elapsed time + cost at the end.
 ## Architecture
 ```
-harness/runtime.py          AgentRuntime — single entry point, wire once run anything
+harness/runtime.py          AgentRuntime — single entry point; BudgetGuard with cost/token caps + per-call-site breakdown
 harness/events.py           BusEvent + EventType — canonical event vocabulary
-harness/llm/openai.py       OpenAILLM — OpenAI adapter with usage + cost tracking
+harness/llm/openai.py       OpenAILLM — OpenAI API-key adapter with usage + cost tracking
+harness/llm/anthropic.py    AnthropicLLM — direct Anthropic API-key adapter with prompt-caching support
+harness/llm/claude_code.py  ClaudeCodeLLM — Claude subscription OAuth adapter (experimental, ToS caveats)
+harness/llm/openai_codex.py OpenAICodexLLM — ChatGPT subscription OAuth adapter (experimental, ToS caveats)
+harness/llm/auth.py         Shared OAuth + auth-file primitives for the subscription adapters
+harness/llm/fallback.py     FallbackLLM — transparent retry on transient upstream errors
+harness/llm/routing.py      RoutingLLM — dispatch calls to different adapters by a selector
+harness/trace.py            JSONL trace recorder + replay — durable, per-event flush
+harness/trace_viewer.py     Local web timeline viewer for recorded JSONL traces
 harness/annotation.py       Annotation store + AnnotationHook — RLHF trajectory capture
 harness/hitl.py             HITL approval gate — interactive CLI, session-allow list
 harness/tool_policy.py      Persistent tool policy — user-scoped allow rules, CLI management
-harness/console.py          ConsoleRenderer — centralised BusEvent formatting for CLI apps
+harness/console.py          ConsoleRenderer — centralised BusEvent formatting + render_budget helper
 harness/steering.py         Async steering — agent.steer(text), StdinRouter pub/sub, FileSteer, factory helpers
 harness/checkpoint.py       CheckpointStore + _ResumeHint + maybe_resume_key — pluggable run-state persistence (file + Redis); auto-resume built into dispatch_stream / run_stream
 harness/otel.py             OTELHook — OpenTelemetry span exporter (opt-in)
@@ -54,7 +64,8 @@ memory/redis_store.py       Redis semantic store — durable KV with TTL
 memory/stores.py            InMemory stores — local dev default, no deps
 tools/builtin/http_fetch.py HTTPFetch — minimal read-only GET tool
 tools/builtin/fetch_image.py FetchImage — fetch URL and return OpenAI image_url block
-tools/mcp/adapter.py        MCP tool adapter — connect any MCP server
+tools/mcp/adapter.py        MCP tool adapter — stdio, SSE, streamable-HTTP transports
+tools/mcp/auth.py           ApiKeyMCPAuth + BrowserOAuthMCPAuth — auth primitives for remote MCP servers
 ```
 Execution is **streaming-primary**: every path yields `BusEvent`s for
@@ -188,6 +199,99 @@ llm = ClaudeCodeLLM(
 )
 ```
+### Cost shaping + reliability
+Three patterns, ordered by how production teams actually solve this:
+**1. Per-call-site LLM injection (the recommended pattern)**
+`AgentRuntime` exposes one slot per orchestrator call site. Each defaults to
+`llm` when unset, so existing code keeps working. The classifier and router
+both see only the goal + agent descriptions (~300 tokens) and emit a
+one-token decision — natural candidates for a cheaper model. The planner
+and synthesizer produce structured DAGs and final answers and usually want
+to stay on the main model.
+```python
+runtime = AgentRuntime(
+    agent_registry=agents,
+    tool_registry=tools,
+    memory=memory,
+    llm=premium,                 # default — agent ReAct loops use this
+    classifier_llm=cheap,        # simple vs complex dispatch decision
+    router_llm=cheap,            # single-agent picker
+    # planner_llm=...            # defaults to llm; override only if you want
+    # synthesizer_llm=...        # defaults to llm
+)
+```
+No guessing, no keyword matching, no fragility — you read the runtime
+construction and you know exactly which model serves which purpose. The
+budget guard is wired into every distinct LLM instance automatically
+(deduped by object identity, so injecting the same instance into multiple
+slots wires the guard into it only once).
+**2. `FallbackLLM` for resilience**
+Try each adapter in order; transparently switch to the next on rate
+limits, timeouts, or 5xx errors:
+```python
+from harness.llm.fallback import FallbackLLM
+llm = FallbackLLM([
+    AnthropicLLM(model="claude-sonnet-4-6"),   # primary
+    OpenAILLM(model="gpt-4o-mini"),            # backup
+])
+runtime = AgentRuntime(..., llm=llm)
+print(llm.last_route)   # after a call: 0 if the primary served it, 1 if the backup did
+```
+Permanent errors (auth, bad request) propagate immediately — only transient
+upstream errors trigger fallback. Customise with `transient_errors=...`.
+Streaming retries only fire before the first token; mid-stream failures
+propagate to preserve response integrity.
+**3. `RoutingLLM` for bring-your-own-selector cases**
+When you need runtime routing — capability gating (`vision` vs
+`long_context`), learned classifiers (RouteLLM-style), cascade
+routing (cheap-then-escalate-on-low-confidence) — wrap a routes dict
+with your own selector callable:
+```python
+from harness.llm.routing import RoutingLLM
+def by_capability(system, messages):
+    if _needs_vision(messages):
+        return "vision"
+    if _estimated_tokens(system, messages) > 100_000:
+        return "long_context"
+    return "default"
+llm = RoutingLLM(
+    routes={
+        "default":      OpenAILLM(model="gpt-4o-mini"),
+        "vision":       OpenAILLM(model="gpt-4o"),
+        "long_context": AnthropicLLM(model="claude-sonnet-4-6"),
+    },
+    selector=by_capability,
+    default_route="default",
+)
+```
+The harness intentionally does not ship default selectors. Naive selectors
+(keyword matching, fixed token thresholds) misroute in subtle ways and
+encourage the wrong mental model — if you're reaching for one, you almost
+certainly want per-call-site injection instead.
+Compose freely: `FallbackLLM([premium, backup])` injected into the
+`llm=` slot gives the agent loops resilience, with `classifier_llm=cheap`
+and `router_llm=cheap` shaping the cheap-call cost — all without a custom
+selector.
+---
 `ClaudeCodeLLM` reads a `claude-code` OAuth entry, refreshes it automatically
 when expired, and retries once after `401`/`403`. This mirrors Pi's Claude
 Pro/Max extension approach rather than shelling out to the Claude CLI. The
@@ -495,6 +599,92 @@ Cost ceiling fires on the *next* `check()` (start of next ReAct step or
 orchestrator batch), not synchronously mid-call — accept this for 0.0.1, the
 guard's job is preventing runaway loops, not bounding individual calls.
+### Token limits + per-call-site breakdown
+`GuardrailConfig.max_input_tokens` / `max_output_tokens` cap raw token
+usage independently of dollar cost. This is the only enforcement available
+to subscription-auth runs (`ClaudeCodeLLM`, `OpenAICodexLLM`) — those tiers
+don't expose pricing, so cost stays 0 and only token caps can fire.
+```python
+runtime = AgentRuntime(
+    ...,
+    guardrail_config=GuardrailConfig(
+        max_total_cost_usd=2.0,
+        max_input_tokens=100_000,
+        max_output_tokens=20_000,
+    ),
+)
+```
+Per-call-site attribution lives on the terminal event's `budget` payload
+— a snapshot of spending bucketed by the LLM slot that ran each call.
+The runtime tags classifier / router / planner / synthesizer calls
+automatically, and each agent's ReAct calls are tagged `agent:<agent_id>`,
+so every specialist agent gets its own bucket as well. Buckets follow the
+slot, not the LLM instance: the classifier and router report separately
+even though both slots share the same physical `cheap` instance, and the
+planner (served by `premium`) gets a bucket of its own:
+```python
+async for event in runtime.dispatch_stream(goal):
+    # Routed (simple) goals terminate with TASK_DONE; orchestrated goals
+    # with DONE. Both carry the same ``budget`` shape.
+    if event.type in (EventType.TASK_DONE, EventType.DONE):
+        budget = event.payload["budget"]
+        print(f"total: in={budget['tokens_in']} out={budget['tokens_out']} "
+              f"${budget['cost_usd']:.4f}")
+        for slot, stats in budget["breakdown"].items():
+            print(f"  {slot}: in={stats['tokens_in']} out={stats['tokens_out']}")
+```
+The same `budget` dict is attached to `runtime.run(...)` and
+`runtime.dispatch(...)` return values under the `budget` key, so blocking
+callers don't need to read events.
+Anthropic / Claude Code adapters count input tokens as the *total* that
+hit the wire (non-cached + cache-creation + cache-read), so token caps
+reflect actual consumption regardless of cache hit rate. Cost calculation
+via `cost_fn` still respects cache pricing.
+### Evals via the trace recorder
+There's no shipped evals framework — opinions on scorers, judge models,
+and golden-set management belong outside the orchestration core. The
+[trace recorder](#trace-recorder--replay--local-viewer) already writes
+per-event token/cost/latency to JSONL, so a few lines of glue cover most
+in-house eval setups:
+```python
+import json
+from harness.trace import record_trace
+# 1. Record traces while running a fixture set.
+for fixture in fixtures:
+    async for _event in record_trace(
+        runtime.dispatch_stream(fixture["input"]),
+        path=f"runs/{fixture['id']}.jsonl",
+    ):
+        pass
+# 2. Score offline by reading the recorded JSONL back.
+def score_run(path: str, expected: str) -> dict:
+    answer = ""
+    budget = {"tokens_in": 0, "tokens_out": 0, "cost_usd": 0.0, "breakdown": {}}
+    with open(path, encoding="utf-8") as f:
+        for line in f:
+            event = json.loads(line)
+            if event["type"] in ("done", "task_done"):
+                answer = event["payload"].get("answer", "")
+                budget = event["payload"].get("budget", budget)
+    return {
+        "success": expected.lower() in answer.lower(),
+        **budget,  # tokens_in, tokens_out, cost_usd, breakdown
+    }
+```
+Plug in your own scorer (exact-match, LLM-judge, semantic similarity) on
+top. External tools like Braintrust, LangSmith, and Weave are
+purpose-built for this; the recorded JSONL can be mapped into their ingestion formats.
 ## Tool execution
 Tools that shell out (`kubectl`, `curl`, `sh -c …`) should not run inside the
@@ -722,6 +912,49 @@ The OTEL hook is a side-channel on the existing `Tracer` — the in-memory trace
 is always available via `result["trace"]` regardless of whether OTEL is enabled.
 Zero overhead and zero imports when `enable_otel=False`.
+## Trace recorder + replay + local viewer
+For local debug and post-mortem inspection without an OTEL backend, the
+harness ships a JSONL trace recorder and a stdlib-only HTML viewer. Wrap
+any streaming call:
+```python
+from harness.trace import record_trace, replay
+async for event in record_trace(runtime.dispatch_stream(goal), "run.jsonl"):
+    ...  # your normal handling
+```
+Each `BusEvent` is flushed per-line, so a partial trace survives a crash.
+View the trace in your browser:
+```bash
+agent-harness trace view run.jsonl     # opens http://127.0.0.1:8765/
+```
+The viewer is a single embedded HTML page — vertical timeline, filter by
+agent / event type / text, expandable per-event JSON. No build step, no
+external services.
+Replay a trace through `ConsoleRenderer` (great for grepping or piping
+into another script):
+```bash
+agent-harness trace replay run.jsonl
+agent-harness trace replay run.jsonl --realtime --speed 2.0
+```
+Programmatic replay yields reconstructed `BusEvent` objects:
+```python
+async for event in replay("run.jsonl", realtime=False):
+    ...  # reuse the same loops you write for live streams
+```
+This is complementary to OTEL — OTEL is for production observability and
+long-term storage in Jaeger/Datadog; the JSONL recorder is for local
+debugging, sharing reproductions, and replaying past runs.
 ## Vision / multimodal agents
 `WorkingMemory` accepts `str | list` content so image blocks pass through to
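The record/replay mechanics in the hunk above can be sketched with the standard library alone. The sketch passes events through unchanged, appends one JSON object per event, flushes after each line so a crash leaves a readable prefix, and replays with optional timing. `record_jsonl`, `replay_jsonl`, and the `{"t", "type", "payload"}` line shape are assumptions for illustration. The real `record_trace` / `replay` serialize `BusEvent`s, and their on-disk format may differ.

```python
import asyncio
import json
import time
from collections.abc import AsyncIterator


async def record_jsonl(events: AsyncIterator[dict], path: str) -> AsyncIterator[dict]:
    """Yield each event unchanged while appending it, timestamped, to a JSONL file."""
    with open(path, "w", encoding="utf-8") as f:
        async for event in events:
            f.write(json.dumps({"t": time.time(), **event}) + "\n")
            f.flush()  # per-line flush: earlier events survive a crash mid-run
            yield event


async def replay_jsonl(
    path: str, *, realtime: bool = False, speed: float = 1.0
) -> AsyncIterator[dict]:
    """Yield recorded events; with realtime=True, sleep for the recorded gaps / speed."""
    previous: float | None = None
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                break  # torn final line from a crash: keep everything before it
            stamp = record.pop("t")
            if realtime and previous is not None:
                await asyncio.sleep(max(0.0, stamp - previous) / speed)
            previous = stamp
            yield record
```

Usage mirrors the README: `async for event in record_jsonl(stream, "run.jsonl"): ...` while running, then `async for event in replay_jsonl("run.jsonl", realtime=True, speed=2.0): ...` later. File writes here are synchronous, which is acceptable for a local debug tool but worth revisiting for high event rates.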

{react_agent_harness-0.5.2 → react_agent_harness-0.6.1}/agents/base.py RENAMED

@@ -422,6 +422,12 @@ class BaseAgent:
                         "summarizations": self._working_memory.summarization_count,
                     },
                 }
+                # Attach the current budget snapshot so dispatch_stream
+                # consumers can read totals + per-call-site breakdown off
+                # the routed path's terminal event, same shape as the
+                # orchestrator's DONE event.
+                if self._guard is not None and hasattr(self._guard, "snapshot"):
+                    result["budget"] = self._guard.snapshot()
                 logger.info(
                     "Agent %s completed: steps=%d confidence=%.2f summarizations=%d",
                     self.config.agent_id,
@@ -653,11 +659,17 @@ class BaseAgent:
             payload=before_usage,
         )
+        # Tag ReAct spending so it shows up in BudgetGuard.breakdown alongside
+        # classifier/router/planner/synthesizer. Per-agent attribution makes
+        # multi-agent demos surface which specialist agent actually drove the
+        # bulk of token usage.
+        react_source = f"agent:{self.config.agent_id}"
         try:
             if hasattr(self._llm, "stream_complete"):
                 async for token in self._llm.stream_complete(
                     system=None,
                     messages=messages,
+                    source=react_source,
                 ):
                     accumulated += token
                     if self.config.stream_tokens:
@@ -679,6 +691,7 @@ class BaseAgent:
                     system=None,
                     messages=messages,
                     response_format={"type": "json_object"},
+                    source=react_source,
                 )
                 response = _normalize_response(raw)
                 if response is None:
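Because ReAct calls are now tagged `agent:<agent_id>`, a snapshot's `breakdown` can show which specialist drove most of the token usage. This is a minimal sketch. It assumes the breakdown is a flat `{slot: {"tokens_in": ..., "tokens_out": ...}}` dict, as read by the README example. `top_agents` is a hypothetical helper and the numbers are made up.

```python
def top_agents(
    breakdown: dict[str, dict], prefix: str = "agent:"
) -> list[tuple[str, int, float]]:
    """Rank agent buckets by total tokens as (agent_id, tokens, share of all agent tokens)."""
    totals = {
        slot[len(prefix):]: int(stats.get("tokens_in", 0)) + int(stats.get("tokens_out", 0))
        for slot, stats in breakdown.items()
        if slot.startswith(prefix)  # skip classifier/router/planner/synthesizer buckets
    }
    grand = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(agent, tokens, tokens / grand if grand else 0.0) for agent, tokens in ranked]


breakdown = {
    "classifier": {"tokens_in": 310, "tokens_out": 1},
    "agent:researcher": {"tokens_in": 6000, "tokens_out": 1500},
    "agent:writer": {"tokens_in": 2000, "tokens_out": 500},
}
for agent, tokens, share in top_agents(breakdown):
    print(f"{agent:<10} {tokens:>6,} tokens  {share:.0%}")
```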

{react_agent_harness-0.5.2 → react_agent_harness-0.6.1}/harness/cli.py RENAMED

@@ -46,6 +46,19 @@ def main() -> int:
     policy_clear = policy_sub.add_parser("clear", help="remove all policy rules")
     policy_clear.add_argument("--policy-file", default=str(default_policy_file()))
+    trace = sub.add_parser("trace", help="view or replay a recorded run trace")
+    trace_sub = trace.add_subparsers(dest="trace_command", required=True)
+    trace_view = trace_sub.add_parser("view", help="open a local web viewer for a trace")
+    trace_view.add_argument("path", help="path to a JSONL trace produced by record_trace")
+    trace_view.add_argument("--port", type=int, default=8765)
+    trace_view.add_argument("--no-open", action="store_true", help="don't auto-open the browser")
+    trace_replay = trace_sub.add_parser("replay", help="dump a trace to stdout via ConsoleRenderer")
+    trace_replay.add_argument("path", help="path to a JSONL trace produced by record_trace")
+    trace_replay.add_argument(
+        "--realtime", action="store_true", help="preserve recorded inter-event timing"
+    )
+    trace_replay.add_argument("--speed", type=float, default=1.0, help="realtime speed multiplier")
     args = parser.parse_args()
     try:
         if args.command == "login":
@@ -71,6 +84,16 @@ def main() -> int:
                 return _policy_revoke(path, args.rule_id)
             if args.policy_command == "clear":
                 return _policy_clear(path)
+        if args.command == "trace":
+            if args.trace_command == "view":
+                from harness.trace_viewer import serve
+                serve(args.path, port=args.port, open_browser=not args.no_open)
+                return 0
+            if args.trace_command == "replay":
+                return asyncio.run(
+                    _trace_replay(args.path, realtime=args.realtime, speed=args.speed)
+                )
     except Exception as e:
         print(f"agent-harness: {e}", file=sys.stderr)
         return 1
@@ -180,5 +203,16 @@ def _policy_clear(path: Path) -> int:
     return 0
+async def _trace_replay(path: str, *, realtime: bool, speed: float) -> int:
+    """Read a JSONL trace and render it via ConsoleRenderer."""
+    from harness.console import ConsoleRenderer
+    from harness.trace import replay
+    renderer = ConsoleRenderer()
+    async for event in replay(path, realtime=realtime, speed=speed):
+        renderer.render(event)
+    return 0
 if __name__ == "__main__":
     raise SystemExit(main())
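To check how the new `trace` subcommand maps a command line onto `args`, here is a standalone reconstruction of just that parser fragment, fed an explicit argv list. It is a sketch mirroring the hunk, not the full `main()`. The `required=True` on the top-level subparsers and the `build_parser` name are assumptions, since that part of `cli.py` is not shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="agent-harness")
    sub = parser.add_subparsers(dest="command", required=True)
    trace = sub.add_parser("trace", help="view or replay a recorded run trace")
    trace_sub = trace.add_subparsers(dest="trace_command", required=True)

    view = trace_sub.add_parser("view", help="open a local web viewer for a trace")
    view.add_argument("path")
    view.add_argument("--port", type=int, default=8765)
    view.add_argument("--no-open", action="store_true")  # stored as args.no_open

    replay = trace_sub.add_parser("replay", help="dump a trace to stdout")
    replay.add_argument("path")
    replay.add_argument("--realtime", action="store_true")
    replay.add_argument("--speed", type=float, default=1.0)
    return parser


args = build_parser().parse_args(["trace", "replay", "run.jsonl", "--realtime", "--speed", "2.0"])
print(args.command, args.trace_command, args.path, args.realtime, args.speed)
# trace replay run.jsonl True 2.0
```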

{react_agent_harness-0.5.2 → react_agent_harness-0.6.1}/harness/console.py RENAMED

@@ -178,17 +178,54 @@ class ConsoleRenderer:
             self.sep("═")
             print(p.get("answer", "(no answer)"), file=self._out)
             self.sep()
+            # ``budget`` snapshot supersedes the flat cost/elapsed fields when
+            # present (added with token caps + per-call-site breakdown).
+            budget = p.get("budget") or {}
+            cost = budget.get("cost_usd", p.get("cost_usd", 0))
+            elapsed = budget.get("elapsed_seconds", p.get("elapsed_seconds", 0))
             print(
                 f"Confidence: {p.get('confidence', 0):.2f}  |  "
                 f"Replans: {p.get('replan_count', 0)}  |  "
-                f"Cost: ${p.get('cost_usd', 0):.4f}  |  "
-                f"Time: {p.get('elapsed_seconds', 0):.1f}s",
+                f"Cost: ${cost:.4f}  |  "
+                f"Time: {elapsed:.1f}s",
                 file=self._out,
             )
+            self.render_budget(budget)
         elif t == EventType.ERROR:
             print(f"\n[error]      {event.error}", file=sys.stderr)
+    def render_budget(self, budget: dict | None) -> None:
+        """Print tokens + per-call-site breakdown from a ``BudgetGuard.snapshot()``
+        dict. Safe to call with ``{}`` or ``None`` — prints nothing when
+        there's no usage to show.
+        Exposed publicly so demos and other consumers that own their own
+        DONE / TASK_DONE rendering can still surface the breakdown without
+        duplicating the formatting.
+        """
+        if not budget:
+            return
+        tokens_in = budget.get("tokens_in")
+        tokens_out = budget.get("tokens_out")
+        if tokens_in is not None or tokens_out is not None:
+            print(
+                f"Tokens:     in={int(tokens_in or 0):,}  out={int(tokens_out or 0):,}",
+                file=self._out,
+            )
+        breakdown = budget.get("breakdown") or {}
+        if breakdown:
+            # Right-pad the slot label so columns line up — matters when
+            # the demo prints multiple slots in sequence.
+            width = max(len(name) for name in breakdown)
+            for slot, stats in breakdown.items():
+                print(
+                    f"  {slot:<{width}}  "
+                    f"in={int(stats.get('tokens_in', 0)):>7,}  "
+                    f"out={int(stats.get('tokens_out', 0)):>6,}",
+                    file=self._out,
+                )
     # ── private helpers ───────────────────────────────────────────────────────
     def _label(self, event: BusEvent) -> str:

{react_agent_harness-0.5.2 → react_agent_harness-0.6.1}/harness/llm/anthropic.py RENAMED

@@ -91,6 +91,8 @@ class AnthropicLLM:
         self,
         system: str | None,
         messages: list[dict],
+        *,
+        source: str | None = None,
         **kwargs: Any,
     ) -> dict:
         max_tokens = int(kwargs.pop("max_tokens", self._max_tokens))
@@ -110,7 +112,7 @@ class AnthropicLLM:
         cost = _compute_cost(usage, self._cost_fn)
         if cost is not None:
             usage["cost_usd"] = cost
-        self._record_cost(usage)
+        self._record_usage(usage, source=source)
         self.last_usage = usage
         text = _collect_text(resp.content)
@@ -122,6 +124,8 @@ class AnthropicLLM:
         self,
         system: str | None,
         messages: list[dict],
+        *,
+        source: str | None = None,
     ) -> AsyncGenerator[str, None]:
         sys_blocks = _system_blocks(system, prompt_caching=self._prompt_caching)
         built_messages = _build_messages(messages, prompt_caching=self._prompt_caching)
@@ -143,17 +147,34 @@ class AnthropicLLM:
             cost = _compute_cost(usage, self._cost_fn)
             if cost is not None:
                 usage["cost_usd"] = cost
-            self._record_cost(usage)
+            self._record_usage(usage, source=source)
             self.last_usage = usage
     # ── Internals ─────────────────────────────────────────────────────────────
-    def _record_cost(self, usage: dict) -> None:
-        if not self._budget:
+    def _record_usage(self, usage: dict, *, source: str | None) -> None:
+        """Forward usage to the budget guard.
+        Token count for budget purposes is the total input that hit the wire
+        — non-cached + cache-creation + cache-read — so token caps reflect
        real consumption regardless of cache hit rate. Cost
+        (which respects cache pricing via ``cost_fn``) is reported when
+        known.
+        """
+        guard = self._budget
+        if not guard:
             return
+        tokens_in = (
+            int(usage.get("tokens_in") or 0)
+            + int(usage.get("cache_read_tokens") or 0)
+            + int(usage.get("cache_creation_tokens") or 0)
+        )
+        tokens_out = int(usage.get("tokens_out") or 0)
+        if (tokens_in or tokens_out) and hasattr(guard, "add_tokens"):
+            guard.add_tokens(tokens_in, tokens_out, source=source)
         cost = usage.get("cost_usd")
         if cost and cost > 0:
-            self._budget.add_cost(cost)
+            guard.add_cost(cost, source=source)
 # ── Module-level helpers ──────────────────────────────────────────────────────
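The two quantities in `_record_usage` diverge on purpose. The token count fed to the caps adds cache reads and cache writes back in, while cost (via `cost_fn`) prices each category separately. The worked sketch below uses a made-up usage dict with the same keys as the hunk. `example_cost_fn` is a hypothetical `cost_fn`, and its per-million-token prices are illustrative placeholders, so check your provider's current price list before relying on them.

```python
def budget_tokens(usage: dict) -> tuple[int, int]:
    """Tokens counted against max_input_tokens / max_output_tokens (cache-agnostic)."""
    tokens_in = (
        int(usage.get("tokens_in") or 0)  # non-cached input
        + int(usage.get("cache_read_tokens") or 0)
        + int(usage.get("cache_creation_tokens") or 0)
    )
    return tokens_in, int(usage.get("tokens_out") or 0)


# Illustrative USD prices per million tokens; not an authoritative schedule.
PRICES = {"input": 3.00, "cache_write": 3.75, "cache_read": 0.30, "output": 15.00}


def example_cost_fn(usage: dict, prices: dict = PRICES) -> float:
    """Hypothetical cost_fn: each token category is billed at its own rate."""
    per_token = {k: v / 1_000_000 for k, v in prices.items()}
    return (
        int(usage.get("tokens_in") or 0) * per_token["input"]
        + int(usage.get("cache_creation_tokens") or 0) * per_token["cache_write"]
        + int(usage.get("cache_read_tokens") or 0) * per_token["cache_read"]
        + int(usage.get("tokens_out") or 0) * per_token["output"]
    )


usage = {"tokens_in": 1_000, "cache_read_tokens": 9_000, "cache_creation_tokens": 0, "tokens_out": 500}
print(budget_tokens(usage))              # (10000, 500)
print(round(example_cost_fn(usage), 4))  # 0.0132
```

A 90% cache hit rate cuts the bill sharply in this example, yet all 10,000 input tokens still count toward `max_input_tokens`. That is the "caps reflect real consumption" behavior the docstring describes.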

{react_agent_harness-0.5.2 → react_agent_harness-0.6.1}/harness/llm/claude_code.py RENAMED

@@ -68,12 +68,24 @@ class ClaudeCodeLLM:
         self._user_agent = user_agent or _default_user_agent()
         self._betas = betas
         self._prompt_caching = prompt_caching
+        self._budget: Any = None
         self.last_usage: dict | None = None
+    def set_budget(self, guard: Any) -> None:
+        """Inject a BudgetGuard so token caps fire on subscription-auth runs.
+        Cost stays 0 (no pricing schedule available for the subscription
+        tier), but ``add_tokens`` still lands so ``max_input_tokens`` /
+        ``max_output_tokens`` are enforced.
+        """
+        self._budget = guard
     async def complete(
         self,
         system: str | None,
         messages: list[dict],
+        *,
+        source: str | None = None,
         **kwargs: Any,
     ) -> dict:
         """Collect the streaming response into a single text + usage dict.
@@ -84,7 +96,9 @@ class ClaudeCodeLLM:
         """
         max_tokens = int(kwargs.pop("max_tokens", self._max_tokens))
         parts: list[str] = []
-        async for delta in self._iter_stream(system, messages, max_tokens=max_tokens, extra=kwargs):
+        async for delta in self._iter_stream(
+            system, messages, max_tokens=max_tokens, extra=kwargs, source=source
+        ):
             parts.append(delta)
         text = "".join(parts)
         if not text:
@@ -95,9 +109,11 @@ class ClaudeCodeLLM:
         self,
         system: str | None,
         messages: list[dict],
+        *,
+        source: str | None = None,
     ) -> AsyncGenerator[str, None]:
         async for delta in self._iter_stream(
-            system, messages, max_tokens=self._max_tokens, extra={}
+            system, messages, max_tokens=self._max_tokens, extra={}, source=source
         ):
             yield delta
@@ -114,6 +130,7 @@ class ClaudeCodeLLM:
         *,
         max_tokens: int,
         extra: dict[str, Any],
+        source: str | None = None,
     ) -> AsyncGenerator[str, None]:
         """Single source of truth: open Anthropic SSE stream, yield text
         deltas, populate `self.last_usage`. Auth refresh on 401/403
@@ -182,10 +199,31 @@ class ClaudeCodeLLM:
                     "total_tokens": tokens_in + tokens_out,
                     "provider": "claude-code",
                 }
+                self._record_usage(self.last_usage, source=source)
                 return
         raise RuntimeError("Claude Code authentication failed after refresh")
+    def _record_usage(self, usage: dict, *, source: str | None) -> None:
+        """Report token totals to the budget guard.
+        Tokens budgeted = total input that hit the wire (non-cached +
+        cache-creation + cache-read) plus output tokens — so ``max_input_tokens``
+        / ``max_output_tokens`` reflect real consumption regardless of cache
+        hit rate. No cost is reported (subscription auth, no pricing).
+        """
+        guard = self._budget
+        if not guard or not hasattr(guard, "add_tokens"):
+            return
+        tokens_in = (
+            int(usage.get("tokens_in") or 0)
+            + int(usage.get("cache_read_tokens") or 0)
+            + int(usage.get("cache_creation_tokens") or 0)
+        )
+        tokens_out = int(usage.get("tokens_out") or 0)
+        if tokens_in or tokens_out:
+            guard.add_tokens(tokens_in, tokens_out, source=source)
     async def _get_client(self) -> Any:
         if self._client is None:
             try:
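The README's per-call-site section makes two claims: every slot defaults to `llm` when unset, and the budget guard is wired into each distinct LLM instance, deduped by object identity. `set_budget` in this hunk is the hook such wiring would call. The sketch below shows that resolution step. `resolve_slots`, `wire_budget`, and `FakeLLM` are hypothetical names, and the slot keys simply mirror the constructor keywords. The real `AgentRuntime` internals may differ.

```python
from typing import Any

SLOTS = ("classifier_llm", "router_llm", "planner_llm", "synthesizer_llm")


def resolve_slots(llm: Any, **overrides: Any) -> dict[str, Any]:
    """Each per-call-site slot falls back to the main llm when not overridden."""
    resolved = {"llm": llm}
    for slot in SLOTS:
        value = overrides.get(slot)
        resolved[slot] = llm if value is None else value
    return resolved


def wire_budget(slots: dict[str, Any], guard: Any) -> int:
    """Call set_budget once per distinct instance; return how many received the guard."""
    seen: set[int] = set()
    for instance in slots.values():
        # id() is safe here: `slots` keeps every instance alive for the whole loop.
        if id(instance) in seen or not hasattr(instance, "set_budget"):
            continue
        seen.add(id(instance))
        instance.set_budget(guard)
    return len(seen)


class FakeLLM:
    """Illustrative adapter that counts how often a guard is injected."""

    def __init__(self) -> None:
        self.budget_calls = 0
        self._budget: Any = None

    def set_budget(self, guard: Any) -> None:
        self.budget_calls += 1
        self._budget = guard


premium, cheap = FakeLLM(), FakeLLM()
slots = resolve_slots(premium, classifier_llm=cheap, router_llm=cheap)
print(wire_budget(slots, guard=object()))        # 2
print(premium.budget_calls, cheap.budget_calls)  # 1 1
```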
