PyPI - openrtc - Versions diffs - 0.2.1__tar.gz → 0.2.3__tar.gz - Mend

openrtc 0.2.1__tar.gz → 0.2.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (152)

{openrtc-0.2.1 → openrtc-0.2.3}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: openrtc
-Version: 0.2.1
+Version: 0.2.3
 Summary: Run multiple LiveKit voice agents in a single shared worker process.
 Project-URL: Homepage, https://github.com/mahimailabs/openrtc
 Project-URL: Repository, https://github.com/mahimailabs/openrtc
@@ -203,6 +203,76 @@ If a module has no `@agent_config`, the agent name defaults to the filename stem
 Discovered agents work with `livekit dev` and spawn-based workers on macOS. For `add()`, define agent classes at module scope so worker reload can import them.
+## Migrating from livekit-agents
+Already running one or more `livekit-agents` workers? Each is its own process that
+loads the same VAD and turn-detector models. Collapse them into one `AgentPool`
+worker without changing your agents.
+**Before** (one worker per agent, N processes):
+```python
+# restaurant_worker.py  (plus a near-identical dental_worker.py, support_worker.py, ...)
+from livekit import agents
+from livekit.agents import Agent, AgentSession
+from livekit.plugins import openai, silero
+class RestaurantAgent(Agent):
+    def __init__(self) -> None:
+        super().__init__(instructions="You help callers book tables.")
+async def entrypoint(ctx: agents.JobContext) -> None:
+    session = AgentSession(
+        stt=openai.STT(), llm=openai.LLM(), tts=openai.TTS(), vad=silero.VAD.load()
+    )
+    await session.start(agent=RestaurantAgent(), room=ctx.room)
+    await ctx.connect()
+if __name__ == "__main__":
+    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
+```
+**After** (one worker, N agents, one shared prewarm):
+```python
+# worker.py
+from livekit.agents import Agent
+from livekit.plugins import openai
+from openrtc import AgentPool
+class RestaurantAgent(Agent):  # unchanged
+    def __init__(self) -> None:
+        super().__init__(instructions="You help callers book tables.")
+class DentalAgent(Agent):  # unchanged
+    def __init__(self) -> None:
+        super().__init__(instructions="You help callers manage appointments.")
+pool = AgentPool(default_stt=openai.STT(), default_llm=openai.LLM(), default_tts=openai.TTS())
+pool.add("restaurant", RestaurantAgent)
+pool.add("dental", DentalAgent)
+pool.run()
+```
+Your `Agent` subclasses, tools, and provider objects are unchanged. You delete the
+per-worker boilerplate (`entrypoint`, `AgentSession` wiring, `cli.run_app`) and
+register the agents on one pool; OpenRTC owns prewarm, routing, and per-call
+session construction. On the first run the worker logs the win, for example:
+```text
+OpenRTC: 2 agents in 1 worker (baseline ~410 MB). 2 separate livekit-agents
+workers would cost ~820 MB; sharing one worker saves ~410 MB of idle baseline
+(assumes equal per-worker baselines).
+```
+See [Routing](#routing) for how each incoming call resolves to one registered agent.
 ## Memory: before and after
 Assume an illustrative **~400 MB** idle baseline per worker for the shared stack (VAD, turn detector, and similar). Your measured RSS will differ by provider, model, and OS.
@@ -266,6 +336,71 @@ footprint. Validate against the §8.4 real-LiveKit integration test
 `OPENAI_API_KEY`) before quoting a per-session memory number to your
 operators.
+### Throughput: steady-state event-loop p99
+Memory density is only half the question. N sessions share one event loop and
+one GIL, so the other half is whether the loop keeps up.
+`tests/benchmarks/throughput.py` drives N concurrent sessions through the real
+Silero VAD over synthetic 16 kHz PCM at 50 fps (the continuous on-loop CPU cost)
+and measures event-loop p99 latency, separating the startup burst from steady
+state.
+```bash
+uv run python tests/benchmarks/throughput.py --sessions 1,10,25,50,100
+```
+Sample sweep (Apple M-series laptop, `vad` workload, steady state):
+| Sessions | steady-state loop p99 | peak RSS |
+| ---: | ---: | ---: |
+|   1 | 0.9 ms | 160 MB |
+|  10 | 1.3 ms | 160 MB |
+|  25 | 1.2 ms | 160 MB |
+|  50 | 1.1 ms | 160 MB |
+| 100 | 2.8 ms | 160 MB |
+Steady-state VAD inference stays well under a 100 ms loop-latency budget up to 100
+sessions, with flat resident memory (the model loads once). The expensive,
+bursty part is session *startup* (each `session.start()` plus greeting), which
+the benchmark reports as a separate `startup_p99` column and which dominates
+early-life latency. This workload models the continuous VAD path, not the full
+STT/LLM/TTS orchestration, so read it as the on-loop-CPU ceiling rather than a
+full-pipeline guarantee. Run it on your own hardware before quoting a
+sessions-per-worker number.
+### Prove it on your machine
+The process column above is estimated. This script measures both models for
+real on your laptop: it spawns one subprocess per session for the
+process-per-session model, runs the same number of sessions as `asyncio`
+tasks in a single process for the coroutine model, then prints the memory
+used each way. No LiveKit server, no API keys, and no model download unless you pass `--load-vad`.
+```bash
+uv run python examples/density_demo.py                 # 16 sessions
+uv run python examples/density_demo.py --sessions 32   # the gap widens with N
+uv run python examples/density_demo.py --sessions 50 --load-vad   # adds the shared Silero VAD model
+```
+Sample output (Apple M-series laptop, import-only mode):
+```text
+Hosting 16 concurrent voice sessions. Measuring resident memory.
+  livekit-agents (process per session):     1861 MB total   ( 116.3 MB/session)
+  OpenRTC coroutine pool (one process):      195 MB total   (  12.2 MB/session)
+  OpenRTC uses 9.5x less memory for the same 16 sessions.
+```
+Your numbers vary by machine, and the ratio grows as you raise `--sessions`
+(the coroutine pool pays the import cost once and amortizes it across every
+session). This default mode counts only the `livekit-agents` import cost, so
+it is a conservative lower bound: `--load-vad` adds the shared Silero VAD
+model weights (paid once in the pool, once per process otherwise), and
+`tests/benchmarks/density.py --sessions 50` proves the 50-sessions-under-4-GB
+ceiling. The full script is [examples/density_demo.py](examples/density_demo.py).
 ## Routing
 One process hosts several agent classes, so each session must resolve to a single registered name. `AgentPool` resolves the agent in this order:
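
The startup log line in the migration section above is plain arithmetic under its own stated assumption of equal per-worker baselines: N separate workers cost N * baseline, one shared worker costs one baseline, and the saving is (N - 1) * baseline. A minimal sketch of that calculation; `estimate_shared_worker_savings` and `BaselineEstimate` are hypothetical names for illustration, not part of OpenRTC's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineEstimate:
    separate_mb: int  # N independent livekit-agents workers
    shared_mb: int    # one AgentPool worker hosting all N agents
    saved_mb: int     # idle baseline avoided by sharing


def estimate_shared_worker_savings(n_agents: int, baseline_mb: int) -> BaselineEstimate:
    """Idle-baseline arithmetic, assuming every worker has the same baseline."""
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    separate = n_agents * baseline_mb  # each worker loads VAD/turn detector itself
    shared = baseline_mb               # the pool prewarms those models once
    return BaselineEstimate(separate, shared, separate - shared)


est = estimate_shared_worker_savings(2, 410)
print(f"{est.separate_mb} MB separate vs {est.shared_mb} MB shared, saves {est.saved_mb} MB")
# prints: 820 MB separate vs 410 MB shared, saves 410 MB
```

Real baselines differ per worker (providers, models, OS), which is why the log line hedges with "assumes equal per-worker baselines".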

{openrtc-0.2.1 → openrtc-0.2.3}/README.md RENAMED

@@ -171,6 +171,76 @@ If a module has no `@agent_config`, the agent name defaults to the filename stem
 Discovered agents work with `livekit dev` and spawn-based workers on macOS. For `add()`, define agent classes at module scope so worker reload can import them.
+## Migrating from livekit-agents
+Already running one or more `livekit-agents` workers? Each is its own process that
+loads the same VAD and turn-detector models. Collapse them into one `AgentPool`
+worker without changing your agents.
+**Before** (one worker per agent, N processes):
+```python
+# restaurant_worker.py  (plus a near-identical dental_worker.py, support_worker.py, ...)
+from livekit import agents
+from livekit.agents import Agent, AgentSession
+from livekit.plugins import openai, silero
+class RestaurantAgent(Agent):
+    def __init__(self) -> None:
+        super().__init__(instructions="You help callers book tables.")
+async def entrypoint(ctx: agents.JobContext) -> None:
+    session = AgentSession(
+        stt=openai.STT(), llm=openai.LLM(), tts=openai.TTS(), vad=silero.VAD.load()
+    )
+    await session.start(agent=RestaurantAgent(), room=ctx.room)
+    await ctx.connect()
+if __name__ == "__main__":
+    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
+```
+**After** (one worker, N agents, one shared prewarm):
+```python
+# worker.py
+from livekit.agents import Agent
+from livekit.plugins import openai
+from openrtc import AgentPool
+class RestaurantAgent(Agent):  # unchanged
+    def __init__(self) -> None:
+        super().__init__(instructions="You help callers book tables.")
+class DentalAgent(Agent):  # unchanged
+    def __init__(self) -> None:
+        super().__init__(instructions="You help callers manage appointments.")
+pool = AgentPool(default_stt=openai.STT(), default_llm=openai.LLM(), default_tts=openai.TTS())
+pool.add("restaurant", RestaurantAgent)
+pool.add("dental", DentalAgent)
+pool.run()
+```
+Your `Agent` subclasses, tools, and provider objects are unchanged. You delete the
+per-worker boilerplate (`entrypoint`, `AgentSession` wiring, `cli.run_app`) and
+register the agents on one pool; OpenRTC owns prewarm, routing, and per-call
+session construction. On the first run the worker logs the win, for example:
+```text
+OpenRTC: 2 agents in 1 worker (baseline ~410 MB). 2 separate livekit-agents
+workers would cost ~820 MB; sharing one worker saves ~410 MB of idle baseline
+(assumes equal per-worker baselines).
+```
+See [Routing](#routing) for how each incoming call resolves to one registered agent.
 ## Memory: before and after
 Assume an illustrative **~400 MB** idle baseline per worker for the shared stack (VAD, turn detector, and similar). Your measured RSS will differ by provider, model, and OS.
@@ -234,6 +304,71 @@ footprint. Validate against the §8.4 real-LiveKit integration test
 `OPENAI_API_KEY`) before quoting a per-session memory number to your
 operators.
+### Throughput: steady-state event-loop p99
+Memory density is only half the question. N sessions share one event loop and
+one GIL, so the other half is whether the loop keeps up.
+`tests/benchmarks/throughput.py` drives N concurrent sessions through the real
+Silero VAD over synthetic 16 kHz PCM at 50 fps (the continuous on-loop CPU cost)
+and measures event-loop p99 latency, separating the startup burst from steady
+state.
+```bash
+uv run python tests/benchmarks/throughput.py --sessions 1,10,25,50,100
+```
+Sample sweep (Apple M-series laptop, `vad` workload, steady state):
+| Sessions | steady-state loop p99 | peak RSS |
+| ---: | ---: | ---: |
+|   1 | 0.9 ms | 160 MB |
+|  10 | 1.3 ms | 160 MB |
+|  25 | 1.2 ms | 160 MB |
+|  50 | 1.1 ms | 160 MB |
+| 100 | 2.8 ms | 160 MB |
+Steady-state VAD inference stays well under a 100 ms loop-latency budget up to 100
+sessions, with flat resident memory (the model loads once). The expensive,
+bursty part is session *startup* (each `session.start()` plus greeting), which
+the benchmark reports as a separate `startup_p99` column and which dominates
+early-life latency. This workload models the continuous VAD path, not the full
+STT/LLM/TTS orchestration, so read it as the on-loop-CPU ceiling rather than a
+full-pipeline guarantee. Run it on your own hardware before quoting a
+sessions-per-worker number.
+### Prove it on your machine
+The process column above is estimated. This script measures both models for
+real on your laptop: it spawns one subprocess per session for the
+process-per-session model, runs the same number of sessions as `asyncio`
+tasks in a single process for the coroutine model, then prints the memory
+used each way. No LiveKit server, no API keys, and no model download unless you pass `--load-vad`.
+```bash
+uv run python examples/density_demo.py                 # 16 sessions
+uv run python examples/density_demo.py --sessions 32   # the gap widens with N
+uv run python examples/density_demo.py --sessions 50 --load-vad   # adds the shared Silero VAD model
+```
+Sample output (Apple M-series laptop, import-only mode):
+```text
+Hosting 16 concurrent voice sessions. Measuring resident memory.
+  livekit-agents (process per session):     1861 MB total   ( 116.3 MB/session)
+  OpenRTC coroutine pool (one process):      195 MB total   (  12.2 MB/session)
+  OpenRTC uses 9.5x less memory for the same 16 sessions.
+```
+Your numbers vary by machine, and the ratio grows as you raise `--sessions`
+(the coroutine pool pays the import cost once and amortizes it across every
+session). This default mode counts only the `livekit-agents` import cost, so
+it is a conservative lower bound: `--load-vad` adds the shared Silero VAD
+model weights (paid once in the pool, once per process otherwise), and
+`tests/benchmarks/density.py --sessions 50` proves the 50-sessions-under-4-GB
+ceiling. The full script is [examples/density_demo.py](examples/density_demo.py).
 ## Routing
 One process hosts several agent classes, so each session must resolve to a single registered name. `AgentPool` resolves the agent in this order:
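
The throughput section above reports event-loop p99 latency as sessions share one loop and one GIL. A minimal sketch of one common way to measure loop lag, assuming nothing about how `tests/benchmarks/throughput.py` actually does it: a probe coroutine sleeps a fixed interval and records how late it wakes. `busy_session` is a hypothetical stand-in for per-frame on-loop work (a short CPU burn at 50 frames/s), not Silero VAD, and this sketch does not separate startup from steady state.

```python
from __future__ import annotations

import asyncio
import statistics
import time


async def sample_loop_lag(duration_s: float, interval_s: float = 0.01) -> list[float]:
    """Record how late each scheduled wake-up fires, in milliseconds."""
    loop = asyncio.get_running_loop()
    lags: list[float] = []
    deadline = loop.time() + duration_s
    while loop.time() < deadline:
        expected = loop.time() + interval_s
        await asyncio.sleep(interval_s)
        # Timers may fire marginally early (clock resolution), so clamp at 0.
        lags.append(max(0.0, (loop.time() - expected) * 1000.0))
    return lags


def p99(samples: list[float]) -> float:
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    return statistics.quantiles(samples, n=100)[98]


async def busy_session(stop: asyncio.Event, cpu_ms: float) -> None:
    """Hypothetical stand-in for one session's on-loop work at 50 frames/s."""
    while not stop.is_set():
        end = time.perf_counter() + cpu_ms / 1000.0
        while time.perf_counter() < end:  # hold the loop (and GIL) like a VAD step
            pass
        await asyncio.sleep(0.02)  # 20 ms per frame


async def measure_p99_ms(sessions: int, cpu_ms: float = 0.1, duration_s: float = 1.0) -> float:
    stop = asyncio.Event()
    workers = [asyncio.create_task(busy_session(stop, cpu_ms)) for _ in range(sessions)]
    try:
        lags = await sample_loop_lag(duration_s)
    finally:
        stop.set()
        await asyncio.gather(*workers)
    return p99(lags)


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(f"{n:>3} sessions: loop p99 {asyncio.run(measure_p99_ms(n)):.2f} ms")
```

The probe measures lag rather than per-call cost, so any coroutine that holds the loop (VAD inference, JSON parsing, a blocking SDK call) shows up in the same number.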

{openrtc-0.2.1 → openrtc-0.2.3}/docs/changelog.md RENAMED

@@ -147,16 +147,41 @@ contributor onboarding matches what's in the repo.
 <!-- releases -->
+## [0.2.2] - 2026-05-30
+### Fixed
+- Coroutine mode now establishes the LiveKit job context for the session duration, so `get_job_context()` works inside agents and sessions and shutdown callbacks run (MAH-158).
+- Coroutine sessions are held open until the call ends (room disconnect or `ctx.shutdown()`) instead of being marked SUCCESS when the entrypoint returns, so `max_concurrent_sessions` backpressure and runtime session counts are accurate (MAH-160).
+### Added
+- Real-audio throughput benchmark (`tests/benchmarks/throughput.py`) reporting steady-state event-loop p99 vs session count, separating startup from steady state (MAH-163).
+- `examples/density_demo.py`: a no-server demo comparing process-per-session vs coroutine-pool resident memory.
+### Changed
+- The coroutine real-room integration test is now a correctness gate (job context plus no-failure); throughput moved to the dedicated benchmark.
+---
+## [0.2.1] - 2026-05-06
+## What's Changed
+* [v0.2.1] File watcher infrastructure for agent code (MAH-80) by @mahimairaja in https://github.com/mahimailabs/openrtc-runtime/pull/39
+**Full Changelog**: https://github.com/mahimailabs/openrtc-runtime/compare/v0.1.0...v0.2.1
+---
 ## [0.1.0] - 2026-05-06
-## What's Changed
-* Feat: light websocket by @mahimairaja in https://github.com/mahimailabs/openrtc-runtime/pull/30
-* docs: bring docs/ in sync with v0.1 surface by @mahimairaja in https://github.com/mahimailabs/openrtc-runtime/pull/35
-* Feat: struc refac by @mahimairaja in https://github.com/mahimailabs/openrtc-runtime/pull/36
-* Feat/coroutine pool by @mahimairaja in https://github.com/mahimailabs/openrtc-runtime/pull/37
-* Feat/coroutine pool prod by @mahimairaja in https://github.com/mahimailabs/openrtc-runtime/pull/38
+## What's Changed
+* Feat: light websocket by @mahimairaja in https://github.com/mahimailabs/openrtc-runtime/pull/30
+* docs: bring docs/ in sync with v0.1 surface by @mahimairaja in https://github.com/mahimailabs/openrtc-runtime/pull/35
+* Feat: struc refac by @mahimairaja in https://github.com/mahimailabs/openrtc-runtime/pull/36
+* Feat/coroutine pool by @mahimairaja in https://github.com/mahimailabs/openrtc-runtime/pull/37
+* Feat/coroutine pool prod by @mahimairaja in https://github.com/mahimailabs/openrtc-runtime/pull/38
 **Full Changelog**: https://github.com/mahimailabs/openrtc-runtime/compare/v0.0.17...v0.1.0
 ---

openrtc-0.2.3/examples/density_demo.py ADDED

@@ -0,0 +1,163 @@
+"""Prove the OpenRTC density win, on one laptop, with real numbers.
+The claim: livekit-agents runs roughly one OS process per session (about
+3 GB each in production). OpenRTC's coroutine pool runs N sessions as
+asyncio tasks inside a single process, so the heavy per-process cost
+(Python interpreter, the livekit-agents import graph, and shared models
+like Silero VAD and the turn detector) is paid ONCE instead of N times.
+This script measures both models for real:
+  * "process-per-session" (what vanilla livekit-agents does):
+    spawn N subprocesses, each imports the agent stack and holds a
+    per-session buffer. We sum the resident memory across all of them.
+  * "OpenRTC coroutine pool" (the default isolation mode):
+    import the stack ONCE, run N asyncio sessions in this single process,
+    each holding the same per-session buffer. We read this process's
+    resident memory.
+Then it prints total memory each way, memory per session, and the ratio.
+No LiveKit server, no network, no model download required.
+Run it:
+    uv run python examples/density_demo.py                 # N = 16
+    uv run python examples/density_demo.py --sessions 32
+    uv run python examples/density_demo.py --sessions 50 --load-vad
+Use --load-vad to also load the real Silero VAD in every worker (the model
+livekit-agents would load per process and OpenRTC shares). It downloads
+ONNX weights on first run, then makes the gap even wider.
+"""
+from __future__ import annotations
+import argparse
+import asyncio
+import contextlib
+import multiprocessing as mp
+import os
+import time
+import psutil
+# Stand-in for one session's live audio plus conversation state. The real
+# per-session cost is dominated by the shared-vs-per-process fixed cost, so
+# the exact buffer size is not load-bearing; it just keeps each session honest.
+_SESSION_BUFFER_MB = 5
+def _import_stack(load_vad: bool) -> None:
+    """Pay the per-process import cost that livekit-agents incurs per session."""
+    import livekit.agents  # noqa: F401  (the real wheel, ~150 MB resident)
+    import openrtc  # noqa: F401
+    if load_vad:
+        # The shared model OpenRTC loads once in prewarm and livekit-agents
+        # loads in every worker process. Widens the gap; needs a one-time
+        # weights download.
+        from livekit.plugins import silero
+        silero.VAD.load()
+def _process_worker(ready: object, stop: object, load_vad: bool) -> None:
+    """One subprocess == one session, the livekit-agents process-per-job model."""
+    _import_stack(load_vad)
+    _buffer = bytearray(_SESSION_BUFFER_MB * 1024 * 1024)  # noqa: F841
+    ready.set()  # type: ignore[attr-defined]
+    stop.wait()  # type: ignore[attr-defined]  hold the buffer until measured
+def measure_process_model(sessions: int, load_vad: bool) -> float:
+    """Sum resident memory of N independent worker processes (MB)."""
+    # "spawn" matches LiveKit's default executor on macOS, so each child pays
+    # the full fresh-interpreter import cost, exactly as in production.
+    ctx = mp.get_context("spawn")
+    ready_events = [ctx.Event() for _ in range(sessions)]
+    stop_event = ctx.Event()
+    procs = [
+        ctx.Process(
+            target=_process_worker, args=(ready_events[i], stop_event, load_vad)
+        )
+        for i in range(sessions)
+    ]
+    for p in procs:
+        p.start()
+    for ev in ready_events:
+        ev.wait(timeout=120)  # every worker finished importing + allocated
+    time.sleep(0.5)  # let resident memory settle
+    total_bytes = 0
+    for p in procs:
+        with contextlib.suppress(
+            psutil.NoSuchProcess
+        ):  # a worker may have exited early
+            total_bytes += psutil.Process(p.pid).memory_info().rss
+    stop_event.set()
+    for p in procs:
+        p.join()
+    return total_bytes / (1024 * 1024)
+async def measure_coroutine_model(sessions: int, load_vad: bool) -> float:
+    """Resident memory of ONE process hosting N asyncio sessions (MB)."""
+    _import_stack(load_vad)  # paid once, in this process
+    async def _session() -> None:
+        _buffer = bytearray(_SESSION_BUFFER_MB * 1024 * 1024)
+        try:
+            await asyncio.sleep(3600)  # stay alive until measured
+        finally:
+            del _buffer
+    tasks = [asyncio.create_task(_session()) for _ in range(sessions)]
+    await asyncio.sleep(0.5)  # let all sessions allocate + settle
+    rss_mb = psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)
+    for t in tasks:
+        t.cancel()
+    await asyncio.gather(*tasks, return_exceptions=True)
+    return rss_mb
+def main() -> None:
+    parser = argparse.ArgumentParser(description=__doc__.split("\n", 1)[0])
+    parser.add_argument(
+        "--sessions", type=int, default=16, help="concurrent sessions (default 16)"
+    )
+    parser.add_argument(
+        "--load-vad",
+        action="store_true",
+        help="also load real Silero VAD in every worker",
+    )
+    args = parser.parse_args()
+    n = args.sessions
+    print(f"\nHosting {n} concurrent voice sessions. Measuring resident memory.\n")
+    # Process model first so this parent process stays light; the coroutine
+    # measurement then imports the stack into this same process on purpose.
+    process_mb = measure_process_model(n, args.load_vad)
+    coroutine_mb = asyncio.run(measure_coroutine_model(n, args.load_vad))
+    ratio = process_mb / coroutine_mb if coroutine_mb else float("inf")
+    print(
+        f"  livekit-agents (process per session): {process_mb:8.0f} MB total   "
+        f"({process_mb / n:6.1f} MB/session)"
+    )
+    print(
+        f"  OpenRTC coroutine pool (one process): {coroutine_mb:8.0f} MB total   "
+        f"({coroutine_mb / n:6.1f} MB/session)"
+    )
+    print(f"\n  OpenRTC uses {ratio:.1f}x less memory for the same {n} sessions.\n")
+    print("  Same agent code, both ways. In OpenRTC you flip one argument:")
+    print('    AgentPool(isolation="process")    # the left column above')
+    print('    AgentPool(isolation="coroutine")  # the right column (default)\n')
+if __name__ == "__main__":
+    main()
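
The demo's observation that the ratio widens as `--sessions` grows follows from a simple first-order model: with a fixed per-process cost F (interpreter, import graph, shared models) and per-session state b, process-per-session costs N(F + b) while the coroutine pool costs F + Nb. A sketch of that model; the helper names are hypothetical and the inputs are illustrative, not measurements.

```python
from __future__ import annotations


def modeled_totals_mb(
    sessions: int, fixed_mb: float, per_session_mb: float
) -> tuple[float, float]:
    """Return (process_per_session_total, coroutine_pool_total) in MB.

    Hypothetical first-order model: every process pays fixed_mb plus
    per_session_mb; the pool pays fixed_mb once plus per_session_mb per
    session. Treats each process's fixed cost as fully private.
    """
    process_total = sessions * (fixed_mb + per_session_mb)
    coroutine_total = fixed_mb + sessions * per_session_mb
    return process_total, coroutine_total


def modeled_ratio(sessions: int, fixed_mb: float, per_session_mb: float) -> float:
    process_total, coroutine_total = modeled_totals_mb(sessions, fixed_mb, per_session_mb)
    return process_total / coroutine_total


# Illustrative inputs only: 100 MB fixed cost, 5 MB per session
# (the same size as the demo's _SESSION_BUFFER_MB stand-in).
for n in (1, 16, 32, 50, 1000):
    print(n, round(modeled_ratio(n, 100.0, 5.0), 1))
```

Under this model the ratio rises with N but levels off toward (F + b) / b rather than growing without bound, so per-session state, not only the shared fixed cost, sets the eventual ceiling.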

{openrtc-0.2.1 → openrtc-0.2.3}/src/openrtc/execution/coroutine.py RENAMED

@@ -14,6 +14,8 @@ Contracts derived from:
 from __future__ import annotations
 import asyncio
+import contextlib
+import contextvars
 import inspect
 import logging
 import uuid
@@ -25,7 +27,7 @@ from livekit import rtc
 from livekit.agents import JobContext, JobExecutorType, JobProcess, utils
 from livekit.agents.ipc import inference_executor as inference_executor_mod
 from livekit.agents.ipc.job_executor import JobStatus
-from livekit.agents.job import RunningJobInfo
+from livekit.agents.job import RunningJobInfo, _JobContextVar
 if TYPE_CHECKING:
     from livekit.agents.ipc.job_executor import JobExecutor
@@ -114,6 +116,7 @@ class CoroutineJobExecutor:
         self._session_end_fnc = session_end_fnc
         self._context_factory = context_factory
         self._loop = loop
+        self._shutdown_fut: asyncio.Future[str] | None = None
     @property
     def id(self) -> str:
@@ -271,22 +274,79 @@ class CoroutineJobExecutor:
         self._task = loop.create_task(self._run_entrypoint(ctx))
     async def _run_entrypoint(self, ctx: JobContext) -> None:
+        """Run the session lifecycle, mirroring upstream ``_run_job_task``.
+        Establishes the job context (so ``get_job_context()`` resolves inside
+        the entrypoint and the session), holds the session open until shutdown
+        is requested (room disconnect, ``ctx.shutdown()``, or an entrypoint
+        crash), then runs the teardown sequence. Every ``JobContext`` hook is
+        treated as optional so the executor still runs with the bare stub
+        contexts that unit tests and the density benchmark pass directly.
+        """
         assert self._entrypoint_fnc is not None  # checked in launch_job
+        loop = asyncio.get_running_loop()
+        shutdown_fut: asyncio.Future[str] = loop.create_future()
+        self._shutdown_fut = shutdown_fut
+        def _request_shutdown(reason: str = "shutdown") -> None:
+            if not shutdown_fut.done():
+                shutdown_fut.set_result(reason)
+        # Per-job log fields, then the contextvar (the MAH-158 fix).
+        _on_setup = getattr(ctx, "_on_setup", None)
+        if callable(_on_setup):
+            _on_setup()
+        token: contextvars.Token[JobContext] | None = None
+        with contextlib.suppress(Exception):
+            token = _JobContextVar.set(ctx)
+        # Shutdown triggers (all optional for stub contexts): ctx.shutdown()
+        # via on_shutdown, and the room "disconnected" event (mirrors
+        # job_proc_lazy_main's room-disconnected handler).
+        if hasattr(ctx, "_on_shutdown"):
+            def _on_shutdown(reason: str = "") -> None:
+                _request_shutdown(reason or "shutdown")
+            ctx._on_shutdown = _on_shutdown
+        _room_on = getattr(getattr(ctx, "room", None), "on", None)
+        if callable(_room_on):
+            _room_on("disconnected", lambda *_a: _request_shutdown("room disconnected"))
         try:
-            await self._entrypoint_fnc(ctx)
+            try:
+                await self._entrypoint_fnc(ctx)
+            except asyncio.CancelledError:
+                if self._status is JobStatus.RUNNING:
+                    self._status = JobStatus.FAILED
+                raise
+            except Exception:
+                if self._status is JobStatus.RUNNING:
+                    self._status = JobStatus.FAILED
+                logger.exception(
+                    "entrypoint raised in CoroutineJobExecutor",
+                    extra=self.logging_extra(),
+                )
+                return
+            # Entrypoint returned cleanly. Hold a real job open until the call
+            # ends (the MAH-160 fix), then run teardown. A setup-only entrypoint
+            # (no live session) or a fake job (simulate_job, which has no live
+            # room to disconnect) completes on return instead.
+            _is_fake = getattr(ctx, "is_fake_job", None)
+            fake_job = bool(_is_fake()) if callable(_is_fake) else False
+            if (
+                getattr(ctx, "_primary_agent_session", None) is not None
+                and not fake_job
+            ):
+                try:
+                    await shutdown_fut
+                except asyncio.CancelledError:
+                    if self._status is JobStatus.RUNNING:
+                        self._status = JobStatus.FAILED
+                    raise
+                await self._teardown(ctx, shutdown_fut.result())
             if self._status is JobStatus.RUNNING:
                 self._status = JobStatus.SUCCESS
-        except asyncio.CancelledError:
-            if self._status is JobStatus.RUNNING:
-                self._status = JobStatus.FAILED
-            raise
-        except Exception:
-            if self._status is JobStatus.RUNNING:
-                self._status = JobStatus.FAILED
-            logger.exception(
-                "entrypoint raised in CoroutineJobExecutor",
-                extra=self.logging_extra(),
-            )
         finally:
             if self._session_end_fnc is not None:
                 try:
@@ -296,6 +356,43 @@ class CoroutineJobExecutor:
                         "session_end_fnc raised in CoroutineJobExecutor",
                         extra=self.logging_extra(),
                     )
+            if token is not None:
+                with contextlib.suppress(Exception):
+                    _JobContextVar.reset(token)
+    async def _teardown(self, ctx: JobContext, reason: str) -> None:
+        """Run the post-shutdown lifecycle (mirrors upstream ``_run_job_task``).
+        Closes the primary ``AgentSession``, runs ``_on_session_end`` and the
+        registered shutdown callbacks, cancels pending tasks, and cleans up.
+        Every hook is optional so stub contexts in tests and benchmarks are
+        tolerated.
+        """
+        primary = getattr(ctx, "_primary_agent_session", None)
+        if primary is not None and hasattr(primary, "aclose"):
+            with contextlib.suppress(Exception):
+                await primary.aclose()
+        _on_session_end = getattr(ctx, "_on_session_end", None)
+        if callable(_on_session_end):
+            with contextlib.suppress(Exception):
+                await _on_session_end()
+        for callback in list(getattr(ctx, "_shutdown_callbacks", None) or []):
+            try:
+                await callback(reason)
+            except Exception:
+                logger.exception(
+                    "shutdown callback raised in CoroutineJobExecutor",
+                    extra=self.logging_extra(),
+                )
+        pending = list(getattr(ctx, "_pending_tasks", None) or [])
+        if pending:
+            for task in pending:
+                task.cancel()
+            await asyncio.gather(*pending, return_exceptions=True)
+        _on_cleanup = getattr(ctx, "_on_cleanup", None)
+        if callable(_on_cleanup):
+            with contextlib.suppress(Exception):
+                _on_cleanup()
     def logging_extra(self) -> dict[str, Any]:
         return {"executor_id": self._id}
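
The `_run_entrypoint` change above (MAH-160) replaces "mark SUCCESS when the entrypoint returns" with "await a shutdown future, then tear down". A stripped-down sketch of that control flow, using a hypothetical `HoldOpenRunner` class: it is not OpenRTC's `CoroutineJobExecutor` and has no status tracking, context variables, or LiveKit objects.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable


class HoldOpenRunner:
    """Hypothetical runner: keep a job alive after its entrypoint returns,
    until something requests shutdown, then run teardown once."""

    def __init__(self) -> None:
        self.events: list[str] = []
        self._shutdown_fut: asyncio.Future[str] | None = None

    def request_shutdown(self, reason: str = "shutdown") -> None:
        # Safe to call from sync event callbacks; the first reason wins.
        fut = self._shutdown_fut
        if fut is not None and not fut.done():
            fut.set_result(reason)

    async def run(self, entrypoint: Callable[[], Awaitable[None]]) -> str:
        self._shutdown_fut = asyncio.get_running_loop().create_future()
        await entrypoint()
        self.events.append("entrypoint returned")
        reason = await self._shutdown_fut  # the job still counts as running here
        self.events.append(f"teardown ({reason})")
        return reason


async def demo() -> tuple[bool, str, list[str]]:
    runner = HoldOpenRunner()

    async def entrypoint() -> None:
        runner.events.append("session started")

    task = asyncio.create_task(runner.run(entrypoint))
    await asyncio.sleep(0.01)
    held_open = not task.done()  # still running after the entrypoint returned
    # Simulate a room "disconnected" event, then a late duplicate request.
    runner.request_shutdown("room disconnected")
    runner.request_shutdown("ctx.shutdown()")
    reason = await task
    return held_open, reason, runner.events


print(asyncio.run(demo()))
```

Because the job stays pending until the future resolves, anything that counts running jobs (such as a concurrency limit) sees the call for its whole lifetime rather than only while the entrypoint executes.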
