PyPI - verifiers - Versions diffs - 0.1.11.dev0__tar.gz → 0.1.11.dev1__tar.gz - Mend

verifiers 0.1.11.dev0.tar.gz → 0.1.11.dev1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (185)

{verifiers-0.1.11.dev0 → verifiers-0.1.11.dev1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: verifiers
-Version: 0.1.11.dev0
+Version: 0.1.11.dev1
 Summary: Verifiers: Environments for LLM Reinforcement Learning
 Project-URL: Homepage, https://github.com/primeintellect-ai/verifiers
 Project-URL: Documentation, https://github.com/primeintellect-ai/verifiers
@@ -106,7 +106,7 @@ Verifiers: Environments for LLM Reinforcement Learning
 - [01/08/26] v0.1.9 is released, featuring a number of new experimental environment class types, monitor rubrics for automatic metric collection, improved workspace setup flow, improved error handling, bug fixes, and a documentation overhaul.
 - [11/19/25] v0.1.8 is released, featuring a major refactor of the rollout system to use trajectory-based tracking for token-in token-out training across turns, as well as support for truncated or branching rollouts.
-- [11/07/25] Verifiers v0.1.7 is released! This includes an improved quickstart configuration for training with [prime-rl], a new included "nano" trainer (`vf.RLTrainer`, replacing `vf.GRPOTrainer`), and a number of bug fixes and improvements to the documentation.
+- [11/07/25] Verifiers v0.1.7 is released! This includes an improved quickstart configuration for training with [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl), a new included "nano" trainer (`vf.RLTrainer`, replacing `vf.GRPOTrainer`), and a number of bug fixes and improvements to the documentation.
 - [10/27/25] A new iteration of the Prime Intellect [Environments Program](https://docs.google.com/spreadsheets/d/13UDfRDjgIZXsMI2s9-Lmn8KSMMsgk2_zsfju6cx_pNU/edit?gid=0#gid=0) is live!
@@ -229,17 +229,17 @@ prime eval run primeintellect/math-python
 ## Documentation
-**[Environments](environments.md)** — Create datasets, rubrics, and custom multi-turn interaction protocols.
+**[Environments](docs/environments.md)** — Create datasets, rubrics, and custom multi-turn interaction protocols.
-**[Evaluation](evaluation.md)** - Evaluate models using your environments.
+**[Evaluation](docs/evaluation.md)** - Evaluate models using your environments.
-**[Training](training.md)** — Train models in your environments with reinforcement learning.
+**[Training](docs/training.md)** — Train models in your environments with reinforcement learning.
-**[Development](development.md)** — Contributing to verifiers
+**[Development](docs/development.md)** — Contributing to verifiers
-**[API Reference](reference.md)** — Understanding the API and data structures
+**[API Reference](docs/reference.md)** — Understanding the API and data structures
-**[FAQs](faqs.md)** - Other frequently asked questions.
+**[FAQs](docs/faqs.md)** - Other frequently asked questions.
 ## Citation

{verifiers-0.1.11.dev0 → verifiers-0.1.11.dev1}/README.md RENAMED

@@ -36,7 +36,7 @@ Verifiers: Environments for LLM Reinforcement Learning
 - [01/08/26] v0.1.9 is released, featuring a number of new experimental environment class types, monitor rubrics for automatic metric collection, improved workspace setup flow, improved error handling, bug fixes, and a documentation overhaul.
 - [11/19/25] v0.1.8 is released, featuring a major refactor of the rollout system to use trajectory-based tracking for token-in token-out training across turns, as well as support for truncated or branching rollouts.
-- [11/07/25] Verifiers v0.1.7 is released! This includes an improved quickstart configuration for training with [prime-rl], a new included "nano" trainer (`vf.RLTrainer`, replacing `vf.GRPOTrainer`), and a number of bug fixes and improvements to the documentation.
+- [11/07/25] Verifiers v0.1.7 is released! This includes an improved quickstart configuration for training with [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl), a new included "nano" trainer (`vf.RLTrainer`, replacing `vf.GRPOTrainer`), and a number of bug fixes and improvements to the documentation.
 - [10/27/25] A new iteration of the Prime Intellect [Environments Program](https://docs.google.com/spreadsheets/d/13UDfRDjgIZXsMI2s9-Lmn8KSMMsgk2_zsfju6cx_pNU/edit?gid=0#gid=0) is live!
@@ -159,17 +159,17 @@ prime eval run primeintellect/math-python
 ## Documentation
-**[Environments](environments.md)** — Create datasets, rubrics, and custom multi-turn interaction protocols.
+**[Environments](docs/environments.md)** — Create datasets, rubrics, and custom multi-turn interaction protocols.
-**[Evaluation](evaluation.md)** - Evaluate models using your environments.
+**[Evaluation](docs/evaluation.md)** - Evaluate models using your environments.
-**[Training](training.md)** — Train models in your environments with reinforcement learning.
+**[Training](docs/training.md)** — Train models in your environments with reinforcement learning.
-**[Development](development.md)** — Contributing to verifiers
+**[Development](docs/development.md)** — Contributing to verifiers
-**[API Reference](reference.md)** — Understanding the API and data structures
+**[API Reference](docs/reference.md)** — Understanding the API and data structures
-**[FAQs](faqs.md)** - Other frequently asked questions.
+**[FAQs](docs/faqs.md)** - Other frequently asked questions.
 ## Citation

{verifiers-0.1.11.dev0 → verifiers-0.1.11.dev1}/tests/conftest.py RENAMED

@@ -554,6 +554,9 @@ def make_metadata() -> Callable[..., GenerateMetadata]:
         time_ms: float = 0.0,
         avg_reward: float = 0.0,
         avg_metrics: dict[str, float] = {},
+        pass_at_k: dict[str, float] = {},
+        pass_all_k: dict[str, float] = {},
+        pass_threshold: float = 0.5,
         usage: dict[str, float] | None = None,
         version_info: dict | None = None,
         state_columns: list[str] = ["foo"],
@@ -579,6 +582,9 @@ def make_metadata() -> Callable[..., GenerateMetadata]:
             time_ms=time_ms,
             avg_reward=avg_reward,
             avg_metrics=avg_metrics,
+            pass_at_k=pass_at_k,
+            pass_all_k=pass_all_k,
+            pass_threshold=pass_threshold,
             usage=usage,
             version_info=version_info,
             state_columns=state_columns,
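
The hunk above adds `pass_at_k`, `pass_all_k`, and `pass_threshold` to the metadata fixture but does not show how the values are computed. As a rough illustration only, the sketch below assumes that a rollout "passes" when its reward is at least `pass_threshold`, that `pass_at_k` is the standard combinatorial pass@k estimate, and that `pass_all_k` is the chance that all k sampled rollouts pass. The function names, the `>=` comparison, and the string keys are assumptions made for this example, not the verifiers implementation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k rollouts, drawn without replacement
    from n rollouts of which c pass, is a passing rollout."""
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_all_k(n: int, c: int, k: int) -> float:
    """Chance that all k rollouts drawn without replacement pass."""
    return comb(c, k) / comb(n, k)


def summarize(
    rewards_per_example: list[list[float]],
    ks: list[int],
    pass_threshold: float = 0.5,
) -> tuple[dict[str, float], dict[str, float]]:
    """Average both estimates over examples, keyed by str(k).

    Hypothetical helper: string keys mirror the dict[str, float]
    annotation in the fixture, but the real key format is not shown
    in the diff.
    """
    at_k: dict[str, float] = {}
    all_k: dict[str, float] = {}
    for k in ks:
        at_vals, all_vals = [], []
        for rewards in rewards_per_example:
            n = len(rewards)
            if k > n:
                raise ValueError(f"k={k} exceeds {n} rollouts per example")
            # Assumption: a reward equal to the threshold counts as a pass.
            c = sum(r >= pass_threshold for r in rewards)
            at_vals.append(pass_at_k(n, c, k))
            all_vals.append(pass_all_k(n, c, k))
        at_k[str(k)] = sum(at_vals) / len(at_vals)
        all_k[str(k)] = sum(all_vals) / len(all_vals)
    return at_k, all_k
```

With two examples of four rollouts each, one with two passing rollouts and one with none, `summarize(..., ks=[1, 2])` averages the per-example estimates, so the all-failing example pulls both metrics toward zero.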

verifiers-0.1.11.dev1/tests/test_env_crash_recovery.py ADDED Viewed

@@ -0,0 +1,237 @@
+"""Tests for environment server crash detection and recovery."""
+import asyncio
+import time
+from unittest.mock import patch
+import pytest
+from verifiers.workers.client.zmq_env_client import ZMQEnvClient
+from verifiers.workers.types import (
+    HealthRequest,
+    HealthResponse,
+    PendingRequest,
+    ServerState,
+)
+class TestStateTransitions:
+    """Tests for health-check-driven state transitions (via dedicated thread callbacks)."""
+    @pytest.mark.asyncio
+    async def test_startup_to_healthy_to_unhealthy(self):
+        """Callbacks drive STARTUP → HEALTHY → UNHEALTHY via healthy_event."""
+        client = ZMQEnvClient(
+            address="tcp://127.0.0.1:5555",
+            health_check_interval=0,  # disable auto thread
+        )
+        client.loop = asyncio.get_running_loop()
+        assert client.server_state == ServerState.STARTUP
+        assert not client.healthy_event.is_set()
+        # STARTUP → HEALTHY
+        client.on_became_healthy(ServerState.STARTUP)
+        assert client.server_state == ServerState.HEALTHY
+        assert client.healthy_event.is_set()
+        # HEALTHY → UNHEALTHY (after 5 consecutive failures)
+        client.on_became_unhealthy(5)
+        await asyncio.sleep(0.1)  # let _do_cancel_pending run
+        assert client.server_state == ServerState.UNHEALTHY
+        assert not client.healthy_event.is_set()
+        await client.close()
+    @pytest.mark.asyncio
+    async def test_unhealthy_cancels_pending_with_server_error(self):
+        """HEALTHY → UNHEALTHY transition cancels pending requests with ServerError."""
+        client = ZMQEnvClient(
+            address="tcp://127.0.0.1:5555",
+            health_check_interval=0,  # disable auto thread
+        )
+        client.loop = asyncio.get_running_loop()
+        # Start in HEALTHY state
+        client.server_state = ServerState.HEALTHY
+        client.healthy_event.set()
+        # Add a pending request
+        future = asyncio.Future()
+        async with client.pending_lock:
+            client.pending_requests["test_req"] = PendingRequest(
+                request_id="test_req",
+                request=HealthRequest(),
+                submitted_at=time.time(),
+                timeout=10.0,
+                future=future,
+            )
+        # Trigger UNHEALTHY
+        client.on_became_unhealthy(5)
+        await asyncio.sleep(0.1)  # let _do_cancel_pending run
+        assert future.done()
+        assert len(client.pending_requests) == 0
+        with pytest.raises(RuntimeError, match="unhealthy"):
+            future.result()
+        await client.close()
+class TestRetryOnServerError:
+    """Tests for send_request retry after ServerError."""
+    @pytest.mark.asyncio
+    async def test_retry_after_recovery(self):
+        """ServerError → wait for healthy_event → retry succeeds."""
+        client = ZMQEnvClient(
+            address="tcp://127.0.0.1:5555",
+            health_check_interval=0,
+        )
+        attempt_count = 0
+        async def mock_send(*args, **kwargs):
+            nonlocal attempt_count
+            attempt_count += 1
+            if attempt_count == 1:
+                # First attempt: simulate server crash
+                async def fail_then_recover():
+                    await asyncio.sleep(0.1)
+                    await client.cancel_all_pending("Connection lost")
+                    await asyncio.sleep(0.1)
+                    client.healthy_event.set()
+                asyncio.create_task(fail_then_recover())
+            else:
+                # Second attempt: succeed
+                async def succeed():
+                    await asyncio.sleep(0.05)
+                    req_id = list(client.pending_requests.keys())[0]
+                    pending = client.pending_requests.get(req_id)
+                    if pending and not pending.future.done():
+                        pending.future.set_result(
+                            HealthResponse(success=True).model_dump()
+                        )
+                asyncio.create_task(succeed())
+        with (
+            patch.object(client.socket, "connect"),
+            patch.object(client.socket, "send_multipart", new=mock_send),
+        ):
+            await client.ensure_started()
+            response = await client.send_request(
+                HealthRequest(), HealthResponse, timeout=5.0
+            )
+            assert attempt_count == 2
+            assert response.success
+            await client.close()
+    @pytest.mark.asyncio
+    async def test_recovery_timeout(self):
+        """ServerError + no recovery within timeout → TimeoutError."""
+        client = ZMQEnvClient(
+            address="tcp://127.0.0.1:5555",
+            health_check_interval=0,
+            recovery_timeout=0.5,
+        )
+        async def mock_send(*args, **kwargs):
+            async def fail():
+                await asyncio.sleep(0.05)
+                await client.cancel_all_pending("Connection lost")
+            asyncio.create_task(fail())
+        with (
+            patch.object(client.socket, "connect"),
+            patch.object(client.socket, "send_multipart", new=mock_send),
+        ):
+            await client.ensure_started()
+            with pytest.raises(TimeoutError, match="did not recover"):
+                await client.send_request(HealthRequest(), HealthResponse, timeout=5.0)
+            await client.close()
+    @pytest.mark.asyncio
+    async def test_no_retry_on_runtime_error(self):
+        """Plain RuntimeError propagates immediately without retry."""
+        client = ZMQEnvClient(
+            address="tcp://127.0.0.1:5555",
+            health_check_interval=0,
+        )
+        attempt_count = 0
+        async def mock_send(*args, **kwargs):
+            nonlocal attempt_count
+            attempt_count += 1
+            async def fail():
+                await asyncio.sleep(0.05)
+                req_id = list(client.pending_requests.keys())[0]
+                pending = client.pending_requests.get(req_id)
+                if pending and not pending.future.done():
+                    pending.future.set_exception(RuntimeError("Bad request"))
+            asyncio.create_task(fail())
+        with (
+            patch.object(client.socket, "connect"),
+            patch.object(client.socket, "send_multipart", new=mock_send),
+        ):
+            await client.ensure_started()
+            with pytest.raises(RuntimeError, match="Bad request"):
+                await client.send_request(HealthRequest(), HealthResponse, timeout=5.0)
+            assert attempt_count == 1
+            await client.close()
+class TestWaitForServerStartup:
+    """Tests for event-based wait_for_server_startup."""
+    @pytest.mark.asyncio
+    async def test_delayed_startup(self):
+        """Startup succeeds when health thread detects server after a delay."""
+        client = ZMQEnvClient(
+            address="tcp://127.0.0.1:5555",
+            health_check_interval=0,  # disable auto thread
+        )
+        client.loop = asyncio.get_running_loop()
+        # Simulate health thread detecting server after a delay
+        async def simulate_health_thread():
+            await asyncio.sleep(0.2)
+            client.on_became_healthy(ServerState.STARTUP)
+        asyncio.create_task(simulate_health_thread())
+        with patch.object(client.socket, "connect"):
+            await client.wait_for_server_startup(timeout=3.0)
+        assert client.healthy_event.is_set()
+        await client.close()
+    @pytest.mark.asyncio
+    async def test_startup_timeout(self):
+        """Startup raises TimeoutError when server never becomes healthy."""
+        client = ZMQEnvClient(
+            address="tcp://127.0.0.1:5555",
+            health_check_interval=0,  # disable auto thread
+        )
+        with patch.object(client.socket, "connect"):
+            with pytest.raises(TimeoutError, match="did not become healthy"):
+                await client.wait_for_server_startup(timeout=0.5)
+        await client.close()
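
The tests above exercise three behaviours: callbacks that move the client between STARTUP, HEALTHY, and UNHEALTHY; cancellation of pending requests with a server error when the client becomes unhealthy; and a retry that waits on `healthy_event` up to a recovery timeout. The toy model below sketches that pattern with plain asyncio. `ToyClient`, its `transport` argument, and this `ServerError` class are hypothetical stand-ins written for illustration; the real `ZMQEnvClient` internals are not shown in this diff and may differ.

```python
import asyncio
from enum import Enum
from typing import Any, Awaitable, Callable


class ServerState(Enum):
    STARTUP = "startup"
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


class ServerError(RuntimeError):
    """Set on pending requests when the server is declared unhealthy."""


class ToyClient:
    """Hypothetical stand-in for a health-checked request client."""

    def __init__(self, recovery_timeout: float = 1.0) -> None:
        self.server_state = ServerState.STARTUP
        self.healthy_event = asyncio.Event()
        self.pending: dict[str, asyncio.Future] = {}
        self.recovery_timeout = recovery_timeout

    def on_became_healthy(self) -> None:
        self.server_state = ServerState.HEALTHY
        self.healthy_event.set()

    def on_became_unhealthy(self) -> None:
        self.server_state = ServerState.UNHEALTHY
        self.healthy_event.clear()
        # Fail every in-flight request so callers can decide to retry.
        for fut in self.pending.values():
            if not fut.done():
                fut.set_exception(ServerError("server unhealthy"))
        self.pending.clear()

    async def send_request(
        self, request_id: str, transport: Callable[[str], Awaitable[None]]
    ) -> Any:
        while True:
            fut = asyncio.get_running_loop().create_future()
            self.pending[request_id] = fut
            await transport(request_id)
            try:
                return await fut
            except ServerError:
                # Only ServerError triggers a retry; other errors propagate.
                try:
                    await asyncio.wait_for(
                        self.healthy_event.wait(), self.recovery_timeout
                    )
                except asyncio.TimeoutError:
                    raise TimeoutError("server did not recover") from None
            finally:
                self.pending.pop(request_id, None)


async def demo() -> tuple[Any, int]:
    client = ToyClient()
    client.on_became_healthy()
    attempts = 0

    async def transport(request_id: str) -> None:
        nonlocal attempts
        attempts += 1
        loop = asyncio.get_running_loop()
        if attempts == 1:
            # First attempt: simulate a crash followed by recovery.
            loop.call_later(0.01, client.on_became_unhealthy)
            loop.call_later(0.02, client.on_became_healthy)
        else:
            # Second attempt: the server answers.
            loop.call_later(0.01, client.pending[request_id].set_result, "ok")

    result = await client.send_request("req-1", transport)
    return result, attempts


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Retrying only on the dedicated error type keeps ordinary request failures (the "Bad request" case in the tests) from being silently resent, which is the same distinction `test_no_retry_on_runtime_error` checks.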
