PyPI - benchflow - Versions diffs - 0.4.0__tar.gz → 0.5.1.dev869__tar.gz - Mend

benchflow 0.4.0tar.gz → 0.5.1.dev869tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (458)

{benchflow-0.4.0 → benchflow-0.5.1.dev869}/.gitignore RENAMED

@@ -185,3 +185,4 @@ tests/.smoke-jobs/
 context/
 tutorials/
 .playwright-mcp/
+/.claude/handoffs

{benchflow-0.4.0 → benchflow-0.5.1.dev869}/CHANGELOG.md RENAMED

@@ -2,6 +2,16 @@
 ## [Unreleased]
+### Added
+- **Daytona usage telemetry by default** — Daytona runs now start a sandbox-local provider usage proxy so token/cost telemetry works without an external tunnel; use `--usage-tracking off` to bypass proxying when needed.
+- **Azure AI Foundry providers** — new `azure-foundry-openai/` and `azure-foundry-anthropic/` prefixes routing through Foundry's unified resource. Export `AZURE_API_KEY` plus `AZURE_API_ENDPOINT` (e.g. `https://<resource>.openai.azure.com/`); benchflow derives the resource name from the endpoint host, builds the per-surface base URL, and maps the key onto the agent-native auth env automatically. Missing/unrecognized endpoints and unsupported agent/provider protocol pairings fail fast with clear errors instead of falling through to the wrong endpoint.
+- **Azure Foundry auth guidance** — agent discovery output and docs now call out that provider-prefixed models can use provider-specific credentials instead of the agent's native/default API key.
+### Fixed
+- Inherit `BENCHFLOW_PROVIDER_BASE_URL` / `BENCHFLOW_PROVIDER_API_KEY` from the host environment so self-hosted / OpenAI-compatible endpoints route correctly instead of falling back to `api.openai.com`; empty or whitespace-only host values are skipped so they cannot shadow the resolved provider URL (benchflow-ai/skillsbench#817).
 ## 0.3.3 — 2026-05-15
 ### Added
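
The "Fixed" entry above describes host values for `BENCHFLOW_PROVIDER_BASE_URL` / `BENCHFLOW_PROVIDER_API_KEY` being inherited unless they are empty or whitespace-only. A minimal sketch of that rule follows. The helper name `merge_provider_env`, its signature, and the precedence (a non-blank host value wins over the resolved one) are assumptions for illustration; the actual benchflow implementation is not shown in this diff.

```python
from collections.abc import Mapping

# The two variable names come from the changelog entry; everything else
# in this block is a hypothetical illustration, not benchflow's code.
PROVIDER_KEYS = ("BENCHFLOW_PROVIDER_BASE_URL", "BENCHFLOW_PROVIDER_API_KEY")


def merge_provider_env(
    resolved: Mapping[str, str],
    host_env: Mapping[str, str],
) -> dict[str, str]:
    """Overlay host values onto resolved ones, ignoring blank host values."""
    merged = dict(resolved)
    for key in PROVIDER_KEYS:
        value = host_env.get(key)
        # Skip unset, empty, and whitespace-only values so they cannot
        # shadow a provider URL or key that was already resolved.
        if value is not None and value.strip():
            merged[key] = value
    return merged
```

Under these assumptions, calling `merge_provider_env(resolved, os.environ)` would keep the resolved URL when the host exports a blank value, and would adopt the host URL when it points at a self-hosted endpoint.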

{benchflow-0.4.0 → benchflow-0.5.1.dev869}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: benchflow
-Version: 0.4.0
+Version: 0.5.1.dev869
 Summary: Multi-turn agent benchmarking with ACP — run any agent, any model, any provider.
 Project-URL: Homepage, https://github.com/benchflow-ai/benchflow
 Project-URL: Repository, https://github.com/benchflow-ai/benchflow
@@ -18,22 +18,30 @@ Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
 Requires-Python: >=3.12
+Requires-Dist: agent-client-protocol>=0.10
 Requires-Dist: anyio>=4.0
 Requires-Dist: httpx>=0.27.0
-Requires-Dist: pydantic>=2.0
+Requires-Dist: litellm[proxy]==1.88.0rc1
+Requires-Dist: pydantic>=2.7
 Requires-Dist: pyyaml>=6.0
 Requires-Dist: rich>=13.0
+Requires-Dist: tomli-w>=1.0
 Requires-Dist: typer>=0.9
 Provides-Extra: bedrock
 Requires-Dist: boto3>=1.40; extra == 'bedrock'
 Provides-Extra: dev
+Requires-Dist: packaging>=24; extra == 'dev'
 Requires-Dist: pre-commit>=3.7; extra == 'dev'
 Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
 Requires-Dist: pytest>=9.0.3; extra == 'dev'
 Requires-Dist: ruff>=0.7.0; extra == 'dev'
 Requires-Dist: ty>=0.0.1a1; extra == 'dev'
+Provides-Extra: judge
+Requires-Dist: anthropic>=0.40; extra == 'judge'
+Requires-Dist: google-genai>=1.0; extra == 'judge'
+Requires-Dist: openai>=1.40; extra == 'judge'
 Provides-Extra: sandbox-daytona
-Requires-Dist: daytona>=0.153.0; extra == 'sandbox-daytona'
+Requires-Dist: daytona>=0.184.0; extra == 'sandbox-daytona'
 Requires-Dist: tenacity>=8.0; extra == 'sandbox-daytona'
 Provides-Extra: sandbox-modal
 Requires-Dist: modal>=0.73; extra == 'sandbox-modal'
@@ -66,7 +74,7 @@ BenchFlow runs AI agents against benchmark tasks in sandboxed environments. Sing
 uv tool install benchflow
 ```
-Requires Python 3.12+ and [uv](https://docs.astral.sh/uv/). Set `DAYTONA_API_KEY` for Daytona runs or configure Modal auth for Modal runs; export the relevant agent API key (`GEMINI_API_KEY`, `ANTHROPIC_API_KEY`, etc.) or run `claude login` / `codex --login` for subscription auth.
+Requires Python 3.12+ and [uv](https://docs.astral.sh/uv/). Set `DAYTONA_API_KEY` for Daytona runs or configure Modal auth for Modal runs; export the relevant agent API key (`GEMINI_API_KEY`, `ANTHROPIC_API_KEY`, etc.) or run `claude login` / `codex --login` for subscription auth. Provider-prefixed models may use provider-specific credentials; Azure Foundry models use `AZURE_API_KEY` plus `AZURE_API_ENDPOINT`.
 ## Documentation
@@ -81,6 +89,7 @@ Start with [Getting started](./docs/getting-started.md), then [Concepts](./docs/
 | Multi-round single-agent (progressive disclosure, oracle access) | [Progressive disclosure](./docs/progressive-disclosure.md) |
 | Skill evaluation (when the artifact is a skill, not a workspace) | [Skill eval](./docs/skill-eval.md) |
 | Understand the security model | [Sandbox hardening](./docs/sandbox-hardening.md) |
+| Use public vs internal preview SDK releases | [Release channels](./docs/release.md) |
 | CLI flags + commands | [CLI reference](./docs/reference/cli.md) |
 | Python API surface | [Python API reference](./docs/reference/python-api.md) |
@@ -91,20 +100,20 @@ Notebooks and runnable example scripts live under [`docs/examples/`](./docs/exam
 Benchmark datasets live in external Git repos and are referenced with two fields:
 ```yaml
-# benchmarks/skillsbench-claude-glm51.yaml
+# benchmarks/harvey-lab/harvey-lab-gemini-flash-lite.yaml
 source:
-  repo: benchflow-ai/skillsbench   # GitHub org/repo
-  path: tasks                       # optional subpath within repo
+  repo: benchflow-ai/benchmarks    # GitHub org/repo
+  path: datasets/harvey-lab/tasks  # optional subpath within repo
   ref: main                         # optional branch/tag
-agent: claude-agent-acp
-model: claude-sonnet-4-6
+agent: gemini
+model: gemini/gemini-3.1-flash-lite-preview
 ```
 Run any benchmark via the CLI:
 ```bash
-# From a YAML config
-bench eval create --config benchmarks/skillsbench-claude-glm51.yaml
+# From a YAML config (shipped with the repo)
+bench eval create --config benchmarks/harvey-lab/harvey-lab-gemini-flash-lite.yaml
 # Inline — mirrors the YAML source fields
 bench eval create \
@@ -114,10 +123,9 @@ bench eval create \
 Repos are cloned and cached locally under `.cache/datasets/` on first use.
-SkillsBench itself sources BenchFlow from GitHub `main` in its
-[`pyproject.toml`](https://github.com/benchflow-ai/skillsbench/blob/main/pyproject.toml).
-After a BenchFlow change lands, run `uv lock --upgrade-package benchflow` in
-SkillsBench when you need its lockfile to point at the newest BenchFlow commit.
+Downstream projects should depend on the public PyPI release by default. For
+internal validation before the next public release, install or lock the internal
+preview channel with prereleases enabled; see [Release channels](./docs/release.md).
 ## Featured
@@ -141,7 +149,9 @@ Two runnable labs validate the security story:
 PRs welcome. Open against `main`. CI runs ruff + tests on every PR; please run `ruff check .` and `pytest tests/` locally first.
-For a release: bump `pyproject.toml` to the next stable version, tag `v<version>` on main, push the tag — CI publishes to PyPI. Then bump main to the next `.dev0`.
+Release channels are documented in [Release channels](./docs/release.md). In
+short: merges to `main` publish an internal preview after CI passes, while a
+matching `v<version>` tag publishes the public release.
 ## License

{benchflow-0.4.0 → benchflow-0.5.1.dev869}/README.md RENAMED

@@ -24,7 +24,7 @@ BenchFlow runs AI agents against benchmark tasks in sandboxed environments. Sing
 uv tool install benchflow
 ```
-Requires Python 3.12+ and [uv](https://docs.astral.sh/uv/). Set `DAYTONA_API_KEY` for Daytona runs or configure Modal auth for Modal runs; export the relevant agent API key (`GEMINI_API_KEY`, `ANTHROPIC_API_KEY`, etc.) or run `claude login` / `codex --login` for subscription auth.
+Requires Python 3.12+ and [uv](https://docs.astral.sh/uv/). Set `DAYTONA_API_KEY` for Daytona runs or configure Modal auth for Modal runs; export the relevant agent API key (`GEMINI_API_KEY`, `ANTHROPIC_API_KEY`, etc.) or run `claude login` / `codex --login` for subscription auth. Provider-prefixed models may use provider-specific credentials; Azure Foundry models use `AZURE_API_KEY` plus `AZURE_API_ENDPOINT`.
 ## Documentation
@@ -39,6 +39,7 @@ Start with [Getting started](./docs/getting-started.md), then [Concepts](./docs/
 | Multi-round single-agent (progressive disclosure, oracle access) | [Progressive disclosure](./docs/progressive-disclosure.md) |
 | Skill evaluation (when the artifact is a skill, not a workspace) | [Skill eval](./docs/skill-eval.md) |
 | Understand the security model | [Sandbox hardening](./docs/sandbox-hardening.md) |
+| Use public vs internal preview SDK releases | [Release channels](./docs/release.md) |
 | CLI flags + commands | [CLI reference](./docs/reference/cli.md) |
 | Python API surface | [Python API reference](./docs/reference/python-api.md) |
@@ -49,20 +50,20 @@ Notebooks and runnable example scripts live under [`docs/examples/`](./docs/exam
 Benchmark datasets live in external Git repos and are referenced with two fields:
 ```yaml
-# benchmarks/skillsbench-claude-glm51.yaml
+# benchmarks/harvey-lab/harvey-lab-gemini-flash-lite.yaml
 source:
-  repo: benchflow-ai/skillsbench   # GitHub org/repo
-  path: tasks                       # optional subpath within repo
+  repo: benchflow-ai/benchmarks    # GitHub org/repo
+  path: datasets/harvey-lab/tasks  # optional subpath within repo
   ref: main                         # optional branch/tag
-agent: claude-agent-acp
-model: claude-sonnet-4-6
+agent: gemini
+model: gemini/gemini-3.1-flash-lite-preview
 ```
 Run any benchmark via the CLI:
 ```bash
-# From a YAML config
-bench eval create --config benchmarks/skillsbench-claude-glm51.yaml
+# From a YAML config (shipped with the repo)
+bench eval create --config benchmarks/harvey-lab/harvey-lab-gemini-flash-lite.yaml
 # Inline — mirrors the YAML source fields
 bench eval create \
@@ -72,10 +73,9 @@ bench eval create \
 Repos are cloned and cached locally under `.cache/datasets/` on first use.
-SkillsBench itself sources BenchFlow from GitHub `main` in its
-[`pyproject.toml`](https://github.com/benchflow-ai/skillsbench/blob/main/pyproject.toml).
-After a BenchFlow change lands, run `uv lock --upgrade-package benchflow` in
-SkillsBench when you need its lockfile to point at the newest BenchFlow commit.
+Downstream projects should depend on the public PyPI release by default. For
+internal validation before the next public release, install or lock the internal
+preview channel with prereleases enabled; see [Release channels](./docs/release.md).
 ## Featured
@@ -99,7 +99,9 @@ Two runnable labs validate the security story:
 PRs welcome. Open against `main`. CI runs ruff + tests on every PR; please run `ruff check .` and `pytest tests/` locally first.
-For a release: bump `pyproject.toml` to the next stable version, tag `v<version>` on main, push the tag — CI publishes to PyPI. Then bump main to the next `.dev0`.
+Release channels are documented in [Release channels](./docs/release.md). In
+short: merges to `main` publish an internal preview after CI passes, while a
+matching `v<version>` tag publishes the public release.
 ## License

{benchflow-0.4.0 → benchflow-0.5.1.dev869}/pyproject.toml RENAMED

@@ -1,16 +1,19 @@
 [project]
 name = "benchflow"
-version = "0.4.0"
+version = "0.5.1.dev869"
 description = "Multi-turn agent benchmarking with ACP — run any agent, any model, any provider."
 readme = "README.md"
 requires-python = ">=3.12"
 keywords = ["benchmark", "llm-agents", "acp", "agent-evaluation", "multi-turn", "skillsbench"]
 dependencies = [
+    "agent-client-protocol>=0.10",
     "httpx>=0.27.0",
     "anyio>=4.0",
-    "pydantic>=2.0",
+    "pydantic>=2.7",
     "pyyaml>=6.0",
     "rich>=13.0",
+    "litellm[proxy]==1.88.0rc1",
+    "tomli-w>=1.0",
     "typer>=0.9",
 ]
 authors = [
@@ -35,6 +38,7 @@ classifiers = [
 [project.optional-dependencies]
 dev = [
+    "packaging>=24",
     "pre-commit>=3.7",
     "pytest>=9.0.3",
     "pytest-asyncio>=0.24.0",
@@ -42,7 +46,12 @@ dev = [
     "ty>=0.0.1a1",
 ]
 sandbox-daytona = [
-    "daytona>=0.153.0",
+    # >=0.183: list() returns an auto-paginating Iterator[Sandbox] (the older
+    # paged list(page=, limit=) -> .items API was removed).
+    # >=0.184: the top-level sync `Daytona` export is present (0.176-0.183 only
+    # shipped `AsyncDaytona`); the dashboard's daytona_status.snapshot() uses the
+    # sync client, so this floor is required for that panel to import.
+    "daytona>=0.184.0",
     "tenacity>=8.0",
 ]
 sandbox-modal = [
@@ -52,6 +61,13 @@ sandbox-modal = [
 bedrock = [
     "boto3>=1.40",
 ]
+# Provider SDKs for the llm-judge verifier (type = "llm-judge").
+# llm.py routes judge calls across all three; install at least one.
+judge = [
+    "anthropic>=0.40",
+    "openai>=1.40",
+    "google-genai>=1.0",
+]
 [project.scripts]
 benchflow = "benchflow.cli.main:app"
@@ -90,7 +106,13 @@ markers = [
 [tool.ruff]
 target-version = "py312"
-extend-exclude = [".claude/skills/skill-creator"]
+# Vendored third-party service packages baked into task images (e.g. the
+# smolclaws claw-* sources copied under a ClawsBench task's environment/) are
+# not BenchFlow code — do not lint them.
+extend-exclude = [
+    ".claude/skills/skill-creator",
+    "benchmarks/**/tasks/**/environment/claw-*",
+]
 [tool.ruff.lint]
 select = [
@@ -127,7 +149,7 @@ python-version = "3.12"
 unresolved-import = "ignore"
 [tool.ty.src]
-include = ["src"]
+include = ["src", "tools"]
 # Modules that heavily use optional-dep types (daytona, modal, openai, boto3, …)
 # produce cascading type errors when those packages aren't installed.
 exclude = [
@@ -139,6 +161,5 @@ exclude = [
     "src/benchflow/rewards/llm.py",
     "src/benchflow/rewards/file_readers.py",
     "src/benchflow/rewards/rubric_config.py",
-    "src/benchflow/providers/bedrock_runtime.py",
     "src/benchflow/experimental/mcp/reviewer_server.py",
 ]

{benchflow-0.4.0 → benchflow-0.5.1.dev869}/src/benchflow/__init__.py RENAMED

@@ -3,16 +3,20 @@
 Public API surface:
 - Sandbox protocol for isolated execution environments
 - ACP client for multi-turn agent communication
-- Trajectory capture (HTTP proxy, OTel collector, ACP native)
+- Trajectory capture (LiteLLM callbacks, OTel collector, ACP native)
 - Rollout lifecycle for single-task execution
 - Evaluation orchestration with retries and concurrency
 - Rewards protocol (composable Rubric + RewardFunc)
 - Metrics collection and aggregation
 """
+from importlib.metadata import PackageNotFoundError
 from importlib.metadata import version as _version
-__version__ = _version("benchflow")
+try:
+    __version__ = _version("benchflow")
+except PackageNotFoundError:
+    __version__ = "0+unknown"
 # Core types
 from benchflow._types import Role, Scene, Turn
@@ -33,6 +37,12 @@ from benchflow.agents.registry import (
     list_agents,
     register_agent,
 )
+from benchflow.contracts.user import (
+    BaseUser,
+    FunctionUser,
+    PassthroughUser,
+    RoundResult,
+)
 from benchflow.evaluation import (
     Evaluation,
     EvaluationConfig,
@@ -41,13 +51,23 @@ from benchflow.evaluation import (
 )
 from benchflow.metrics import BenchmarkMetrics, collect_metrics
 from benchflow.models import AgentInstallError, AgentTimeoutError, RolloutResult
+from benchflow.monitor import (
+    Monitor,
+    MonitorConfig,
+    MonitorNotImplementedError,
+    MonitorResult,
+)
-# Rewards protocol (v0.4 — composable Rubric + RewardFunc)
+# Rewards plane. Reward is the canonical node-based contract
+# (``score(node) -> VerifyResult``); RewardFunc is the legacy path-based shape
+# (``score(rollout_dir) -> float``) adapted into Reward via PathReward.
 from benchflow.rewards import (
     CodeExecRewardFunc,
     Criterion,
     JudgeConfig,
     LLMJudgeRewardFunc,
+    PathReward,
+    Reward,
     RewardEvent,
     RewardFunc,
     Rubric,
@@ -56,6 +76,8 @@ from benchflow.rewards import (
     StringMatchRewardFunc,
     TestRewardFunc,
     VerifyResult,
+    load_rubric,
+    load_rubric_json,
     load_rubric_toml,
 )
 from benchflow.rollout import Rollout, RolloutConfig
@@ -73,6 +95,8 @@ from benchflow.sandbox import (
     ImageConfig,
     ImageRef,
     Sandbox,
+    SandboxImage,
+    SandboxSnapshotNotSupported,
     build_service_hooks,
     detect_services_from_dockerfile,
     register_service,
@@ -82,10 +106,15 @@ from benchflow.sandbox import (
 from benchflow.sandbox import ExecResult as SandboxExecResult
 from benchflow.sandbox.protocol import ExecResult
 from benchflow.sandbox.setup import stage_dockerfile_deps
-from benchflow.sandbox.snapshot import list_snapshots, restore, snapshot
-from benchflow.sandbox.user import BaseUser, FunctionUser, PassthroughUser, RoundResult
-from benchflow.scenes import MailboxTransport, Message, MessageTransport, SceneRole
-from benchflow.scenes import Scene as SceneRuntime
+from benchflow.sandbox.snapshot import (
+    list_snapshots,
+    list_workspace_snapshots,
+    restore,
+    snapshot,
+    workspace_restore,
+    workspace_snapshot,
+)
+from benchflow.scenes import compile_scenes_to_steps
 from benchflow.sdk import SDK
 from benchflow.skills import SkillInfo, discover_skills, install_skill, parse_skill
 from benchflow.task import (
@@ -95,17 +124,18 @@ from benchflow.task import (
     VerifierResult,
 )
 from benchflow.trajectories.otel import OTelCollector
-from benchflow.trajectories.proxy import TrajectoryProxy
 from benchflow.trajectories.types import Trajectory
 # Public API surface. Anything not in this list is implementation detail and
 # may change without notice.
 __all__ = [
     "__version__",
-    # Rewards protocol (v0.4)
+    # Rewards plane
+    "Reward",
     "Rubric",
     "RewardFunc",
     "RewardEvent",
+    "PathReward",
     "VerifyResult",
     "TestRewardFunc",
     "LLMJudgeRewardFunc",
@@ -115,10 +145,14 @@ __all__ = [
     "JudgeConfig",
     "RubricConfig",
     "ScoringConfig",
+    "load_rubric",
+    "load_rubric_json",
     "load_rubric_toml",
     # Sandbox protocol
     "Sandbox",
     "SandboxExecResult",
+    "SandboxImage",
+    "SandboxSnapshotNotSupported",
     "ImageBuilder",
     "ImageConfig",
     "ImageRef",
@@ -149,6 +183,11 @@ __all__ = [
     "AgentInstallError",
     "AgentTimeoutError",
     "RolloutResult",
+    # Monitor mode — scaffolded API surface (#386)
+    "Monitor",
+    "MonitorConfig",
+    "MonitorResult",
+    "MonitorNotImplementedError",
     # Runtime
     "Agent",
     "Environment",
@@ -161,13 +200,13 @@ __all__ = [
     "Role",
     "Scene",
     "Turn",
-    # Multi-agent scene runtime
-    "SceneRole",
-    "SceneRuntime",
-    "Message",
-    "MessageTransport",
-    "MailboxTransport",
-    # Env snapshots
+    # Scene authoring desugaring
+    "compile_scenes_to_steps",
+    # Workspace snapshots (filesystem helper — NOT the Sandbox primitive, #384)
+    "workspace_snapshot",
+    "workspace_restore",
+    "list_workspace_snapshots",
+    # Backward-compatible aliases for the above (pre-#384 names)
     "snapshot",
     "restore",
     "list_snapshots",
@@ -195,7 +234,6 @@ __all__ = [
     "parse_skill",
     # Trajectories
     "OTelCollector",
-    "TrajectoryProxy",
     "Trajectory",
     # External adapters
     "InspectAdapter",

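The `__init__.py` hunk above wraps the version lookup in a `try`/`except PackageNotFoundError` so that importing the package from an uninstalled source tree no longer fails. The same pattern can be sketched in isolation; the function name `resolve_version` is hypothetical, while `importlib.metadata.version` and `PackageNotFoundError` are the standard-library names used in the diff.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str = "0+unknown") -> str:
    """Return the installed version of dist_name, or fallback if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # No installed distribution metadata (for example, running from a
        # plain checkout), so report a placeholder instead of raising.
        return fallback
```

The fallback string matches the `"0+unknown"` literal in the diff; a caller could pass any other placeholder.
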
benchflow-0.5.1.dev869/src/benchflow/_paths.py ADDED

@@ -0,0 +1,218 @@
+"""Path safety helpers — reject unsafe inputs and refuse to follow symlinks.
+Two independent helper sets live here:
+1. **Segment validation** (``safe_path_segment``, ``assert_within``):
+   Reject user-controlled strings (case ids, skill names) that would traverse
+   outside the intended tree.
+2. **Symlink defense** (``is_safe_regular_file``, ``iter_safe_tree``, etc.):
+   Walk directories we do not own without following symlinks, so an
+   attacker-placed link cannot pull host files into dashboard payloads,
+   judge prompts, or sandbox uploads.
+"""
+from __future__ import annotations
+import logging
+import os
+import stat
+from collections.abc import Iterator
+from pathlib import Path
+__all__ = [
+    "safe_path_segment",
+    "assert_within",
+    "is_safe_regular_file",
+    "is_safe_regular_dir",
+    "iter_safe_children",
+    "iter_safe_tree",
+    "ignore_symlinks",
+]
+logger = logging.getLogger(__name__)
+# ── Segment validation ───────────────────────────────────────────────
+def safe_path_segment(name: str, *, kind: str = "name") -> str:
+    """Return ``name`` unchanged if safe as a single path segment.
+    Raises :class:`ValueError` for inputs that cannot be used as a directory
+    or file name without risking path traversal or shell ambiguity.
+    Rejected forms:
+    * empty string
+    * ``.`` or ``..`` (current/parent directory references)
+    * any string containing ``/`` or ``\\`` (multi-segment paths)
+    * any string containing a NUL byte
+    * leading or trailing whitespace
+    * leading ``-`` (would be interpreted as a CLI flag by downstream tools)
+    All other Unicode is accepted; this is a security boundary, not a
+    cosmetic slugifier. Callers that want forgiving behaviour should slugify
+    *before* calling this function.
+    Args:
+        name: The candidate path segment.
+        kind: A human label used in the error message (e.g. ``"case id"``,
+            ``"skill name"``).
+    Returns:
+        The input ``name`` unchanged.
+    Raises:
+        ValueError: If ``name`` is not safe as a single path segment.
+    """
+    if not isinstance(name, str):
+        raise ValueError(f"{kind} must be a string, got {type(name).__name__}")
+    if name == "":
+        raise ValueError(f"{kind} must not be empty")
+    if name in (".", ".."):
+        raise ValueError(f"{kind} must not be '.' or '..' (got {name!r})")
+    if "/" in name or "\\" in name:
+        raise ValueError(f"{kind} must not contain path separators (got {name!r})")
+    if "\x00" in name:
+        raise ValueError(f"{kind} must not contain NUL bytes (got {name!r})")
+    if name != name.strip():
+        raise ValueError(
+            f"{kind} must not have leading or trailing whitespace (got {name!r})"
+        )
+    if name.startswith("-"):
+        raise ValueError(
+            f"{kind} must not start with '-' (got {name!r}); "
+            "would be misread as a CLI flag"
+        )
+    return name
+def assert_within(child: Path, root: Path) -> Path:
+    """Resolve both paths and assert ``child`` is under ``root``.
+    Uses :meth:`Path.resolve` so symlinks are followed and ``..`` segments
+    collapsed before the containment check. Returns the resolved child.
+    Args:
+        child: A path that should be inside ``root``.
+        root: The directory ``child`` must not escape.
+    Returns:
+        The resolved ``child`` path.
+    Raises:
+        ValueError: If the resolved ``child`` is not under the resolved
+            ``root``.
+    """
+    resolved_root = root.resolve()
+    resolved_child = child.resolve()
+    try:
+        resolved_child.relative_to(resolved_root)
+    except ValueError as exc:
+        raise ValueError(
+            f"path {child} resolves to {resolved_child}, "
+            f"which is outside {resolved_root}"
+        ) from exc
+    return resolved_child
+# ── Symlink defense ──────────────────────────────────────────────────
+def is_safe_regular_file(path: Path) -> bool:
+    """True if *path* exists, is a regular file, and is not a symlink.
+    Uses ``os.lstat`` so symlinks, fifos, sockets, and device files all
+    return False. A non-existent path also returns False.
+    """
+    try:
+        st = os.lstat(path)
+    except OSError:
+        return False
+    return stat.S_ISREG(st.st_mode) and not stat.S_ISLNK(st.st_mode)
+def is_safe_regular_dir(path: Path) -> bool:
+    """True if *path* is a directory and not a symlink to one."""
+    try:
+        st = os.lstat(path)
+    except OSError:
+        return False
+    return stat.S_ISDIR(st.st_mode) and not stat.S_ISLNK(st.st_mode)
+def iter_safe_children(
+    directory: Path,
+    *,
+    context: str = "directory walk",
+) -> Iterator[Path]:
+    """Yield direct children of *directory*, skipping symlinks with a warning."""
+    try:
+        entries = sorted(directory.iterdir())
+    except (OSError, NotADirectoryError):
+        return
+    for child in entries:
+        if child.is_symlink():
+            logger.warning(
+                "%s: skipping symlink %s (refusing to follow)", context, child
+            )
+            continue
+        yield child
+def iter_safe_tree(
+    root: Path,
+    *,
+    context: str = "tree walk",
+) -> Iterator[Path]:
+    """Recursively yield regular files under *root*, never following symlinks.
+    Uses ``os.walk(followlinks=False)`` so directory symlinks are also not
+    descended into.
+    """
+    if not is_safe_regular_dir(root):
+        if Path(root).is_symlink():
+            logger.warning(
+                "%s: refusing to descend into symlinked root %s", context, root
+            )
+        return
+    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
+        base = Path(dirpath)
+        kept_dirs: list[str] = []
+        for name in dirnames:
+            child = base / name
+            if child.is_symlink():
+                logger.warning(
+                    "%s: skipping symlinked directory %s (refusing to follow)",
+                    context,
+                    child,
+                )
+                continue
+            kept_dirs.append(name)
+        dirnames[:] = sorted(kept_dirs)
+        for name in sorted(filenames):
+            f = base / name
+            if not is_safe_regular_file(f):
+                logger.warning(
+                    "%s: skipping non-regular path %s (symlink or special file)",
+                    context,
+                    f,
+                )
+                continue
+            yield f
+def ignore_symlinks(directory: str, contents: list[str]) -> list[str]:
+    """``shutil.copytree`` ``ignore=`` callback that drops every symlink."""
+    skipped: list[str] = []
+    for name in contents:
+        if Path(directory, name).is_symlink():
+            skipped.append(name)
+    if skipped:
+        logger.warning(
+            "copytree: skipping symlinked entries under %s: %s",
+            directory,
+            ", ".join(sorted(skipped)),
+        )
+    return skipped
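
The two ideas in `_paths.py` above, a containment check after `Path.resolve()` and an `os.lstat`-based test that never follows symlinks, can be tried out with a standalone sketch. The helper names below (`is_within`, `is_plain_file`, `demo`) are hypothetical stand-ins that mirror the logic of `assert_within` and `is_safe_regular_file` without importing benchflow. The demo assumes a POSIX-like system where creating symlinks is permitted.

```python
import os
import stat
import tempfile
from pathlib import Path


def is_within(child: Path, root: Path) -> bool:
    """Resolve both paths, then check that child sits under root."""
    try:
        child.resolve().relative_to(root.resolve())
    except ValueError:
        return False
    return True


def is_plain_file(path: Path) -> bool:
    """lstat() inspects the link itself, so a symlink is never S_ISREG."""
    try:
        mode = os.lstat(path).st_mode
    except OSError:
        return False
    return stat.S_ISREG(mode)


def demo() -> tuple[bool, bool, bool, bool]:
    """Exercise both checks against a throwaway directory tree."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "root"
        root.mkdir()
        real = root / "real.txt"
        real.write_text("data")
        link = root / "link.txt"
        os.symlink(real, link)  # assumption: symlink creation is allowed
        return (
            is_within(real, root),                          # inside the tree
            is_within(root / ".." / "outside.txt", root),   # escapes via ..
            is_plain_file(real),                            # regular file
            is_plain_file(link),                            # symlink to it
        )
```

Because `lstat` reports the link rather than its target, the symlink is rejected even though it points at a regular file, which is the property the module's docstring relies on.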
