PyPI - eval-toolkit - Versions diffs - 0.43.0__tar.gz → 0.45.0__tar.gz - Mend



Files changed (174)

{eval_toolkit-0.43.0 → eval_toolkit-0.45.0}/.gitignore RENAMED

@@ -39,6 +39,12 @@ coverage.json
 # Logs
 *.log
+# Local environment overrides (machine-local credentials / config)
+.env.local
+# Mutation-testing output (mutmut / cargo-mutants — local run artifacts)
+mutants/
 # Claude Code project settings (machine-local)
 .claude/

{eval_toolkit-0.43.0 → eval_toolkit-0.45.0}/CHANGELOG.md RENAMED

@@ -5,6 +5,78 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.45.0] — 2026-05-21 — Stacking: MetaLearner Protocol + LogisticStacker (closes #52)
+First minor of the staggered v0.45 → v0.46 → v0.47 → v0.48 → v1.0 sequence
+(per the v1.0 plan at `~/.claude/plans/evaluate-all-the-work-twinkly-kite.md`).
+Non-breaking — purely additive. No Protocol shape edits to the existing 6
+Tier-2 contracts (Gate 2 streak continues: 6 of 6 consecutive minors without
+Protocol-shape changes).
+### Added
+- `eval_toolkit.stacking` — new module providing the `MetaLearner` Protocol
+  and one reference impl, `LogisticStacker`, for combining outputs from
+  multiple binary detectors into a calibrated ensemble. Wraps
+  `sklearn.linear_model.LogisticRegression` with a stacker-shaped public API
+  (sklearn-style `fit(score_matrix, y)`, `predict(score_matrix)`,
+  `predict_proba(score_matrix)`, plus `coef_` / `classes_` / `intercept_`
+  attributes). No new dependencies — `scikit-learn` is already core since
+  v0.27. Closes #52.
+- `MetaLearner` Protocol — `@runtime_checkable`; sklearn-shape contract
+  taking a `(n_samples, n_detectors)` score matrix. Sized as a v1.0 Tier-2
+  contract per the v1.0 plan Decision M (tiered stability — strict freeze at
+  v1.0; additive subprotocols permitted in minor releases). Mirrors the
+  `Probe` Protocol pattern from v0.43.
+- `LogisticStacker` reference impl — configurable C, fit_intercept,
+  class_weight, penalty, solver, max_iter, random_state. Class-weight default
+  `"balanced"` for the common imbalanced-detection setting. Composes with the
+  4-binary-calibrator family (v0.40 + v0.42) via `fit_platt_binary` /
+  `fit_isotonic_binary` chaining on stacked output.
+- 24-test coverage in `tests/test_stacking.py`: Protocol satisfaction (both
+  structural and duck-typed), shape contracts (3-detector × 500-sample
+  fixtures), regularization behavior (C, L1 penalty), signal ordering,
+  calibration chaining (Platt + Isotonic), bootstrap CI on stacker output
+  (Audit F6a-aware — uses correct `BootstrapCI.ci_low/ci_high` attribute
+  names), determinism under fixed `random_state`, hypothesis property on
+  signal monotonicity, input validation (shape mismatch, single-class,
+  non-finite, unfit, wrong-n-detectors).
+- `docs/source/examples/stacking.md` — myst-nb worked example: 3 synthetic
+  detectors with descending signal-to-noise, stacker fit, post-stacking
+  Platt calibration. Cites Wolpert 1992 + Breiman 1996.
+### Notes
+- Sklearn 1.8+ deprecates `LogisticRegression(penalty=...)` in favor of
+  `l1_ratio`. The public `LogisticStacker(penalty=...)` API is preserved;
+  the internal sklearn-side migration to `l1_ratio` is deferred until
+  sklearn 1.10, when the warning becomes more visible. No user-facing impact.
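
The `MetaLearner` entry above describes a `@runtime_checkable`, sklearn-shaped contract. The following is a minimal sketch of how such a structural check behaves, not the package's actual definition: `MetaLearnerSketch`, `MeanStacker`, and `NotAStacker` are hypothetical names, and the method set is taken only from the changelog text (`fit`, `predict`, `predict_proba`).

```python
from __future__ import annotations

from typing import Protocol, Sequence, runtime_checkable

# Rows are samples, columns are detectors: shape (n_samples, n_detectors).
Matrix = Sequence[Sequence[float]]


@runtime_checkable
class MetaLearnerSketch(Protocol):
    """Hypothetical stand-in for a sklearn-shaped stacker contract."""

    def fit(self, score_matrix: Matrix, y: Sequence[int]) -> MetaLearnerSketch: ...
    def predict(self, score_matrix: Matrix) -> list[int]: ...
    def predict_proba(self, score_matrix: Matrix) -> list[list[float]]: ...


class MeanStacker:
    """Toy stacker that averages detector scores; it does not inherit the Protocol."""

    def fit(self, score_matrix, y):
        self.n_detectors_ = len(score_matrix[0])
        return self

    def predict_proba(self, score_matrix):
        # Treat the mean detector score as P(positive); column 0 is P(negative).
        out = []
        for row in score_matrix:
            p = sum(row) / len(row)
            out.append([1.0 - p, p])
        return out

    def predict(self, score_matrix):
        return [int(p[1] >= 0.5) for p in self.predict_proba(score_matrix)]


class NotAStacker:
    """Has fit() only, so the structural check should reject it."""

    def fit(self, score_matrix, y):
        return self


scores = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.3]]
stacker = MeanStacker().fit(scores, [1, 0])
is_stacker = isinstance(stacker, MetaLearnerSketch)       # methods all present
is_not = isinstance(NotAStacker(), MetaLearnerSketch)     # predict* missing
preds = stacker.predict(scores)
```

A `runtime_checkable` Protocol only verifies that the named methods exist; it does not validate signatures or array shapes, which is presumably why the changelog lists separate shape-contract tests.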
+## [0.44.0] — 2026-05-19 — Defenses + losses: Spotlighting variants + RecallAtLowFPR (closes #50, #51)
+### Added
+- `eval_toolkit.preprocessing` — new module with 3 Spotlighting
+  structural-defense variants from Hines et al. 2024
+  (arXiv 2403.14720): `delimit(text, delimiter='<<')`,
+  `datamark(text, marker='^')`, `encode(text, encoding='base64')`,
+  plus a `sweep(texts, variants=..., kwargs=...)` batch wrapper that
+  returns a `(N*3)`-row DataFrame. Includes a `spotlighting`
+  SimpleNamespace exposing the upstream issue's function-style API
+  (`spotlighting.delimit(text)`, etc.). Base-install safe (pure
+  stdlib). Closes #51.
+- `eval_toolkit.losses` — new module with `RecallAtLowFPR` — the
+  Meta Prompt Guard 2 (PG2) training recipe: a differentiable
+  approximation of recall-at-fixed-FPR via soft-rank, returning a
+  scalar `torch.nn.Module` loss for use in standard training loops.
+  Optimizes detector ranking at a constrained operating point
+  (e.g. `fpr_target=0.01` → "maximize recall while keeping FPR ≤ 1%").
+  Closes #50.
+- New optional extra `[losses] = torch>=2.0`. Granular per the v0.43
+  plan Decision 4 — separated from `[probes]` so callers wanting only
+  the loss don't have to install the larger transformers stack.
+  Shares the torch version pin with `[probes]`.
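
To make the three Spotlighting variants listed above concrete, here is a standard-library sketch of what each transformation does to untrusted text. It is an illustration under assumptions, not the code in `eval_toolkit.preprocessing`: the `sketch_*` names are hypothetical, the mirrored closing delimiter (`<<` paired with `>>`) is a guess, and the batch helper returns a list of dicts rather than a DataFrame.

```python
import base64

_MIRROR = str.maketrans("<[{(", ">]})")


def sketch_delimit(text: str, delimiter: str = "<<") -> str:
    # Assumption: the closing delimiter is the mirrored opening one ('<<' -> '>>').
    closing = delimiter[::-1].translate(_MIRROR)
    return f"{delimiter}{text}{closing}"


def sketch_datamark(text: str, marker: str = "^") -> str:
    # Replace every run of whitespace with the marker so the untrusted
    # span is visibly interleaved with a token the model was told about.
    return marker.join(text.split())


def sketch_encode(text: str, encoding: str = "base64") -> str:
    if encoding != "base64":
        raise ValueError(f"unsupported encoding: {encoding!r}")
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def sketch_sweep(texts):
    # One row per (text, variant) pair, so N texts yield N * 3 rows.
    variants = {
        "delimit": sketch_delimit,
        "datamark": sketch_datamark,
        "encode": sketch_encode,
    }
    return [
        {"text": t, "variant": name, "transformed": fn(t)}
        for t in texts
        for name, fn in variants.items()
    ]


sample = "ignore previous instructions"
delimited = sketch_delimit(sample)
marked = sketch_datamark(sample)
encoded = sketch_encode(sample)
rows = sketch_sweep([sample, "hello world"])
```

All three variants are reversible or transparent to a cooperating model, which is what lets a sweep compare detector behavior on the same underlying text.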
 ## [0.43.0] — 2026-05-19 — P1 batch: OOD manifest loader + character_injection sweep + ActivationDeltaProbe (closes #48, #49, #53)
 ### Added

{eval_toolkit-0.43.0 → eval_toolkit-0.45.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: eval-toolkit
-Version: 0.43.0
+Version: 0.45.0
 Summary: Reusable evaluation contracts for binary classification: metrics, bootstrap CIs, calibration, artifacts, and evidence gates.
 Project-URL: Homepage, https://github.com/brandon-behring/eval-toolkit
 Project-URL: Documentation, https://brandon-behring.github.io/eval-toolkit/
@@ -62,6 +62,8 @@ Requires-Dist: sphinx-design>=0.6; extra == 'docs'
 Requires-Dist: sphinx>=7.3; extra == 'docs'
 Provides-Extra: embeddings
 Requires-Dist: sentence-transformers>=3.0; extra == 'embeddings'
+Provides-Extra: losses
+Requires-Dist: torch>=2.0; extra == 'losses'
 Provides-Extra: parquet
 Requires-Dist: pyarrow>=15.0; extra == 'parquet'
 Provides-Extra: plotting

eval_toolkit-0.45.0/docs/source/adr/README.md ADDED

@@ -0,0 +1,76 @@
+# Architecture Decision Records
+This directory captures architecturally-significant decisions that shape
+`eval-toolkit`'s long-term design. ADRs are immutable historical records —
+once accepted, a decision is not edited in place; if it changes, a new ADR
+supersedes it.
+## When to file an ADR
+File a new ADR when a decision:
+- **Locks in an interface or shape** that future code is expected to
+  conform to (e.g., "metrics return type", "Protocol vs ABC").
+- **Closes off alternatives** that were seriously considered, so the
+  reasoning isn't lost.
+- **Carries cost** to reverse (e.g., a public-API contract that promises
+  stability across a release line).
+Routine refactors, bug fixes, and internal-only patterns do not need ADRs —
+the commit message + CHANGELOG entry are enough.
+## Numbering
+Sequential, zero-padded: `0001-flat-module-layout.md`,
+`0002-scorecard-as-primary-metric-surface.md`, etc. Number is assigned
+at the time of writing; if two ADRs are drafted in parallel, the second
+to merge takes the next number.
+## Format
+Each ADR uses this skeleton (loosely based on MADR — Markdown ADR — without
+the heavyweight template):
+```markdown
+# ADR NNNN: Title
+**Status:** Proposed | Accepted | Superseded by ADR-MMMM
+**Date:** YYYY-MM-DD
+**Deciders:** (names or roles)
+## Context
+What's the situation that requires a decision? What constraints are at play?
+## Decision
+What did we decide?
+## Consequences
+What follows from this decision? (Both positive and negative.)
+## Alternatives considered
+What else was on the table, and why wasn't it chosen?
+## Trigger to revisit
+What would have to change for this decision to be reopened?
+(Optional but useful — keeps the ADR self-documenting.)
+```
+## Cross-references
+- [`docs/RELEASING.md`](../../RELEASING.md) — release-flow process; ADRs
+  are typically drafted as part of release prep.
+- [`docs/source/roadmap.md`](../roadmap.md) — long-term direction;
+  ADRs explain how individual roadmap decisions were made.
+## Index
+(Updated as ADRs are added.)
+| # | Title | Status | Date |
+|---|---|---|---|
+| _none yet_ | | | |

{eval_toolkit-0.43.0 → eval_toolkit-0.45.0}/pyproject.toml RENAMED

@@ -69,6 +69,11 @@ transformers = ["transformers>=4.0"]
 # is base-install-safe (lazy imports inside ActivationDeltaProbe methods);
 # the extra is strictly for callers wanting to actually fit / predict.
 probes = ["torch>=2.0", "transformers>=4.40"]
+# v0.44.0: RecallAtLowFPR loss (Meta Prompt Guard 2 recipe; closes #50).
+# torch-only (no transformers); separated from [probes] per Decision 4
+# (granular extras — losses callers should not have to install the larger
+# transformers stack). Shares the torch version pin with [probes].
+losses = ["torch>=2.0"]
 # DEPRECATED (announced v0.30.1, removal v0.33.0).
 #
 # Retained as a transitive no-op so `pip install eval-toolkit[validation]`
@@ -177,7 +182,7 @@ warn_no_return = true
 strict_equality = true
 [[tool.mypy.overrides]]
-module = ["scipy.*", "sklearn.*", "matplotlib.*", "pandas.*", "yaml.*", "sentence_transformers.*", "joblib.*"]
+module = ["scipy.*", "sklearn.*", "matplotlib.*", "pandas.*", "yaml.*", "sentence_transformers.*", "joblib.*", "torch.*", "transformers.*"]
 ignore_missing_imports = true
 [tool.pytest.ini_options]

{eval_toolkit-0.43.0 → eval_toolkit-0.45.0}/src/eval_toolkit/__init__.py RENAMED

@@ -40,6 +40,13 @@ _EXPORTS: dict[str, str] = {
     "WhitespaceInjection": "eval_toolkit.adversarial",
     "ZeroWidthSpaceInjection": "eval_toolkit.adversarial",
     "character_injection": "eval_toolkit.adversarial",
+    # --- losses ---
+    "RecallAtLowFPR": "eval_toolkit.losses",
+    # --- preprocessing ---
+    "datamark": "eval_toolkit.preprocessing",
+    "delimit": "eval_toolkit.preprocessing",
+    "encode": "eval_toolkit.preprocessing",
+    "spotlighting": "eval_toolkit.preprocessing",
     # --- probes ---
     "ActivationDeltaProbe": "eval_toolkit.probes",
     "ActivationExtractor": "eval_toolkit.probes",
@@ -287,6 +294,8 @@ _EXPORTS: dict[str, str] = {
     "recall_at_fpr": "eval_toolkit.thresholds",
     "select_threshold": "eval_toolkit.thresholds",
     "wilson_interval": "eval_toolkit.thresholds",
+    "LogisticStacker": "eval_toolkit.stacking",
+    "MetaLearner": "eval_toolkit.stacking",
 }
 __all__ = ["__version__", *_EXPORTS.keys()]
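
The hunk shows only the `_EXPORTS` name-to-module mapping, not how it is resolved. One common way such a table is consumed is a module-level `__getattr__` (PEP 562) that imports the target on first access. The sketch below demonstrates that pattern on a throwaway module with standard-library targets; whether `eval_toolkit/__init__.py` does exactly this is an assumption, and `demo_pkg` and `_lazy_getattr` are hypothetical names.

```python
import importlib
import types

# Public name -> module that actually defines it (stdlib targets for the demo).
_EXPORTS = {"sqrt": "math", "dumps": "json"}

demo_pkg = types.ModuleType("demo_pkg")


def _lazy_getattr(name):
    # Called only when normal attribute lookup on the module fails.
    try:
        module_name = _EXPORTS[name]
    except KeyError:
        raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}") from None
    value = getattr(importlib.import_module(module_name), name)
    # Cache on the module so later lookups bypass __getattr__ entirely.
    demo_pkg.__dict__[name] = value
    return value


demo_pkg.__getattr__ = _lazy_getattr
demo_pkg.__all__ = list(_EXPORTS)

root = demo_pkg.sqrt(9.0)                 # first access triggers the import
cached = "sqrt" in vars(demo_pkg)         # now stored on the module itself
missing = hasattr(demo_pkg, "not_there")  # unknown names still raise AttributeError
```

The appeal of this layout is that adding a public name, as the hunk does for `LogisticStacker` and `MetaLearner`, is a one-line table edit with no import cost for callers who never touch it.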

{eval_toolkit-0.43.0 → eval_toolkit-0.45.0}/src/eval_toolkit/_version.py RENAMED

@@ -2,4 +2,4 @@
 __all__ = ["__version__"]
-__version__ = "0.43.0"
+__version__ = "0.45.0"

eval_toolkit-0.45.0/src/eval_toolkit/losses.py ADDED

@@ -0,0 +1,225 @@
+"""Differentiable losses for prompt-injection detector training.
+Implements :class:`RecallAtLowFPR` — the Meta Prompt Guard 2 (PG2) training
+recipe, a differentiable approximation of recall-at-fixed-FPR. Optimizes
+detector ranking at a constrained operating point (e.g. FPR ≤ 0.01)
+rather than the implicit FPR-agnostic posture of cross-entropy.
+This module is base-install safe: ``torch`` is soft-imported inside the
+class methods. ``pip install eval-toolkit[losses]`` installs torch.
+The lazy-import pattern matches the ``[probes]`` precedent (separate
+extra so callers wanting only the loss don't have to install
+transformers).
+The formulation follows the soft-rank approximation described in
+Meta's PG2 release notes and similar metric-learning losses (Liu et al.
+NeurIPS 2020 family):
+1. Compute the empirical FPR-target threshold from the negative-class
+   scores in the batch via the ``(1 - fpr_target)`` quantile.
+2. Smooth the indicator ``I(s_i >= threshold)`` with
+   ``sigmoid(beta * (s_i - threshold))`` so gradients flow.
+3. Recall@FPR ≈ ``Σ approx_indicator * y / Σ y``; the loss returned is
+   ``1 - Recall@FPR``.
+References
+----------
+.. [1] Meta. 2024. "Prompt Guard 2 — release notes & training recipe."
+.. [2] Liu, X., et al. 2020. "Black-box ranking under FPR constraints."
+       NeurIPS 2020.
+"""
+from __future__ import annotations
+from typing import Any, Literal
+__all__ = [
+    "RecallAtLowFPR",
+]
+ReductionMode = Literal["mean", "sum", "none"]
+def _require_torch() -> Any:
+    """Import torch with a copy-paste install hint if [losses] is missing."""
+    try:
+        import torch
+    except ImportError as exc:
+        raise ImportError(
+            "RecallAtLowFPR requires torch. Install with: pip install eval-toolkit[losses]"
+        ) from exc
+    return torch
+def _build_module_class() -> Any:
+    """Build the :class:`RecallAtLowFPR` ``nn.Module`` lazily.
+    Defined as a factory so importing :mod:`eval_toolkit.losses` does not
+    pull torch at module-import time. The class itself is built on first
+    instantiation; the factory caches the class on the module so repeated
+    construction is constant-time after the first call.
+    """
+    torch = _require_torch()
+    nn = torch.nn
+    # ``nn.Module`` is a runtime-constructed base; mypy can't follow the dynamic
+    # class creation. The runtime behavior is correct (nn.Module API + autograd).
+    class _RecallAtLowFPR(nn.Module):  # type: ignore[misc, name-defined]
+        def __init__(
+            self,
+            fpr_target: float = 0.01,
+            fpr_smoothing_beta: float = 10.0,
+            pos_weight: float = 1.0,
+            reduction: ReductionMode = "mean",
+        ) -> None:
+            super().__init__()
+            if not 0.0 < fpr_target <= 1.0:
+                raise ValueError(f"RecallAtLowFPR: fpr_target must be in (0, 1]; got {fpr_target}")
+            if fpr_smoothing_beta <= 0:
+                raise ValueError(
+                    f"RecallAtLowFPR: fpr_smoothing_beta must be > 0; got {fpr_smoothing_beta}"
+                )
+            if reduction not in ("mean", "sum", "none"):
+                raise ValueError(
+                    f"RecallAtLowFPR: reduction must be 'mean'|'sum'|'none'; got {reduction!r}"
+                )
+            self.fpr_target = float(fpr_target)
+            self.fpr_smoothing_beta = float(fpr_smoothing_beta)
+            self.pos_weight = float(pos_weight)
+            self.reduction = reduction
+        def forward(
+            self,
+            logits: Any,
+            labels: Any,
+        ) -> Any:
+            """Compute the (differentiable) 1 - Recall@FPR loss.
+            Parameters
+            ----------
+            logits : torch.Tensor
+                Predicted scores, shape ``(B,)`` or ``(B, 1)``. Higher
+                value → higher probability of positive class.
+            labels : torch.Tensor
+                Binary labels in ``{0, 1}``, shape ``(B,)``.
+            Returns
+            -------
+            torch.Tensor
+                Scalar (``reduction="mean"`` or ``"sum"``) or
+                per-positive-sample loss (``reduction="none"``).
+            """
+            scores = logits.squeeze(-1) if logits.dim() == 2 else logits
+            if scores.shape != labels.shape:
+                raise ValueError(
+                    f"RecallAtLowFPR: logits shape {tuple(scores.shape)} != "
+                    f"labels shape {tuple(labels.shape)}"
+                )
+            labels_f = labels.float()
+            neg_mask = labels_f < 0.5
+            pos_mask = labels_f >= 0.5
+            if not torch.any(pos_mask):
+                # No positives → recall is undefined; return zero loss with grad.
+                return scores.sum() * 0.0
+            # Threshold = (1 - fpr_target)-th quantile of negative scores.
+            # torch.quantile is differentiable w.r.t. neg_scores (gradients flow to
+            # the order statistics it interpolates between).
+            neg_scores = scores[neg_mask]
+            if neg_scores.numel() == 0:
+                # No negatives → no FPR constraint binds; place the threshold just
+                # below the minimum score so everything ranks above it
+                # (recall ≈ 1 → loss ≈ 0).
+                threshold = scores.min().detach() - 1.0
+            else:
+                # quantile q = 1 - fpr_target means we want the score above which
+                # exactly fpr_target fraction of negatives sit.
+                q = 1.0 - self.fpr_target
+                threshold = torch.quantile(neg_scores, q)
+            # Soft indicator: sigmoid(beta * (s - t)) → near-step function as beta → ∞.
+            approx_above = torch.sigmoid(self.fpr_smoothing_beta * (scores - threshold))
+            # Recall@FPR = (Σ I(s_i ≥ t) * y_i * pos_weight) / (Σ y_i * pos_weight)
+            tp_weighted = approx_above * labels_f * self.pos_weight
+            denom = labels_f.sum() * self.pos_weight
+            recall_at_fpr = tp_weighted.sum() / denom.clamp(min=1e-9)
+            per_pos = 1.0 - approx_above[pos_mask]  # per-positive contribution
+            if self.reduction == "mean":
+                return torch.tensor(1.0, device=scores.device) - recall_at_fpr
+            if self.reduction == "sum":
+                return per_pos.sum()
+            return per_pos  # "none"
+    return _RecallAtLowFPR
+_CLASS_CACHE: dict[str, Any] = {}
+def RecallAtLowFPR(  # noqa: N802 — matches issue spec PascalCase class-like name
+    fpr_target: float = 0.01,
+    fpr_smoothing_beta: float = 10.0,
+    pos_weight: float = 1.0,
+    reduction: ReductionMode = "mean",
+) -> Any:
+    """Construct a Recall@LowFPR loss module.
+    Differentiable approximation of recall at a constrained false-positive
+    rate, per the Meta Prompt Guard 2 training recipe. Optimizes
+    detector ranking at a specific operating point (e.g. ``fpr_target=0.01``
+    → "maximize recall while keeping FPR ≤ 1%").
+    Parameters
+    ----------
+    fpr_target : float, optional
+        Target false-positive rate (operating point constraint).
+        Must be in ``(0, 1]``. Default ``0.01`` (1% FPR).
+    fpr_smoothing_beta : float, optional
+        Temperature of the soft-indicator approximation; higher values
+        make the loss sharper (closer to the hard step function) but
+        produce smaller gradients away from the threshold. Default ``10.0``.
+        Increase toward training convergence; start low for stable
+        gradient flow.
+    pos_weight : float, optional
+        Per-positive-sample weight applied to the recall numerator and
+        denominator. Default ``1.0`` (unweighted).
+    reduction : {"mean", "sum", "none"}, optional
+        How to reduce the per-positive loss. Default ``"mean"``.
+        ``"mean"`` returns the scalar ``1 - Recall@FPR`` (the canonical
+        training objective). ``"sum"`` returns the sum of per-positive
+        ``1 - approx_indicator``. ``"none"`` returns the per-positive
+        ``1 - approx_indicator`` tensor for custom downstream weighting.
+    Returns
+    -------
+    torch.nn.Module
+        The constructed loss module. Drop into any standard PyTorch
+        training loop.
+    Raises
+    ------
+    ImportError
+        If the ``[losses]`` extra is not installed.
+    ValueError
+        On invalid ``fpr_target`` / ``fpr_smoothing_beta`` / ``reduction``.
+    Examples
+    --------
+    >>> # Requires the [losses] extra.
+    >>> # import torch
+    >>> # loss = RecallAtLowFPR(fpr_target=0.01)
+    >>> # logits = torch.randn(32, requires_grad=True)
+    >>> # labels = torch.randint(0, 2, (32,))
+    >>> # loss(logits, labels).backward()
+    """
+    if "cls" not in _CLASS_CACHE:
+        _CLASS_CACHE["cls"] = _build_module_class()
+    cls = _CLASS_CACHE["cls"]
+    return cls(
+        fpr_target=fpr_target,
+        fpr_smoothing_beta=fpr_smoothing_beta,
+        pos_weight=pos_weight,
+        reduction=reduction,
+    )
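
The three steps in the module docstring (quantile threshold on negatives, sigmoid-smoothed indicator, `1 - recall`) can be traced without torch. The following is a plain-Python sketch of the `reduction="mean"` path with `pos_weight=1.0`, written to mirror the arithmetic in the diff rather than reproduce it; `linear_quantile`, `stable_sigmoid`, and `soft_recall_at_fpr_loss` are hypothetical helper names, and the quantile uses linear interpolation between order statistics, which is also `torch.quantile`'s default.

```python
import math


def linear_quantile(values, q):
    """q-quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def stable_sigmoid(x):
    """Sigmoid that avoids overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_recall_at_fpr_loss(scores, labels, fpr_target=0.01, beta=10.0):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos:
        return 0.0  # recall undefined without positives; mirror the zero-loss branch
    if neg:
        # Step 1: score above which roughly fpr_target of the negatives sit.
        threshold = linear_quantile(neg, 1.0 - fpr_target)
    else:
        threshold = min(scores) - 1.0  # no negatives: every score clears it
    # Step 2: smooth the indicator I(s >= threshold) with a sigmoid.
    soft_hits = [stable_sigmoid(beta * (s - threshold)) for s in pos]
    # Step 3: loss is one minus the soft recall over positives.
    return 1.0 - sum(soft_hits) / len(pos)


# Positives far above negatives: soft recall is close to 1.
loss_good = soft_recall_at_fpr_loss([5.0, 6.0, -5.0, -6.0], [1, 1, 0, 0])
# Ranking inverted: positives sit far below the threshold.
loss_bad = soft_recall_at_fpr_loss([-5.0, -6.0, 5.0, 6.0], [1, 1, 0, 0])
loss_no_pos = soft_recall_at_fpr_loss([0.1, 0.2], [0, 0])
```

Note that with small batches the `(1 - fpr_target)` quantile lands almost on the largest negative score, so the operating point is only as precise as the number of negatives per batch allows.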
