PyPI - eval-toolkit - Versions diffs - 0.34.0__tar.gz → 0.35.0__tar.gz - Mend

eval-toolkit 0.34.0__tar.gz → 0.35.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (156)

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/CHANGELOG.md RENAMED Viewed

@@ -7,6 +7,39 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [0.35.0] — 2026-05-18 — `fit_temperature_binary` + Scorer picklability ADR
+Small, additive release. Adds a binary-classification calibration helper
+that lets consumers drop the ~50 LOC scalar-proba adapter many were
+carrying, plus a design ADR that unblocks the v0.36 harness / operating-
+point parallelization work (#29, #30) without re-litigating picklability.
+### Added
+- `eval_toolkit.fit_temperature_binary(y_true, y_score)` — scalar-proba
+  adapter for the multi-class `fit_temperature` fitter. Converts `(n,)`
+  probabilities of class 1 to a 2-column logit array via clipped logit
+  (`[0, logit(p)]` so softmax row 1 reproduces `p`), delegates to the
+  deployment-quality fitter, and returns `(T_opt, apply)` where
+  `apply: (n,) -> (n,)` does scalar-in / scalar-out T-scaling. Unlike
+  `fit_temperature_oracle`, no warning — the contract assumes val / test
+  separation (deployment-quality calibration, not fit-on-test). Closes
+  #28.
+### Documentation
+- `docs/source/methodology/parallelism.md` — new `## Scorer picklability`
+  sub-section documenting the Scorer protocol's picklability contract
+  for `n_jobs > 1` usage. Includes worked picklable / broken-closure /
+  fix examples plus a list of common non-picklable patterns to watch for
+  in user-supplied Scorers (closures, lambdas on instances, local-scope
+  classes, attributes holding live sockets / file handles). Anchors on
+  the existing v0.34.0 `parallel_map` pickle sniff + `TypeError`
+  channel — no new exception class. Unblocks v0.36 implementation of
+  #29 and #30.
+- `eval_toolkit.protocols.Scorer` docstring — Notes block pointing at
+  the new methodology section.
 ## [0.34.0] — 2026-05-17 — Phase 4 stats unblockers + unified parallelism + cookbook (BREAKING)
 Closes all 7 open backlog issues in one consumer-closing release. Also
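
Note on the `[0, logit(p)]` construction in the 0.35.0 entry above: the claim that the class-1 softmax entry reproduces `p`, and that dividing the 2-column logits by T is the same as the scalar map `sigmoid(logit(p)/T)`, follows from a one-line identity. The derivation below is a sketch that uses only the definitions quoted in the changelog; the symbol `z` is introduced here for brevity and does not appear in the source.

```latex
\operatorname{softmax}\!\left(\tfrac{1}{T}\,[\,0,\; z\,]\right)_{1}
  = \frac{e^{z/T}}{e^{0} + e^{z/T}}
  = \frac{1}{1 + e^{-z/T}}
  = \sigma\!\left(\frac{z}{T}\right),
\qquad
z = \operatorname{logit}(p) = \log\frac{p}{1-p}.
```

At T = 1 this reduces to sigma(logit(p)) = p, which is the "reproduces `p`" statement, up to the clipping of `p` away from 0 and 1.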

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: eval-toolkit
-Version: 0.34.0
+Version: 0.35.0
 Summary: Reusable evaluation contracts for binary classification: metrics, bootstrap CIs, calibration, artifacts, and evidence gates.
 Project-URL: Homepage, https://github.com/brandon-behring/eval-toolkit
 Project-URL: Documentation, https://brandon-behring.github.io/eval-toolkit/

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/__init__.py RENAMED Viewed

@@ -87,6 +87,7 @@ _EXPORTS: dict[str, str] = {
     "fit_isotonic_calibrator": "eval_toolkit.calibration",
     "fit_platt_calibrator": "eval_toolkit.calibration",
     "fit_temperature": "eval_toolkit.calibration",
+    "fit_temperature_binary": "eval_toolkit.calibration",
     "fit_temperature_oracle": "eval_toolkit.calibration",
     "reliability_curve": "eval_toolkit.calibration",
     "reliability_diagram_data": "eval_toolkit.calibration",

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/_version.py RENAMED Viewed

@@ -2,4 +2,4 @@
 __all__ = ["__version__"]
-__version__ = "0.34.0"
+__version__ = "0.35.0"

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/calibration.py RENAMED Viewed

@@ -57,6 +57,7 @@ __all__ = [
     "fit_isotonic_calibrator",
     "fit_platt_calibrator",
     "fit_temperature",
+    "fit_temperature_binary",
     "fit_temperature_oracle",
     "maximum_calibration_error",
     "reliability_curve",
@@ -1038,6 +1039,102 @@ def _negative_log_likelihood(t: float, logits: np.ndarray, labels: np.ndarray) -
     return float(-log_probs[np.arange(len(labels)), labels].mean())
+def fit_temperature_binary(
+    y_true: np.ndarray,
+    y_score: np.ndarray,
+    *,
+    bounds: tuple[float, float] = (0.05, 20.0),
+) -> tuple[float, Callable[[np.ndarray], np.ndarray]]:
+    r"""Binary-probability adapter for :func:`fit_temperature` (Guo et al. 2017 [#guo]_).
+    Fits a scalar T > 0 on *validation* probabilities of class 1 and returns
+    both T and a callable that applies the same T-scaling to test
+    probabilities. Internally:
+    1. Clips ``y_score`` to ``[1e-7, 1-1e-7]`` for finite logit inversion.
+    2. Builds a 2-column logit array ``[0, logit(p)]`` so softmax row 1
+       reproduces ``p`` exactly.
+    3. Delegates to :func:`fit_temperature` for the bounded NLL minimization.
+    4. Returns ``(T, apply)`` where ``apply(p_test) = sigmoid(logit(p_test)/T)``.
+    Unlike :func:`fit_temperature_oracle`, this does NOT emit a warning — the
+    contract is that ``y_true`` / ``y_score`` come from a held-out validation
+    set and ``apply`` is invoked on a separate test set (deployment-quality
+    calibration, not fit-on-test).
+    Parameters
+    ----------
+    y_true : np.ndarray, shape (n,)
+        Binary validation labels in {0, 1}.
+    y_score : np.ndarray, shape (n,)
+        Validation predicted probabilities of class 1, in [0, 1]. Values at
+        the extremes are clipped to ``[1e-7, 1 - 1e-7]``.
+    bounds : tuple of float, optional
+        ``(lo, hi)`` bracket for T. Default ``(0.05, 20.0)``, matches
+        :func:`fit_temperature`.
+    Returns
+    -------
+    tuple
+        ``(T_optimal, apply)`` where ``apply: (n,) -> (n,)`` maps any input
+        probability array through :math:`\sigma(\mathrm{logit}(p) / T)`.
+    Raises
+    ------
+    ValueError
+        On shape mismatch, empty input, non-finite scores, or single-class
+        ``y_true``.
+    RuntimeError
+        If the bounded scalar optimizer fails to converge.
+    Examples
+    --------
+    >>> import numpy as np
+    >>> rng = np.random.default_rng(0)
+    >>> n = 500
+    >>> y_val = rng.binomial(1, 0.3, size=n).astype(int)
+    >>> p_val = np.clip(y_val * 0.6 + rng.normal(0, 0.2, n), 0.01, 0.99)
+    >>> T, apply = fit_temperature_binary(y_val, p_val)
+    >>> T > 0
+    True
+    >>> p_test = np.array([0.1, 0.5, 0.9])
+    >>> apply(p_test).shape == (3,)
+    True
+    See Also
+    --------
+    fit_temperature : underlying multi-class fitter (operates on 2-col logits)
+    fit_temperature_oracle : diagnostic-only variant that fits T on the same
+        probabilities it scores
+    References
+    ----------
+    .. [#guo] Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. "On
+       calibration of modern neural networks." ICML 2017. arXiv:1706.04599.
+    """
+    y_true_arr, y_score_arr = _validate_calibrator_inputs(y_true, y_score)
+    # Build 2-col logits [0, logit(p)] so softmax([0, logit(p)])[1] == p exactly.
+    s_clipped = np.clip(y_score_arr, _SCORE_CLIP_LO, _SCORE_CLIP_HI)
+    logit_pos = np.log(s_clipped / (1.0 - s_clipped))
+    val_logits_2col = np.column_stack([np.zeros_like(logit_pos), logit_pos])
+    result = fit_temperature(val_logits_2col, y_true_arr, bounds=bounds)
+    t_optimal = float(result["temperature"])
+    def apply(scores: np.ndarray) -> np.ndarray:
+        arr = np.asarray(scores, dtype=float).ravel()
+        if not np.isfinite(arr).all():
+            raise ValueError("scores contains NaN or inf")
+        clipped = np.clip(arr, _SCORE_CLIP_LO, _SCORE_CLIP_HI)
+        logit = np.log(clipped / (1.0 - clipped))
+        scaled = logit / t_optimal
+        out: np.ndarray = (1.0 / (1.0 + np.exp(-scaled))).astype(float)
+        return out
+    return t_optimal, apply
 def fit_temperature_oracle(
     y_true: np.ndarray, y_score: np.ndarray
 ) -> tuple[float, Callable[[np.ndarray], np.ndarray]]:
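
Note on the hunk above: the fit-on-validation, apply-on-test flow can be tried without installing the package. The sketch below is a NumPy-only illustration, not the library's implementation: a log-spaced grid search (the hypothetical `fit_temperature_grid`) stands in for the bounded scalar optimizer that `fit_temperature` uses, and all helper names are assumptions made for this example. The clip range and the default bounds are the ones quoted in the docstring.

```python
import numpy as np

CLIP_LO, CLIP_HI = 1e-7, 1.0 - 1e-7  # clip range quoted in the docstring


def logit(p: np.ndarray) -> np.ndarray:
    """Clipped logit, finite even for p == 0 or p == 1."""
    c = np.clip(np.asarray(p, dtype=float), CLIP_LO, CLIP_HI)
    return np.log(c / (1.0 - c))


def scaled_proba(p: np.ndarray, t: float) -> np.ndarray:
    """sigmoid(logit(p) / t): scalar-in / scalar-out T-scaling."""
    return 1.0 / (1.0 + np.exp(-logit(p) / t))


def binary_nll(p: np.ndarray, y: np.ndarray) -> float:
    """Mean binary negative log-likelihood of labels y under probabilities p."""
    c = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(-(y * np.log(c) + (1 - y) * np.log(1.0 - c)).mean())


def fit_temperature_grid(y, p, bounds=(0.05, 20.0), n_grid=400) -> float:
    """Hypothetical stand-in fitter: grid search over T instead of an optimizer."""
    grid = np.geomspace(bounds[0], bounds[1], n_grid)
    grid = np.append(grid, 1.0)  # keep the identity T = 1 as a candidate
    losses = [binary_nll(scaled_proba(p, t), y) for t in grid]
    return float(grid[int(np.argmin(losses))])


# Synthetic, deliberately overconfident validation scores.
rng = np.random.default_rng(0)
y_val = rng.binomial(1, 0.4, size=500)
raw = y_val * 0.7 + rng.normal(0, 0.15, 500)
p_val = np.clip(0.5 + 2.5 * (raw - 0.5), 0.01, 0.99)

t_hat = fit_temperature_grid(y_val, p_val)
p_cal = scaled_proba(p_val, t_hat)
```

Because T = 1 is always among the candidates, the selected temperature cannot have a higher validation NLL than the uncalibrated scores, which is the same argument the `improves_nll` test in this release relies on.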

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/protocols.py RENAMED Viewed

@@ -31,6 +31,16 @@ class Scorer(Protocol):
     Accepts ``list[str]``, ``np.ndarray``, or ``pd.Series`` of features.
     Pandas is imported under ``TYPE_CHECKING`` only, so this Protocol
     has no runtime pandas dependency.
+    Notes
+    -----
+    When passed to a parallel-capable harness call (``n_jobs > 1``), Scorer
+    instances MUST be picklable — joblib's loky backend serializes the entire
+    delayed call (function plus bound arguments) before worker dispatch.
+    Closures, lambdas, local-scope classes, and attributes holding live
+    sockets / file handles break pickling. See
+    ``docs/source/methodology/parallelism.md#scorer-picklability`` for the
+    full contract and worked examples.
     """
     def predict_proba(  # pragma: no cover
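
Note on the picklability requirement added above: the difference between a Scorer that survives serialization and one that does not can be checked with the standard library alone. The sketch below uses stdlib `pickle` as a stand-in; the actual sniff inside `parallel_map` is not part of this diff, so its exact behavior is an assumption. `ThresholdScorer`, `make_closure_scorer`, and `is_picklable` are hypothetical names invented for this example, and the return type of `predict_proba` here is illustrative only.

```python
import pickle


class ThresholdScorer:
    """Hypothetical Scorer: plain-data state, class defined at module level."""

    def __init__(self, keyword: str) -> None:
        self.keyword = keyword

    def predict_proba(self, texts):
        return [0.9 if self.keyword in t else 0.1 for t in texts]


def make_closure_scorer(keyword: str):
    """Hypothetical broken pattern: the scoring function is a local closure."""

    def predict_proba(texts):
        return [0.9 if keyword in t else 0.1 for t in texts]

    return predict_proba


def is_picklable(obj) -> bool:
    """Round-trip through stdlib pickle; False if serialization fails."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Expected to succeed when this file is run as a script or imported as a module.
class_based_ok = is_picklable(ThresholdScorer("ignore previous"))
# Local functions and lambdas cannot be located by name, so pickle rejects them.
closure_ok = is_picklable(make_closure_scorer("ignore previous"))
lambda_ok = is_picklable(lambda texts: [0.5 for _ in texts])
```

The usual fix for the closure case is the first pattern: move the captured values into instance attributes of a module-level class.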

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/golden/public_api/snapshot.json RENAMED Viewed

@@ -137,6 +137,7 @@
     "fit_operating_points",
     "fit_platt_calibrator",
     "fit_temperature",
+    "fit_temperature_binary",
     "fit_temperature_oracle",
     "from_yaml",
     "frozen_config",
@@ -1016,7 +1017,7 @@
       "doc_first_line": "str(object='') -> str",
       "kind": "value",
       "type": "str",
-      "value": "'0.34.0'"
+      "value": "'0.35.0'"
     },
     "apply_operating_points": {
       "doc_first_line": "Apply fitted thresholds to a mixed-class or single-class target slice.",
@@ -1203,6 +1204,11 @@
       "kind": "function",
       "signature": "(val_logits: 'np.ndarray', val_labels: 'np.ndarray', bounds: 'tuple[float, float]' = (0.05, 20.0)) -> 'dict[str, float]'"
     },
+    "fit_temperature_binary": {
+      "doc_first_line": "Binary-probability adapter for :func:`fit_temperature` (Guo et al. 2017 [#guo]_).",
+      "kind": "function",
+      "signature": "(y_true: 'np.ndarray', y_score: 'np.ndarray', *, bounds: 'tuple[float, float]' = (0.05, 20.0)) -> 'tuple[float, Callable[[np.ndarray], np.ndarray]]'"
+    },
     "fit_temperature_oracle": {
       "doc_first_line": "**DIAGNOSTIC ONLY** \u2014 fit-on-test oracle T-scaling per Guo et al. 2017 [#guo]_.",
       "kind": "function",

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_calibration_unit.py RENAMED Viewed

@@ -16,6 +16,7 @@ from eval_toolkit.calibration import (
     fit_isotonic_calibrator,
     fit_platt_calibrator,
     fit_temperature,
+    fit_temperature_binary,
     fit_temperature_oracle,
     maximum_calibration_error,
     reliability_curve,
@@ -361,3 +362,128 @@ def test_fit_platt_matches_sklearn_canonical() -> None:
     ours_out = ours(grid)
     sk_out = sk_cal.predict(grid)
     np.testing.assert_allclose(ours_out, sk_out, atol=1e-6, rtol=1e-6)
+# --- fit_temperature_binary (#28) -------------------------------------------------
+@pytest.mark.unit
+def test_fit_temperature_binary_runs(well_separated: tuple[np.ndarray, np.ndarray]) -> None:
+    """Smoke test: returns positive T + callable; calibrated outputs in (0, 1)."""
+    y, s = well_separated
+    s_clipped = np.clip(s, 0.01, 0.99)
+    T, apply = fit_temperature_binary(y, s_clipped)
+    assert T > 0
+    out = apply(s_clipped)
+    assert out.shape == s_clipped.shape  # scalar (n,) in/out contract
+    assert (out > 0.0).all() and (out < 1.0).all()
+@pytest.mark.unit
+def test_fit_temperature_binary_shape_contract() -> None:
+    """Apply returns shape (n,), never (n, 2). Guards against 2-col regressions."""
+    rng = np.random.default_rng(0)
+    y = rng.binomial(1, 0.3, size=200).astype(int)
+    s = np.clip(y * 0.6 + rng.normal(0, 0.2, 200), 0.01, 0.99)
+    _, apply = fit_temperature_binary(y, s)
+    for shape in [(1,), (3,), (50,)]:
+        p_test = rng.uniform(0.05, 0.95, size=shape)
+        assert apply(p_test).shape == shape
+@pytest.mark.unit
+def test_fit_temperature_binary_handles_extremes() -> None:
+    """Probas at exactly 0 and 1 produce finite outputs (clipping covers the logit pole).
+    Contract: ``logit(0)`` and ``logit(1)`` are infinite, but the internal
+    clipping to ``[1e-7, 1-1e-7]`` keeps the math finite. Outputs may hit the
+    float64 boundary (0.0 or 1.0) at extreme inputs with small T — that is
+    correct behavior, not a violation. The real failure mode this test guards
+    against is ``inf`` / ``nan`` in either fit or apply.
+    """
+    rng = np.random.default_rng(0)
+    n = 200
+    y = rng.binomial(1, 0.5, size=n).astype(int)
+    s = y.astype(float)  # exact 0s and 1s in val data
+    T, apply = fit_temperature_binary(y, s)
+    assert np.isfinite(T)
+    # Apply to extremes — must be finite + in [0, 1] (boundary-inclusive)
+    p_test = np.array([0.0, 0.5, 1.0])
+    out = apply(p_test)
+    assert np.isfinite(out).all()
+    assert (out >= 0.0).all() and (out <= 1.0).all()
+@pytest.mark.unit
+def test_fit_temperature_binary_parity_with_multiclass() -> None:
+    """fit_temperature_binary(y, p) matches manual fit_temperature(2-col-logits, y).
+    Establishes the contract that the binary adapter is a thin wrapper, not a
+    re-implementation: identical T, identical applied probabilities.
+    """
+    rng = np.random.default_rng(7)
+    n = 400
+    y = rng.binomial(1, 0.4, size=n).astype(int)
+    p_val = np.clip(y * 0.5 + rng.normal(0, 0.25, n), 0.01, 0.99)
+    p_test = rng.uniform(0.05, 0.95, size=50)
+    T_binary, apply_binary = fit_temperature_binary(y, p_val)
+    # Manual multi-class path: build 2-col logits, fit T, apply via softmax row 1.
+    logit_val = np.log(p_val / (1.0 - p_val))
+    val_logits_2col = np.column_stack([np.zeros_like(logit_val), logit_val])
+    result_mc = fit_temperature(val_logits_2col, y)
+    T_mc = result_mc["temperature"]
+    logit_test = np.log(p_test / (1.0 - p_test))
+    test_logits_2col = np.column_stack([np.zeros_like(logit_test), logit_test]) / T_mc
+    # softmax row 1 = exp(z1) / (exp(0) + exp(z1)) = sigmoid(z1)
+    expected = 1.0 / (1.0 + np.exp(-test_logits_2col[:, 1]))
+    assert T_binary == pytest.approx(T_mc, rel=1e-9)
+    np.testing.assert_allclose(apply_binary(p_test), expected, rtol=1e-9, atol=1e-12)
+@pytest.mark.unit
+def test_fit_temperature_binary_improves_nll() -> None:
+    """T_post NLL ≤ T_pre NLL (T=1 is always a feasible point in the bracket)."""
+    rng = np.random.default_rng(0)
+    n = 500
+    y = rng.binomial(1, 0.4, size=n).astype(int)
+    # Overconfident probabilities: push away from 0.5
+    raw = y * 0.7 + rng.normal(0, 0.15, n)
+    p = np.clip(0.5 + 2.5 * (raw - 0.5), 0.01, 0.99)
+    T, apply = fit_temperature_binary(y, p)
+    eps = 1e-12
+    def _binary_nll(probs: np.ndarray, labels: np.ndarray) -> float:
+        c = np.clip(probs, eps, 1 - eps)
+        return float(-(labels * np.log(c) + (1 - labels) * np.log(1 - c)).mean())
+    nll_pre = _binary_nll(p, y)
+    nll_post = _binary_nll(apply(p), y)
+    assert nll_post <= nll_pre + 1e-9
+@pytest.mark.unit
+def test_fit_temperature_binary_validates() -> None:
+    """Error paths inherit from _validate_calibrator_inputs."""
+    with pytest.raises(ValueError, match="shape mismatch"):
+        fit_temperature_binary(np.zeros(5, dtype=int), np.zeros(7))
+    with pytest.raises(ValueError, match="empty"):
+        fit_temperature_binary(np.array([], dtype=int), np.array([]))
+    with pytest.raises(ValueError, match="NaN or inf"):
+        fit_temperature_binary(np.array([0, 1, 0, 1]), np.array([0.1, np.nan, 0.3, 0.7]))
+    with pytest.raises(ValueError, match="both classes"):
+        fit_temperature_binary(np.ones(50, dtype=int), np.linspace(0.1, 0.9, 50))
+@pytest.mark.unit
+def test_fit_temperature_binary_apply_rejects_nonfinite() -> None:
+    """Apply rejects non-finite test-time scores (does not silently mask)."""
+    rng = np.random.default_rng(0)
+    y = rng.binomial(1, 0.3, size=200).astype(int)
+    s = np.clip(y * 0.6 + rng.normal(0, 0.2, 200), 0.01, 0.99)
+    _, apply = fit_temperature_binary(y, s)
+    with pytest.raises(ValueError, match="NaN or inf"):
+        apply(np.array([0.5, np.nan, 0.7]))
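
Note on the extremes and non-finite tests above: the behavior they pin down can be reproduced with a fixed temperature. The sketch below mirrors the `apply` closure from the calibration.py hunk, but takes T as an argument instead of fitting it; the function name `apply_temperature` and the choice of T = 0.05 (the lower end of the default bounds) are assumptions made for illustration.

```python
import numpy as np

CLIP_LO, CLIP_HI = 1e-7, 1.0 - 1e-7  # clip range quoted in the docstring


def apply_temperature(scores, t: float) -> np.ndarray:
    """Scalar T-scaling with the same guards as the `apply` closure in the diff."""
    arr = np.asarray(scores, dtype=float).ravel()
    if not np.isfinite(arr).all():
        raise ValueError("scores contains NaN or inf")
    clipped = np.clip(arr, CLIP_LO, CLIP_HI)
    z = np.log(clipped / (1.0 - clipped))  # |z| stays below about 16.12
    return 1.0 / (1.0 + np.exp(-z / t))


p_test = np.array([0.0, 0.5, 1.0])
out_small_t = apply_temperature(p_test, 0.05)  # sharpest T in the default bounds
out_identity = apply_temperature(p_test, 1.0)  # T = 1 returns the clipped inputs
```

With T = 0.05 the top input saturates to exactly 1.0 in float64 while every output remains finite, which is why the test asserts boundary-inclusive `[0, 1]` rather than the open interval.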

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/.gitignore RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/LICENSE RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/README.md RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/STYLE.md RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/docs/archive/README.md RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/docs/research/README.md RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/docs/research/datasets/README.md RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/docs/research/papers/data-integrity/README.md RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/docs/research/papers/eval-ecosystem/README.md RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/docs/research/papers/inference/README.md RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/docs/research/papers/prompt-injection/README.md RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/docs/source/methodology/README.md RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/pyproject.toml RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/__main__.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/_deprecated.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/_parallel.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/analysis.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/artifacts.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/bootstrap.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/claims.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/config.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/docs.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/embeddings.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/evidence.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/harness.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/leakage.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/loaders.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/manifest.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/metrics.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/operating_points.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/paths.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/plotting.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/provenance.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/py.typed RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/schemas/manifest.v1.json RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/schemas/manifest.v2.json RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/schemas/manifest.v3.json RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/schemas/results.v1.json RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/schemas/results_full.v1.json RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/seeds.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/splits.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/text_dedup.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/src/eval_toolkit/thresholds.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/baseline/test_plotting_visual/plot_bootstrap_distribution.png RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/baseline/test_plotting_visual/plot_confusion_matrix_grid.png RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/baseline/test_plotting_visual/plot_lift_ci.png RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/baseline/test_plotting_visual/plot_metric_bars.png RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/baseline/test_plotting_visual/plot_pareto_frontier.png RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/baseline/test_plotting_visual/plot_pr_curve.png RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/baseline/test_plotting_visual/plot_reliability_diagram.png RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/baseline/test_plotting_visual/plot_roc_curve.png RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/baseline/test_plotting_visual/plot_score_histograms.png RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/baseline/test_plotting_visual/plot_slice_metric_heatmap.png RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/benchmarks/__init__.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/benchmarks/test_kernel_benchmarks.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/conftest.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/golden/bootstrap_ci/cases.json RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/golden/data/dedup_holdout.jsonl RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/golden/data/dedup_holdout_expected.json RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/golden/data/dedup_holdout_provenance.md RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/golden/docs/expected.md RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/golden/docs/input.md RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/golden/docs/metrics.json RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/golden/test_dedup_holdout_calibration.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/strategies.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_analysis.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_artifacts.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_block_bootstrap_on_folds.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_bootstrap_calibration_mc.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_bootstrap_edge_cases.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_bootstrap_golden.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_bootstrap_njobs.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_bootstrap_props.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_bootstrap_research_grounded.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_bootstrap_unit.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_calibration_bootstrap_chain.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_calibration_determinism.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_calibration_optimization_failures.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_calibration_props.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_calibration_research_grounded.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_claims.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_claims_coverage.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_claims_props.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_cli.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_config.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_coverage_bootstrap.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_coverage_calibration.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_coverage_harness.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_coverage_metrics.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_coverage_plotting.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_dedup_split_leakage_chain.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_deprecations.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_docs_golden.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_docs_props.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_embeddings.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_evidence_validators.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_harness_edge_cases.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_harness_fault_injection.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_harness_folded.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_harness_internals.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_harness_metric_options.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_harness_smoke.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_import_boundaries.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_leakage.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_leakage_error_paths.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_leakage_props.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_loaders.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_loaders_coverage.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_loaders_props.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_logging.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_manifest.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_manifest_contamination_round_trip.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_manifest_props.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_manifest_validation.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_metrics_props.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_metrics_stratified_subsets.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_metrics_unit.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_misc_coverage.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_numeric_edge_cases.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_operating_points.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_operating_points_props.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_parallel.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_paths.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_pipeline_e2e.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_plotting_edge.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_plotting_smoke.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_plotting_visual.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_protocol_conformance.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_provenance.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_public_api.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_recall_at_fpr.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_reference_equivalence.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_reproducibility_integration.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_schemas.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_seeds.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_splits.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_splits_leakage_integration.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_splits_props.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_text_dedup.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_text_dedup_coverage.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_text_dedup_props.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_text_dedup_strategies.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_thresholds.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_thresholds_constant_score.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_thresholds_coverage.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_thresholds_props.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_thresholds_research_grounded.py RENAMED Viewed

File without changes

{eval_toolkit-0.34.0 → eval_toolkit-0.35.0}/tests/test_v09_contracts.py RENAMED Viewed

File without changes
