PyPI - eval-toolkit - Versions diffs - 0.42.0__tar.gz → 0.44.0__tar.gz - Mend

eval-toolkit 0.42.0tar.gz → 0.44.0tar.gz

Files changed (171)

{eval_toolkit-0.42.0 → eval_toolkit-0.44.0}/CHANGELOG.md RENAMED

@@ -5,7 +5,92 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
-## [Unreleased]
+## [0.44.0] — 2026-05-19 — Defenses + losses: Spotlighting variants + RecallAtLowFPR (closes #50, #51)
+### Added
+- `eval_toolkit.preprocessing` — new module with 3 Spotlighting
+  structural-defense variants from Hines et al. 2024
+  (arXiv 2403.14720): `delimit(text, delimiter='<<')`,
+  `datamark(text, marker='^')`, `encode(text, encoding='base64')`,
+  plus a `sweep(texts, variants=..., kwargs=...)` batch wrapper that
+  returns a `(N*3)`-row DataFrame. Includes a `spotlighting`
+  SimpleNamespace exposing the upstream issue's function-style API
+  (`spotlighting.delimit(text)`, etc.). Base-install safe (pure
+  stdlib). Closes #51.
+- `eval_toolkit.losses` — new module with `RecallAtLowFPR` — the
+  Meta Prompt Guard 2 (PG2) training recipe: a differentiable
+  approximation of recall-at-fixed-FPR via soft-rank, returning a
+  scalar `torch.nn.Module` loss for use in standard training loops.
+  Optimizes detector ranking at a constrained operating point
+  (e.g. `fpr_target=0.01` → "maximize recall while keeping FPR ≤ 1%").
+  Closes #50.
+- New optional extra `[losses] = torch>=2.0`. Granular per the v0.43
+  plan Decision 4 — separated from `[probes]` so callers wanting only
+  the loss don't have to install the larger transformers stack.
+  Shares the torch version pin with `[probes]`.
+## [0.43.0] — 2026-05-19 — P1 batch: OOD manifest loader + character_injection sweep + ActivationDeltaProbe (closes #48, #49, #53)
+### Added
+- `ood_dataset_from_manifest(yaml_path, slices=..., cache_dir=...)` —
+  declarative loader for multiple OOD eval slates (BIPIA, AgentDojo,
+  InjecAgent, NotInject, PINT, LLMail-Inject, …) into a single
+  unified DataFrame with columns `text` / `label` / `source` /
+  `row_id` / `sha`. Bytes are downloaded once, sha256-verified
+  against the manifest, and cached on-disk keyed by content hash
+  (default `~/.cache/eval-toolkit/ood/`). Closes #48 — drops the
+  per-source loader boilerplate carried by
+  `prompt-injection-portfolio` and `prompt-injection-detection-submission`.
+- `OodManifestLoader` — `DatasetLoader`-Protocol-compliant wrapper
+  around the factory, returning `{"all": EvalSlice}` with
+  `source` as the default strata column for harness pipelines.
+- `src/eval_toolkit/schemas/ood_manifest.v1.json` — Draft 2020-12
+  JSON schema for the OOD manifest YAML; auto-validated by
+  `uv run eval-toolkit schemas check`.
+- `eval_toolkit.adversarial` — new module with character-injection
+  bypass suite (Microsoft Research 2024, arXiv 2404.13208).
+  Six core techniques shipped as frozen-dataclass strategies:
+  `ZeroWidthSpaceInjection`, `HomoglyphSubstitution`,
+  `DiacriticInjection`, `WhitespaceInjection`, `CaseRandomization`,
+  `PunctuationInjection`. All implement a `CharacterInjectionStrategy`
+  Protocol with `transform(text: str) -> str`. Six advanced techniques
+  (bidi RTL, tag stripping, synonym, token splitting, Unicode
+  normalization, invisible chars) scheduled for v0.43.1 — the sweep
+  API stabilizes in v0.43.0 so the v0.43.1 additions are pure
+  extensions. Closes #49 (core-6).
+- `adversarial.sweep(texts, scorer, techniques="all", threshold=0.5)`
+  — Scorer-Protocol-compliant adversarial-robustness sweep. Returns
+  a DataFrame with `(text_id, technique, original_score,
+  transformed_score, asr)` rows for matrix analysis.
+  Aggregate ASR with `df.groupby("technique")["asr"].mean()`.
+- `adversarial.character_injection` — `SimpleNamespace` exposing the
+  function-style API from the upstream issue spec
+  (`character_injection.zero_width_space(text)`,
+  `character_injection.sweep(...)`, etc.).
+- `eval_toolkit.probes` — new module with `ActivationDeltaProbe`:
+  TaskTracker-style linear probe over HuggingFace transformer
+  hidden-state activation deltas (Abdelnabi et al. 2024,
+  arXiv 2406.00799). Backbone-agnostic (encoder OR decoder).
+  Sklearn-compatible API: `.fit(clean_texts, injected_texts)`,
+  `.predict()` → `(n,)`, `.predict_proba()` → `(n, 2)`,
+  `.coef_`, `.classes_`. Activations cached to
+  `$XDG_CACHE_HOME/eval-toolkit/probes/` keyed by
+  `(backbone, layer_index, aggregate, sha256(text))` so re-runs are
+  near-instant. Aggregate modes: `mean`, `max`, `cls`. Closes #53.
+- `Probe` Protocol — minimal sklearn-shaped probe surface (`fit`,
+  `predict`, `predict_proba`, `coef_`, `classes_`). Distinct from
+  `Scorer` (which returns 1-D `P(positive)`); wrap with
+  `lambda p, X: p.predict_proba(X)[:, 1]` to adapt.
+- `ActivationExtractor` Protocol — pluggable hidden-state-extraction
+  contract for `ActivationDeltaProbe`; injectable for tests to avoid
+  loading a real backbone.
+- New optional extra `[probes] = torch>=2.0, transformers>=4.40`.
+  Follows the `[embeddings]` precedent — opt-in only, NOT in
+  `[all]` or `[dev]`, since the transitive install is ~600MB+.
+  Module is base-install-safe: a friendly `ImportError` fires only
+  if you try to use the default HF extractor without the extra.
 ## [0.42.0] — 2026-05-19 — fit_isotonic_binary completes 4-calibrator family (closes #44)
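The 0.44.0 entry above names three Spotlighting transforms (`delimit`, `datamark`, `encode`) and a `sweep` wrapper that yields N*3 rows. The following is a minimal stdlib sketch of what such transforms typically do, not the package's implementation: the `sketch_` function names, the closing delimiter `>>`, the whitespace-run replacement rule, and the row keys are all assumptions made for illustration. Only the default `delimiter='<<'`, `marker='^'`, and base64 encoding come from the changelog.

```python
import base64


def sketch_delimit(text: str, delimiter: str = "<<", closer: str = ">>") -> str:
    """Wrap untrusted text in explicit boundary markers (closer is assumed)."""
    return f"{delimiter}{text}{closer}"


def sketch_datamark(text: str, marker: str = "^") -> str:
    """Join the whitespace-separated words with a marker token.

    Leading and trailing whitespace is dropped as a side effect of split().
    """
    return marker.join(text.split())


def sketch_encode(text: str) -> str:
    """Base64-encode the text so it is not readable as plain instructions."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def sketch_sweep(texts: list[str]) -> list[dict[str, str]]:
    """Apply every variant to every text: N inputs produce N * 3 rows."""
    variants = {
        "delimit": sketch_delimit,
        "datamark": sketch_datamark,
        "encode": sketch_encode,
    }
    return [
        {"text_id": str(i), "variant": name, "transformed": fn(text)}
        for i, text in enumerate(texts)
        for name, fn in variants.items()
    ]
```

The list of dicts stands in for the DataFrame mentioned in the changelog so that the sketch needs no third-party install; passing it to `pandas.DataFrame(...)` would give one row per (text, variant) pair.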

{eval_toolkit-0.42.0 → eval_toolkit-0.44.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: eval-toolkit
-Version: 0.42.0
+Version: 0.44.0
 Summary: Reusable evaluation contracts for binary classification: metrics, bootstrap CIs, calibration, artifacts, and evidence gates.
 Project-URL: Homepage, https://github.com/brandon-behring/eval-toolkit
 Project-URL: Documentation, https://brandon-behring.github.io/eval-toolkit/
@@ -62,11 +62,16 @@ Requires-Dist: sphinx-design>=0.6; extra == 'docs'
 Requires-Dist: sphinx>=7.3; extra == 'docs'
 Provides-Extra: embeddings
 Requires-Dist: sentence-transformers>=3.0; extra == 'embeddings'
+Provides-Extra: losses
+Requires-Dist: torch>=2.0; extra == 'losses'
 Provides-Extra: parquet
 Requires-Dist: pyarrow>=15.0; extra == 'parquet'
 Provides-Extra: plotting
 Requires-Dist: matplotlib>=3.8; extra == 'plotting'
 Requires-Dist: pillow>=10.0; extra == 'plotting'
+Provides-Extra: probes
+Requires-Dist: torch>=2.0; extra == 'probes'
+Requires-Dist: transformers>=4.40; extra == 'probes'
 Provides-Extra: property
 Requires-Dist: hypothesis>=6.100; extra == 'property'
 Provides-Extra: transformers
@@ -308,22 +313,6 @@ tests with large `max_examples` and a few bootstrap tests with
 `n_resamples >= 200`). `make fast` keeps the developer iteration loop
 under ~30 seconds.
-## Downstream contract testing (v4 sibling-smoke)
-A separate CI workflow (`.github/workflows/v4-smoke.yml`) checks out
-the downstream consumer `prompt-injection-v4` at `main`, installs it
-with this branch's eval-toolkit as an editable sibling dep (via v4's
-`[tool.uv.sources]`), and runs v4's fast `-m smoke` suite. This catches
-contract regressions at PR time rather than in v4's own CI post-merge.
-The workflow requires a `HF_TOKEN` repo secret (gated HuggingFace
-datasets used by v4's smoke fixtures). Set it at:
-`https://github.com/brandon-behring/eval-toolkit/settings/secrets/actions`
-The workflow runs with `continue-on-error: true` during a 2-3 week
-trial period; it'll be promoted to a required gate once the false-
-positive rate (from independent v4 main breakage or HF rate-limits)
-is characterized.
 ## Standards

{eval_toolkit-0.42.0 → eval_toolkit-0.44.0}/README.md RENAMED

@@ -230,22 +230,6 @@ tests with large `max_examples` and a few bootstrap tests with
 `n_resamples >= 200`). `make fast` keeps the developer iteration loop
 under ~30 seconds.
-## Downstream contract testing (v4 sibling-smoke)
-A separate CI workflow (`.github/workflows/v4-smoke.yml`) checks out
-the downstream consumer `prompt-injection-v4` at `main`, installs it
-with this branch's eval-toolkit as an editable sibling dep (via v4's
-`[tool.uv.sources]`), and runs v4's fast `-m smoke` suite. This catches
-contract regressions at PR time rather than in v4's own CI post-merge.
-The workflow requires a `HF_TOKEN` repo secret (gated HuggingFace
-datasets used by v4's smoke fixtures). Set it at:
-`https://github.com/brandon-behring/eval-toolkit/settings/secrets/actions`
-The workflow runs with `continue-on-error: true` during a 2-3 week
-trial period; it'll be promoted to a required gate once the false-
-positive rate (from independent v4 main breakage or HF rate-limits)
-is characterized.
 ## Standards

{eval_toolkit-0.42.0 → eval_toolkit-0.44.0}/pyproject.toml RENAMED

@@ -63,6 +63,17 @@ embeddings = ["sentence-transformers>=3.0"]
 # itself does not import transformers, so the optional install is
 # strictly for callers wanting AutoTokenizer.from_pretrained(...).
 transformers = ["transformers>=4.0"]
+# v0.43.0: ActivationDeltaProbe (TaskTracker-style linear activation probe;
+# closes #53). Pulls torch + transformers (~600MB+ transitive). Follows
+# the [embeddings] precedent: opt-in only, NOT in [all] / [dev]. Module
+# is base-install-safe (lazy imports inside ActivationDeltaProbe methods);
+# the extra is strictly for callers wanting to actually fit / predict.
+probes = ["torch>=2.0", "transformers>=4.40"]
+# v0.44.0: RecallAtLowFPR loss (Meta Prompt Guard 2 recipe; closes #50).
+# torch-only (no transformers); separated from [probes] per Decision 4
+# (granular extras — losses callers should not have to install the larger
+# transformers stack). Shares the torch version pin with [probes].
+losses = ["torch>=2.0"]
 # DEPRECATED (announced v0.30.1, removal v0.33.0).
 #
 # Retained as a transitive no-op so `pip install eval-toolkit[validation]`
@@ -171,7 +182,7 @@ warn_no_return = true
 strict_equality = true
 [[tool.mypy.overrides]]
-module = ["scipy.*", "sklearn.*", "matplotlib.*", "pandas.*", "yaml.*", "sentence_transformers.*", "joblib.*"]
+module = ["scipy.*", "sklearn.*", "matplotlib.*", "pandas.*", "yaml.*", "sentence_transformers.*", "joblib.*", "torch.*", "transformers.*"]
 ignore_missing_imports = true
 [tool.pytest.ini_options]
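The pyproject comments above describe the new modules as base-install-safe because heavy imports happen lazily inside methods, with a friendly `ImportError` when the extra is missing. A minimal sketch of that pattern follows, assuming a small helper; `require_optional` and `LazyProbeSketch` are hypothetical names for illustration and do not appear in the diff.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use, or explain how to get it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f'install it with: pip install "eval-toolkit[{extra}]"'
        ) from exc


class LazyProbeSketch:
    """Hypothetical class: constructing it never imports the heavy dependency."""

    def __init__(self, backbone: str) -> None:
        self.backbone = backbone  # cheap attribute assignment, no torch import
        self._torch = None

    def fit(self) -> None:
        # The optional dependency is imported only when it is actually needed.
        self._torch = require_optional("torch", extra="probes")
```

With this shape, importing the module and constructing the object work on a base install; only calling `fit` without the `[probes]` extra raises, and the message names the extra to install.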

{eval_toolkit-0.42.0 → eval_toolkit-0.44.0}/src/eval_toolkit/__init__.py RENAMED

@@ -30,6 +30,27 @@ _logging.getLogger("eval_toolkit").addHandler(_logging.NullHandler())
 # dividers below are informational only; the snapshot in
 # tests/golden/public_api/ reads dict keys + values, not comments.
 _EXPORTS: dict[str, str] = {
+    # --- adversarial ---
+    "CORE_TECHNIQUES": "eval_toolkit.adversarial",
+    "CaseRandomization": "eval_toolkit.adversarial",
+    "CharacterInjectionStrategy": "eval_toolkit.adversarial",
+    "DiacriticInjection": "eval_toolkit.adversarial",
+    "HomoglyphSubstitution": "eval_toolkit.adversarial",
+    "PunctuationInjection": "eval_toolkit.adversarial",
+    "WhitespaceInjection": "eval_toolkit.adversarial",
+    "ZeroWidthSpaceInjection": "eval_toolkit.adversarial",
+    "character_injection": "eval_toolkit.adversarial",
+    # --- losses ---
+    "RecallAtLowFPR": "eval_toolkit.losses",
+    # --- preprocessing ---
+    "datamark": "eval_toolkit.preprocessing",
+    "delimit": "eval_toolkit.preprocessing",
+    "encode": "eval_toolkit.preprocessing",
+    "spotlighting": "eval_toolkit.preprocessing",
+    # --- probes ---
+    "ActivationDeltaProbe": "eval_toolkit.probes",
+    "ActivationExtractor": "eval_toolkit.probes",
+    "Probe": "eval_toolkit.probes",
     # --- analysis ---
     "CsvPredictionReader": "eval_toolkit.analysis",
     "JsonlPredictionReader": "eval_toolkit.analysis",
@@ -156,8 +177,10 @@ _EXPORTS: dict[str, str] = {
     "DataFrameLoader": "eval_toolkit.loaders",
     "DatasetLoader": "eval_toolkit.loaders",
     "HFDatasetsLoader": "eval_toolkit.loaders",
+    "OodManifestLoader": "eval_toolkit.loaders",
     "ParquetGlobLoader": "eval_toolkit.loaders",
     "SingleSliceLoader": "eval_toolkit.loaders",
+    "ood_dataset_from_manifest": "eval_toolkit.loaders",
     # --- manifest ---
     "MANIFEST_SCHEMA_VERSION": "eval_toolkit.manifest",
     "RunManifest": "eval_toolkit.manifest",
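The hunks above only show the `_EXPORTS` name-to-module table, not how it is consumed. One common way to use such a table is a PEP 562 module-level `__getattr__` that imports the defining module on first attribute access. The sketch below assumes that mechanism; `demo_pkg`, `_lazy_getattr`, and the stdlib targets in the table are stand-ins chosen so the example runs without the package installed.

```python
import importlib
import types

# Name -> defining module, in the same shape as the diff's `_EXPORTS` table.
# Stdlib targets are used here so the sketch runs anywhere.
_EXPORTS: dict[str, str] = {
    "OrderedDict": "collections",
    "sha256": "hashlib",
}


def _lazy_getattr(name: str):
    """Resolve an exported name on first access (PEP 562 module __getattr__)."""
    try:
        module_name = _EXPORTS[name]
    except KeyError:
        raise AttributeError(f"module has no attribute {name!r}") from None
    value = getattr(importlib.import_module(module_name), name)
    setattr(demo_pkg, name, value)  # cache so later lookups skip __getattr__
    return value


# Stand-in for a package's __init__ module.
demo_pkg = types.ModuleType("demo_pkg")
demo_pkg.__getattr__ = _lazy_getattr
demo_pkg.__all__ = sorted(_EXPORTS)
```

Under this assumption, adding an entry such as `"RecallAtLowFPR": "eval_toolkit.losses"` to the table exposes the name at the package top level without importing the submodule at package import time.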

{eval_toolkit-0.42.0 → eval_toolkit-0.44.0}/src/eval_toolkit/_version.py RENAMED

@@ -2,4 +2,4 @@
 __all__ = ["__version__"]
-__version__ = "0.42.0"
+__version__ = "0.44.0"
