PyPI - proofbundle - Versions diffs - 0.5.0__tar.gz → 0.6.0__tar.gz - Mend

proofbundle 0.5.0tar.gz → 0.6.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (43)

{proofbundle-0.5.0/src/proofbundle.egg-info → proofbundle-0.6.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: proofbundle
-Version: 0.5.0
+Version: 0.6.0
 Summary: Emit and verify portable cryptographic evidence bundles, offline: Ed25519 + RFC 6962 Merkle + optional SD-JWT.
 Author: Konrad Gruszka
 License: MIT
@@ -55,17 +55,19 @@ signed and anchored in a tamper-evident log — and optionally carries a
 selectively disclosable credential. Pure Python, no server, no daemon, one JSON file.**
 [![CI](https://github.com/b7n0de/proofbundle/actions/workflows/ci.yml/badge.svg)](https://github.com/b7n0de/proofbundle/actions/workflows/ci.yml)
-[![PyPI](https://img.shields.io/pypi/v/proofbundle.svg?color=D6248A)](https://pypi.org/project/proofbundle/)
-[![Python](https://img.shields.io/pypi/pyversions/proofbundle.svg?color=D6248A)](https://pypi.org/project/proofbundle/)
+[![PyPI](https://img.shields.io/pypi/v/proofbundle.svg?color=D6248A&cacheSeconds=3600)](https://pypi.org/project/proofbundle/)
+[![Python](https://img.shields.io/pypi/pyversions/proofbundle.svg?color=D6248A&cacheSeconds=3600)](https://pypi.org/project/proofbundle/)
+[![Downloads](https://static.pepy.tech/badge/proofbundle)](https://pepy.tech/project/proofbundle)
 [![License: MIT](https://img.shields.io/badge/license-MIT-D6248A.svg)](LICENSE)
 [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
 [![SLSA build provenance](https://img.shields.io/badge/SLSA-build_provenance-D6248A.svg)](https://slsa.dev)
+[![PyPI attestations](https://img.shields.io/badge/PyPI-attestations_(PEP_740)-D6248A.svg)](https://pypi.org/project/proofbundle/)
 </div>
 **At a glance:** `proofbundle emit` signs and anchors a payload; `proofbundle
 verify` checks one self-contained `bundle.json` with three offline cryptographic
-checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 62 tests.
+checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 63 tests.
 ## Contents
@@ -286,13 +288,19 @@ commitments — it does **not** prove the evaluation was well designed or that t
 itself is correct. Those are human judgements; what it removes is the need to simply
 trust the number.
-### Since v0.5: framework adapter, in-toto, selective disclosure
+### A verification layer for trustworthy eval logs
-- **inspect_ai adapter** (`pip install "proofbundle[inspect]"`) reads a UK AISI
-  [inspect_ai](https://github.com/UKGovernmentBEIS/inspect_ai) eval log via the stable
-  `read_eval_log` API (lazy import; the core stays dependency-free) and maps it to a claim.
-  `proofbundle.adapters.from_lm_eval_results` reads lm-evaluation-harness `results.json`
-  without importing anything.
+The UK AISI inspect_ai team names an open gap ([arXiv:2507.06893](https://arxiv.org/abs/2507.06893)):
+a database of trustworthy evaluation results with proper provenance tracking. proofbundle is the
+missing **signature + selective-disclosure layer** for exactly that — complementary to metadata
+aggregation (Every Eval Ever) and documentation taxonomies (Eval Factsheets), not a competitor.
+See [INTEROP.md](INTEROP.md) for how it maps to OpenSSF Model Signing, CycloneDX ML-BOM, and in-toto.
+- **Two framework adapters** — `pip install "proofbundle[inspect]"` reads a UK AISI
+  [inspect_ai](https://github.com/UKGovernmentBEIS/inspect_ai) eval log via the stable `read_eval_log`
+  API (lazy import). `proofbundle.adapters.from_lm_eval_results` reads a real EleutherAI
+  [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) `results_*.json` (the
+  genuine `acc,none` filter-suffix format) and captures run provenance — no framework import either way.
 - **in-toto Statement v1** — `proofbundle.intoto.to_intoto_statement(claim, root_b64=…)`
   emits the receipt as an in-toto statement with a self-hosted predicate type. The subject
   digest is an *honest salted commitment* under a custom key, never `sha256` (see
@@ -303,6 +311,9 @@ trust the number.
   bundle payload is the source of truth; the SD-JWT is a derived, bundle-bound view, verified
   by proofbundle's own verifier **and** the `sd-jwt-python` reference.
+Every release ships **PEP 740 attestations** (Trusted Publishing) + an SLSA build-provenance
+attestation — see [SECURITY.md](SECURITY.md).
 ## Roadmap
 - **v0.1** — the offline verifier plus a real example bundle.
@@ -310,8 +321,9 @@ trust the number.
 - **v0.3** — external RFC 6962 conformance vectors + real Sigstore Rekor interop.
 - **v0.4** — the eval-receipt emitter (`emit_eval_receipt` / `proofbundle emit-eval`),
   salted commitments, issuer binding.
-- **v0.5 (current release)** — inspect_ai adapter (stable API), in-toto Statement v1 view,
-  and SD-JWT **issuance** per RFC 9901 (selective disclosure of the exact score).
+- **v0.5** — inspect_ai adapter (stable API), in-toto Statement v1 view, SD-JWT **issuance** (RFC 9901).
+- **v0.6 (current release)** — a second eval adapter (lm-evaluation-harness, real format + provenance),
+  INTEROP.md, CITATION.cff, PEP 740 attestations documented.
 - **Deferred** (explicitly not yet built) — SD-JWT VC conformance + `vct` metadata,
   Key-Binding JWT, status lists / revocation, an official in-toto PR, DSSE / a full in-toto client.

{proofbundle-0.5.0 → proofbundle-0.6.0}/README.md RENAMED

@@ -12,17 +12,19 @@ signed and anchored in a tamper-evident log — and optionally carries a
 selectively disclosable credential. Pure Python, no server, no daemon, one JSON file.**
 [![CI](https://github.com/b7n0de/proofbundle/actions/workflows/ci.yml/badge.svg)](https://github.com/b7n0de/proofbundle/actions/workflows/ci.yml)
-[![PyPI](https://img.shields.io/pypi/v/proofbundle.svg?color=D6248A)](https://pypi.org/project/proofbundle/)
-[![Python](https://img.shields.io/pypi/pyversions/proofbundle.svg?color=D6248A)](https://pypi.org/project/proofbundle/)
+[![PyPI](https://img.shields.io/pypi/v/proofbundle.svg?color=D6248A&cacheSeconds=3600)](https://pypi.org/project/proofbundle/)
+[![Python](https://img.shields.io/pypi/pyversions/proofbundle.svg?color=D6248A&cacheSeconds=3600)](https://pypi.org/project/proofbundle/)
+[![Downloads](https://static.pepy.tech/badge/proofbundle)](https://pepy.tech/project/proofbundle)
 [![License: MIT](https://img.shields.io/badge/license-MIT-D6248A.svg)](LICENSE)
 [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
 [![SLSA build provenance](https://img.shields.io/badge/SLSA-build_provenance-D6248A.svg)](https://slsa.dev)
+[![PyPI attestations](https://img.shields.io/badge/PyPI-attestations_(PEP_740)-D6248A.svg)](https://pypi.org/project/proofbundle/)
 </div>
 **At a glance:** `proofbundle emit` signs and anchors a payload; `proofbundle
 verify` checks one self-contained `bundle.json` with three offline cryptographic
-checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 62 tests.
+checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 63 tests.
 ## Contents
@@ -243,13 +245,19 @@ commitments — it does **not** prove the evaluation was well designed or that t
 itself is correct. Those are human judgements; what it removes is the need to simply
 trust the number.
-### Since v0.5: framework adapter, in-toto, selective disclosure
+### A verification layer for trustworthy eval logs
-- **inspect_ai adapter** (`pip install "proofbundle[inspect]"`) reads a UK AISI
-  [inspect_ai](https://github.com/UKGovernmentBEIS/inspect_ai) eval log via the stable
-  `read_eval_log` API (lazy import; the core stays dependency-free) and maps it to a claim.
-  `proofbundle.adapters.from_lm_eval_results` reads lm-evaluation-harness `results.json`
-  without importing anything.
+The UK AISI inspect_ai team names an open gap ([arXiv:2507.06893](https://arxiv.org/abs/2507.06893)):
+a database of trustworthy evaluation results with proper provenance tracking. proofbundle is the
+missing **signature + selective-disclosure layer** for exactly that — complementary to metadata
+aggregation (Every Eval Ever) and documentation taxonomies (Eval Factsheets), not a competitor.
+See [INTEROP.md](INTEROP.md) for how it maps to OpenSSF Model Signing, CycloneDX ML-BOM, and in-toto.
+- **Two framework adapters** — `pip install "proofbundle[inspect]"` reads a UK AISI
+  [inspect_ai](https://github.com/UKGovernmentBEIS/inspect_ai) eval log via the stable `read_eval_log`
+  API (lazy import). `proofbundle.adapters.from_lm_eval_results` reads a real EleutherAI
+  [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) `results_*.json` (the
+  genuine `acc,none` filter-suffix format) and captures run provenance — no framework import either way.
 - **in-toto Statement v1** — `proofbundle.intoto.to_intoto_statement(claim, root_b64=…)`
   emits the receipt as an in-toto statement with a self-hosted predicate type. The subject
   digest is an *honest salted commitment* under a custom key, never `sha256` (see
@@ -260,6 +268,9 @@ trust the number.
   bundle payload is the source of truth; the SD-JWT is a derived, bundle-bound view, verified
   by proofbundle's own verifier **and** the `sd-jwt-python` reference.
+Every release ships **PEP 740 attestations** (Trusted Publishing) + an SLSA build-provenance
+attestation — see [SECURITY.md](SECURITY.md).
 ## Roadmap
 - **v0.1** — the offline verifier plus a real example bundle.
@@ -267,8 +278,9 @@ trust the number.
 - **v0.3** — external RFC 6962 conformance vectors + real Sigstore Rekor interop.
 - **v0.4** — the eval-receipt emitter (`emit_eval_receipt` / `proofbundle emit-eval`),
   salted commitments, issuer binding.
-- **v0.5 (current release)** — inspect_ai adapter (stable API), in-toto Statement v1 view,
-  and SD-JWT **issuance** per RFC 9901 (selective disclosure of the exact score).
+- **v0.5** — inspect_ai adapter (stable API), in-toto Statement v1 view, SD-JWT **issuance** (RFC 9901).
+- **v0.6 (current release)** — a second eval adapter (lm-evaluation-harness, real format + provenance),
+  INTEROP.md, CITATION.cff, PEP 740 attestations documented.
 - **Deferred** (explicitly not yet built) — SD-JWT VC conformance + `vct` metadata,
   Key-Binding JWT, status lists / revocation, an official in-toto PR, DSSE / a full in-toto client.

{proofbundle-0.5.0 → proofbundle-0.6.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "proofbundle"
-version = "0.5.0"
+version = "0.6.0"
 description = "Emit and verify portable cryptographic evidence bundles, offline: Ed25519 + RFC 6962 Merkle + optional SD-JWT."
 readme = "README.md"
 requires-python = ">=3.9"

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/__init__.py RENAMED

@@ -13,7 +13,7 @@ from .emit import emit_bundle, generate_signer
 from .errors import Check, ProofBundleError, VerificationResult
 from .merkle import verify_consistency, verify_inclusion
-__version__ = "0.5.0"
+__version__ = "0.6.0"
 __all__ = [
     "__version__",

proofbundle-0.6.0/src/proofbundle/adapters/lm_eval.py ADDED

@@ -0,0 +1,76 @@
+"""Adapter for EleutherAI lm-evaluation-harness results_*.json (file-based, NO lm_eval import).
+Parses the exported result JSON only — no runtime dependency on lm_eval, no runner rebuild.
+Real 0.4.x format (validated against a genuine harness run, see tests/fixtures/lm_eval_arc_easy_real.json):
+the metric keys carry a *filter suffix*, e.g. `"acc,none"`, and the standard error is a **sibling** key
+`"acc_stderr,none"` (not nested). So a caller asking for metric `"acc"` is matched against `"acc,none"`
+(or `"acc,<filter>"`). Provenance (git_hash, harness/task version, n-shot) is copied into the receipt's
+optional `provenance` field so a verifier can trace exactly which run produced it.
+"""
+from __future__ import annotations
+import json
+from pathlib import Path
+from typing import Optional
+from ..evalclaim import build_eval_claim
+def _find_metric(res: dict, metric: str):
+    """Return (value, stderr, matched_key) for `metric`, handling the `metric,<filter>` suffix format.
+    Prefers an exact `metric` key, then `metric,none`, then any `metric,<filter>`. The stderr sibling is
+    `metric_stderr,<same filter>`."""
+    if metric in res:                       # bare key (older/simple exports)
+        stderr = res.get(f"{metric}_stderr")
+        return res[metric], stderr, metric
+    if f"{metric},none" in res:
+        return res[f"{metric},none"], res.get(f"{metric}_stderr,none"), f"{metric},none"
+    for key in res:                         # any filter, e.g. metric,custom-filter
+        if key == metric or (key.startswith(f"{metric},") and not key.startswith(f"{metric}_stderr")):
+            flt = key.split(",", 1)[1] if "," in key else "none"
+            return res[key], res.get(f"{metric}_stderr,{flt}"), key
+    return None, None, None
+def from_lm_eval_results(path, task: str, metric: str, *, comparator: str, threshold: str,
+                         timestamp: str, model_salt: Optional[bytes] = None,
+                         dataset_salt: Optional[bytes] = None):
+    """Read an lm-evaluation-harness results_*.json and build an eval claim for `task`/`metric`.
+    `metric` is the bare name (e.g. "acc"); the real key may be "acc,none". The score is read as a STRING
+    to avoid float canonicalization issues. Returns (claim, salts).
+    """
+    data = json.loads(Path(path).read_text(encoding="utf-8"))
+    res = data.get("results", {}).get(task)
+    if res is None:
+        raise ValueError(f"task not found in results: {task!r}")
+    value, stderr, matched = _find_metric(res, metric)
+    if value is None:
+        raise ValueError(f"metric {metric!r} not found in results[{task!r}] "
+                         f"(available: {sorted(k for k in res if ',' in k)})")
+    score = value if isinstance(value, str) else repr(value)
+    n_samples = data.get("n-samples", {}).get(task, {})
+    n = int(n_samples.get("effective") or n_samples.get("original") or res.get("sample_len") or 0)
+    cfg = data.get("config", {})
+    model_id = str(cfg.get("model_name") or cfg.get("model") or "unknown")
+    if cfg.get("model_args"):
+        model_id = f"{model_id}::{cfg['model_args']}"   # include args so the commitment pins the exact model
+    provenance = {"harness": "lm-evaluation-harness", "matched_metric_key": matched}
+    if data.get("git_hash"):
+        provenance["git_hash"] = str(data["git_hash"])
+    if data.get("versions", {}).get(task) is not None:
+        provenance["task_version"] = str(data["versions"][task])
+    if data.get("n-shot", {}).get(task) is not None:
+        provenance["n_shot"] = str(data["n-shot"][task])
+    if stderr is not None:
+        provenance["stderr"] = repr(stderr) if not isinstance(stderr, str) else stderr
+    return build_eval_claim(
+        suite=task, suite_version=str(data.get("versions", {}).get(task, "lm-eval")),
+        metric=metric, comparator=comparator, threshold=threshold, score=str(score), n=n,
+        model_id=model_id, dataset_id=task, issuer="", timestamp=timestamp,
+        provenance=provenance, model_salt=model_salt, dataset_salt=dataset_salt)
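The key-matching rule described in the module docstring can be sketched in isolation. This is a simplified stand-in for `_find_metric`, not the shipped function: the name `find_metric` and the sample values are illustrative assumptions, and only the key shapes (`"acc,none"` with a sibling `"acc_stderr,none"`) come from the diff above.

```python
from typing import Any, Optional, Tuple


def find_metric(res: dict, metric: str) -> Tuple[Optional[Any], Optional[Any], Optional[str]]:
    """Return (value, stderr, matched_key) for a bare metric name.

    Lookup order: exact key, then "<metric>,none", then any "<metric>,<filter>".
    The standard error is read from the sibling key "<metric>_stderr,<filter>".
    """
    if metric in res:  # bare key, no filter suffix
        return res[metric], res.get(f"{metric}_stderr"), metric
    prefix = f"{metric},"
    # Try the default "none" filter first, then every other filtered key.
    candidates = [f"{metric},none"] + [k for k in res if k.startswith(prefix)]
    for key in candidates:
        if key in res:
            flt = key.split(",", 1)[1]
            return res[key], res.get(f"{metric}_stderr,{flt}"), key
    return None, None, None


# Hypothetical per-task result dict in the suffix format (values are made up).
sample = {
    "alias": "arc_easy",
    "acc,none": 0.5,
    "acc_stderr,none": 0.1,
    "acc_norm,none": 0.4,
    "acc_norm_stderr,none": 0.1,
}

value, stderr, matched = find_metric(sample, "acc")
print(value, stderr, matched)
```

Because the prefix includes the comma, a request for `"acc"` cannot accidentally match `"acc_norm,none"` or `"acc_stderr,none"`.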

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/evalclaim.py RENAMED

@@ -37,7 +37,7 @@ _MAX_SAFE_INT = 2 ** 53 - 1
 # The exact key set of an eval claim; decode/validate reject anything else.
 _REQUIRED = {"schema", "suite", "suite_version", "metric", "comparator", "threshold",
              "passed", "n", "model_id_commit", "dataset_id_commit", "commit_alg", "issuer", "timestamp"}
-_OPTIONAL = {"context_binding", "ci95", "multiple_testing", "prereg_sha256"}
+_OPTIONAL = {"context_binding", "ci95", "multiple_testing", "prereg_sha256", "provenance"}
 __all__ = [
     "EVAL_CLAIM_SCHEMA", "COMMIT_ALG", "canonicalize", "build_eval_claim",
@@ -126,7 +126,7 @@ def build_eval_claim(*, suite: str, suite_version: str, metric: str, comparator:
                      threshold: str, score: str, n: int, model_id: str, dataset_id: str,
                      issuer: str, timestamp: str, context_binding: Optional[str] = None,
                      ci95: Optional[Sequence[str]] = None, multiple_testing: Optional[str] = None,
-                     prereg_sha256: Optional[str] = None,
+                     prereg_sha256: Optional[str] = None, provenance: Optional[dict] = None,
                      model_salt: Optional[bytes] = None, dataset_salt: Optional[bytes] = None):
     """Build a valid eval claim from raw values. Computes `passed` ITSELF from the comparator
     (never trusts the caller), creates salted commitments, and returns (claim, salts) with the
@@ -163,6 +163,8 @@ def build_eval_claim(*, suite: str, suite_version: str, metric: str, comparator:
         claim["multiple_testing"] = multiple_testing
     if prereg_sha256 is not None:
         claim["prereg_sha256"] = prereg_sha256
+    if provenance is not None:
+        claim["provenance"] = provenance
     _reject_non_jcs(claim)
     return claim, {"model_salt": m_salt, "dataset_salt": d_salt}
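The practical effect of adding `"provenance"` to `_OPTIONAL` is that a claim carrying that key is no longer rejected by the exact-key-set check. The sketch below illustrates the idea only: the diff does not show the real decode/validate code, so `check_keys` is a hypothetical helper, and the placeholder claim values are invented. The two key sets are copied from the hunk above.

```python
REQUIRED = {"schema", "suite", "suite_version", "metric", "comparator", "threshold",
            "passed", "n", "model_id_commit", "dataset_id_commit", "commit_alg",
            "issuer", "timestamp"}
OPTIONAL = {"context_binding", "ci95", "multiple_testing", "prereg_sha256", "provenance"}


def check_keys(claim: dict) -> None:
    """Hypothetical validator: every required key present, no key outside the two sets."""
    keys = set(claim)
    missing = REQUIRED - keys
    unknown = keys - REQUIRED - OPTIONAL
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")


# Placeholder claim: only the key names matter for this check.
claim = {key: "placeholder" for key in REQUIRED}
claim["provenance"] = {"harness": "lm-evaluation-harness", "matched_metric_key": "acc,none"}
check_keys(claim)  # accepted: "provenance" is an allowed optional key

try:
    check_keys({**claim, "extra_field": 1})
except ValueError as exc:
    print(exc)
```

Keeping the allowed set closed means a new field such as `provenance` has to be added to the schema explicitly before an emitter can use it.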

{proofbundle-0.5.0 → proofbundle-0.6.0/src/proofbundle.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: proofbundle
-Version: 0.5.0
+Version: 0.6.0
 Summary: Emit and verify portable cryptographic evidence bundles, offline: Ed25519 + RFC 6962 Merkle + optional SD-JWT.
 Author: Konrad Gruszka
 License: MIT
@@ -55,17 +55,19 @@ signed and anchored in a tamper-evident log — and optionally carries a
 selectively disclosable credential. Pure Python, no server, no daemon, one JSON file.**
 [![CI](https://github.com/b7n0de/proofbundle/actions/workflows/ci.yml/badge.svg)](https://github.com/b7n0de/proofbundle/actions/workflows/ci.yml)
-[![PyPI](https://img.shields.io/pypi/v/proofbundle.svg?color=D6248A)](https://pypi.org/project/proofbundle/)
-[![Python](https://img.shields.io/pypi/pyversions/proofbundle.svg?color=D6248A)](https://pypi.org/project/proofbundle/)
+[![PyPI](https://img.shields.io/pypi/v/proofbundle.svg?color=D6248A&cacheSeconds=3600)](https://pypi.org/project/proofbundle/)
+[![Python](https://img.shields.io/pypi/pyversions/proofbundle.svg?color=D6248A&cacheSeconds=3600)](https://pypi.org/project/proofbundle/)
+[![Downloads](https://static.pepy.tech/badge/proofbundle)](https://pepy.tech/project/proofbundle)
 [![License: MIT](https://img.shields.io/badge/license-MIT-D6248A.svg)](LICENSE)
 [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
 [![SLSA build provenance](https://img.shields.io/badge/SLSA-build_provenance-D6248A.svg)](https://slsa.dev)
+[![PyPI attestations](https://img.shields.io/badge/PyPI-attestations_(PEP_740)-D6248A.svg)](https://pypi.org/project/proofbundle/)
 </div>
 **At a glance:** `proofbundle emit` signs and anchors a payload; `proofbundle
 verify` checks one self-contained `bundle.json` with three offline cryptographic
-checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 62 tests.
+checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 63 tests.
 ## Contents
@@ -286,13 +288,19 @@ commitments — it does **not** prove the evaluation was well designed or that t
 itself is correct. Those are human judgements; what it removes is the need to simply
 trust the number.
-### Since v0.5: framework adapter, in-toto, selective disclosure
+### A verification layer for trustworthy eval logs
-- **inspect_ai adapter** (`pip install "proofbundle[inspect]"`) reads a UK AISI
-  [inspect_ai](https://github.com/UKGovernmentBEIS/inspect_ai) eval log via the stable
-  `read_eval_log` API (lazy import; the core stays dependency-free) and maps it to a claim.
-  `proofbundle.adapters.from_lm_eval_results` reads lm-evaluation-harness `results.json`
-  without importing anything.
+The UK AISI inspect_ai team names an open gap ([arXiv:2507.06893](https://arxiv.org/abs/2507.06893)):
+a database of trustworthy evaluation results with proper provenance tracking. proofbundle is the
+missing **signature + selective-disclosure layer** for exactly that — complementary to metadata
+aggregation (Every Eval Ever) and documentation taxonomies (Eval Factsheets), not a competitor.
+See [INTEROP.md](INTEROP.md) for how it maps to OpenSSF Model Signing, CycloneDX ML-BOM, and in-toto.
+- **Two framework adapters** — `pip install "proofbundle[inspect]"` reads a UK AISI
+  [inspect_ai](https://github.com/UKGovernmentBEIS/inspect_ai) eval log via the stable `read_eval_log`
+  API (lazy import). `proofbundle.adapters.from_lm_eval_results` reads a real EleutherAI
+  [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) `results_*.json` (the
+  genuine `acc,none` filter-suffix format) and captures run provenance — no framework import either way.
 - **in-toto Statement v1** — `proofbundle.intoto.to_intoto_statement(claim, root_b64=…)`
   emits the receipt as an in-toto statement with a self-hosted predicate type. The subject
   digest is an *honest salted commitment* under a custom key, never `sha256` (see
@@ -303,6 +311,9 @@ trust the number.
   bundle payload is the source of truth; the SD-JWT is a derived, bundle-bound view, verified
   by proofbundle's own verifier **and** the `sd-jwt-python` reference.
+Every release ships **PEP 740 attestations** (Trusted Publishing) + an SLSA build-provenance
+attestation — see [SECURITY.md](SECURITY.md).
 ## Roadmap
 - **v0.1** — the offline verifier plus a real example bundle.
@@ -310,8 +321,9 @@ trust the number.
 - **v0.3** — external RFC 6962 conformance vectors + real Sigstore Rekor interop.
 - **v0.4** — the eval-receipt emitter (`emit_eval_receipt` / `proofbundle emit-eval`),
   salted commitments, issuer binding.
-- **v0.5 (current release)** — inspect_ai adapter (stable API), in-toto Statement v1 view,
-  and SD-JWT **issuance** per RFC 9901 (selective disclosure of the exact score).
+- **v0.5** — inspect_ai adapter (stable API), in-toto Statement v1 view, SD-JWT **issuance** (RFC 9901).
+- **v0.6 (current release)** — a second eval adapter (lm-evaluation-harness, real format + provenance),
+  INTEROP.md, CITATION.cff, PEP 740 attestations documented.
 - **Deferred** (explicitly not yet built) — SD-JWT VC conformance + `vct` metadata,
   Key-Binding JWT, status lists / revocation, an official in-toto PR, DSSE / a full in-toto client.

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_adapters.py RENAMED

@@ -9,15 +9,23 @@ TS = "2026-07-01T12:00:00Z"
 class TestAdapters(unittest.TestCase):
-    def test_lm_eval(self):
-        claim, salts = from_lm_eval_results(FX / "lm_eval_results.json", "hellaswag", "acc",
-                                            comparator=">=", threshold="0.70", timestamp=TS,
+    def test_lm_eval_real_acc_none_format(self):
+        # REAL lm-evaluation-harness 0.4.12 export: metric key is "acc,none", stderr sibling "acc_stderr,none".
+        claim, salts = from_lm_eval_results(FX / "lm_eval_arc_easy_real.json", "arc_easy", "acc",
+                                            comparator=">=", threshold="0.30", timestamp=TS,
                                             model_salt=b"0" * 16, dataset_salt=b"1" * 16)
-        self.assertEqual(claim["suite"], "hellaswag")
-        self.assertEqual(claim["threshold"], "0.70")
-        self.assertTrue(claim["passed"])              # 0.7534 >= 0.70
-        self.assertNotIn("acme/model-x", str(claim))  # id only as salted commitment
-        self.assertEqual(claim["n"], 10042)
+        self.assertEqual(claim["suite"], "arc_easy")
+        self.assertTrue(claim["passed"])                       # acc 0.5 >= 0.30
+        self.assertEqual(claim["provenance"]["matched_metric_key"], "acc,none")  # suffix handled
+        self.assertIn("git_hash", claim["provenance"])         # provenance captured
+        self.assertEqual(claim["provenance"]["n_shot"], "0")
+        self.assertIn("stderr", claim["provenance"])           # sibling stderr, not nested
+    def test_lm_eval_missing_metric_lists_available(self):
+        with self.assertRaises(ValueError):
+            from_lm_eval_results(FX / "lm_eval_arc_easy_real.json", "arc_easy", "nonexistent",
+                                 comparator=">=", threshold="0.5", timestamp=TS,
+                                 model_salt=b"0" * 16, dataset_salt=b"1" * 16)
     def test_inspect_ai_stable_api(self):
         # Real .eval log fixture, read via the stable inspect_ai.log.read_eval_log API (proofbundle[inspect]).

proofbundle-0.5.0/src/proofbundle/adapters/lm_eval.py DELETED

@@ -1,32 +0,0 @@
-"""Adapter for EleutherAI lm-evaluation-harness results.json (file-based, no framework import)."""
-from __future__ import annotations
-import json
-from pathlib import Path
-from typing import Optional
-from ..evalclaim import build_eval_claim
-def from_lm_eval_results(path, task: str, metric: str, *, comparator: str, threshold: str,
-                         timestamp: str, model_salt: Optional[bytes] = None,
-                         dataset_salt: Optional[bytes] = None):
-    """Read an lm-evaluation-harness results.json and build an eval claim for `task`/`metric`.
-    Expects the standard shape: {"results": {task: {metric: <number>, ...}, ...},
-    "n-samples": {task: {"effective": n}}, "config"/"model_name": ...}. The score is read as a
-    STRING to avoid float canonicalization issues. Returns (claim, salts).
-    """
-    data = json.loads(Path(path).read_text(encoding="utf-8"))
-    res = data.get("results", {}).get(task)
-    if res is None or metric not in res:
-        raise ValueError(f"task/metric not found in results: {task}/{metric}")
-    score = repr(res[metric]) if not isinstance(res[metric], str) else res[metric]
-    n = int(data.get("n-samples", {}).get(task, {}).get("effective")
-            or data.get("n-samples", {}).get(task, {}).get("original") or 0)
-    model_id = str(data.get("model_name") or data.get("config", {}).get("model") or "unknown")
-    return build_eval_claim(
-        suite=task, suite_version=str(data.get("config", {}).get("model_source", "lm-eval")),
-        metric=metric, comparator=comparator, threshold=threshold, score=str(score), n=n,
-        model_id=model_id, dataset_id=task, issuer="", timestamp=timestamp,
-        model_salt=model_salt, dataset_salt=dataset_salt)
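Both versions of the adapter pass the score and threshold around as strings "to avoid float canonicalization issues". A small sketch of why that matters, using only the standard library: parsing a JSON number into a float loses the original spelling, while a string survives unchanged and can be compared exactly with `Decimal`. The `passes` helper and its comparator table are assumptions for illustration; the diff does not show how `build_eval_claim` actually evaluates the comparator.

```python
import json
import operator
from decimal import Decimal

# Hypothetical comparator table; the set supported by the real library is not shown in the diff.
COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def passes(score: str, comparator: str, threshold: str) -> bool:
    """Compare two decimal strings exactly, without going through binary floats."""
    return COMPARATORS[comparator](Decimal(score), Decimal(threshold))


# A float parsed from JSON no longer remembers how it was written.
parsed = json.loads('{"acc": 0.70}')["acc"]
print(repr(parsed))       # 0.7 (the trailing zero is gone)
print(repr(0.1 + 0.2))    # 0.30000000000000004 (binary rounding shows up)

# A string round-trips through JSON byte for byte.
print(json.loads(json.dumps("0.70")))

print(passes("0.5", ">=", "0.30"))
```

Two emitters that serialize the same float differently would produce different bytes to sign, which is the kind of mismatch a string-typed score sidesteps.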

{proofbundle-0.5.0 → proofbundle-0.6.0}/LICENSE RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/setup.cfg RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/adapters/__init__.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/adapters/inspect_ai.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/bundle.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/cli.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/emit.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/errors.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/intoto.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/merkle.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/py.typed RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/sdjwt.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/sdjwt_issue.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle/signature.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle.egg-info/SOURCES.txt RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle.egg-info/dependency_links.txt RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle.egg-info/entry_points.txt RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle.egg-info/requires.txt RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/src/proofbundle.egg-info/top_level.txt RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_bundle.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_cli.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_cli_eval.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_emit.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_eval_claim_schema.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_evalclaim.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_intoto.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_merkle.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_merkle_property.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_rekor_interop.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_rfc6962_external_vectors.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_schema.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_sdjwt_issue.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_sdjwt_reference.py RENAMED

File without changes

{proofbundle-0.5.0 → proofbundle-0.6.0}/tests/test_signature.py RENAMED

File without changes
