PyPI - proofbundle - Versions diffs - 1.0.0__tar.gz → 1.1.0__tar.gz - Mend

proofbundle 1.0.0.tar.gz → 1.1.0.tar.gz


Files changed (58)

{proofbundle-1.0.0/src/proofbundle.egg-info → proofbundle-1.1.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: proofbundle
-Version: 1.0.0
+Version: 1.1.0
 Summary: Emit and verify portable cryptographic evidence bundles, offline: Ed25519 + RFC 6962 Merkle + optional SD-JWT.
 Author: Konrad Gruszka
 License: MIT
@@ -111,6 +111,33 @@ disclosable receipt. The verifier shipped first, small and correct, so it could
 be reviewed and trusted on its own; `emit_bundle` now creates bundles that
 `verify_bundle` accepts, fully offline on both sides.
+## What a receipt proves (and what it does not)
+A receipt is a **tamper-evident, signed statement of authorship and integrity** over an eval or test result —
+not a proof that the number is *true* or that the evaluation was well designed. Hold these apart:
+- **It proves:** the payload was signed by the stated issuer (authorship), no byte changed since (integrity,
+  Ed25519 + RFC 6962), the model/dataset behind salted commitments, and — since v1.1 — the **assurance level**
+  is signed in — tamper-evident and bound to the issuer, so a third party cannot alter it. `show-eval`
+  displays the level, warns on the weakest combination (self_attested with no pre-registration), and shows
+  withheld SD-JWT fields + receipt age; the `verify_commitment` library call (the holder presents the
+  identifier + salt out of band) makes a model-swap visible.
+- **It does not prove:** that a *self-attested* issuer is honest. The level is issuer-DECLARED: a dishonest
+  issuer can sign `reproduced` on a self-run eval — the signature binds *who claimed it* to them, it does not
+  make the claim true (same as the score). The warning catches the honest self_attested case; a higher level
+  is only as trustworthy as the process behind it.
+- **Also not proven:** that a result was not cherry-picked from many runs without pre-registration, or that
+  the suite measures what it claims. Those need a pre-registered protocol or independent reproduction.
+| assurance_level | meaning |
+|---|---|
+| `self_attested` | issuer ran + signed it (default); trust rests on the issuer |
+| `third_party` | a third party checked before signing |
+| `reproduced` | independently re-run and matched |
+| `enclave_attested` | produced in an attested trusted execution environment |
+Full detail: **[THREAT_MODEL.md](THREAT_MODEL.md)** — what `verify` catches and what it structurally cannot.
 ## What it verifies
 A bundle is a single JSON document. `proofbundle` checks, offline:
@@ -408,8 +435,10 @@ attestation — see [SECURITY.md](SECURITY.md).
   a sharpened honesty guardrail (authenticity/integrity, not computation-correctness), and outreach drafts.
 - **v0.9** — the standards moat: a DSSE-signed in-toto `test-result` export, a C2SP tlog-checkpoint over
   the RFC 6962 root, an Every Eval Ever converter, and standards-native repositioning.
-- **v1.0 (current release)** — distribution: opt-in framework integrations that auto-emit a signed receipt
-  of an inspect_ai eval (end-of-task hook) or a pytest run (pytest11 plugin), plus a composite GitHub Action.
+- **v1.0** — distribution: opt-in framework integrations that auto-emit a signed receipt of an inspect_ai
+  eval (end-of-task hook) or a pytest run (pytest11 plugin), plus a composite GitHub Action.
+- **v1.1 (current release)** — trust hardening: a signed `assurance_level`, a THREAT_MODEL, a self_attested-
+  without-prereg warning, model-swap + replay + withheld-field checks, and an adversarial No-Fake-PASS suite.
 - **Deferred** (explicitly not yet built) — SD-JWT VC conformance + `vct` metadata,
   Key-Binding JWT, status lists / revocation, an official in-toto PR, DSSE / a full in-toto client.
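
The warning rule described in the new "What a receipt proves" section can be sketched on its own. The following is a minimal standalone approximation, not the package's code: `sketch_claim_warnings` is a hypothetical name, the level names are copied from the `assurance_level` table, and the claim dicts are invented for illustration.

```python
from typing import List

# Levels as listed in the assurance_level table, weakest first.
ASSURANCE_LEVELS = ("self_attested", "third_party", "reproduced", "enclave_attested")
DEFAULT_ASSURANCE = "self_attested"


def sketch_claim_warnings(claim: dict) -> List[str]:
    """Hypothetical re-statement of the rule: warn only on the weakest combination,
    a self_attested claim that carries no pre-registration hash."""
    level = claim.get("assurance_level", DEFAULT_ASSURANCE)
    warnings = []
    if level == "self_attested" and not claim.get("prereg_sha256"):
        warnings.append("self_attested with no prereg_sha256: trust rests entirely on the issuer")
    return warnings


# Illustrative claims (only the two fields the rule reads are shown).
weak = {"assurance_level": "self_attested"}
preregistered = {"assurance_level": "self_attested", "prereg_sha256": "a" * 64}
declared_higher = {"assurance_level": "reproduced"}
legacy = {}  # no assurance_level at all: falls back to the default

for name, claim in [("weak", weak), ("preregistered", preregistered),
                    ("declared_higher", declared_higher), ("legacy", legacy)]:
    print(name, sketch_claim_warnings(claim))
```

Note that `declared_higher` produces no warning even though nothing checks whether the eval really was reproduced. That matches the limitation stated in the README text: the level is issuer-declared, and the signature only attributes the declaration to the issuer.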

{proofbundle-1.0.0 → proofbundle-1.1.0}/README.md RENAMED

@@ -66,6 +66,33 @@ disclosable receipt. The verifier shipped first, small and correct, so it could
 be reviewed and trusted on its own; `emit_bundle` now creates bundles that
 `verify_bundle` accepts, fully offline on both sides.
+## What a receipt proves (and what it does not)
+A receipt is a **tamper-evident, signed statement of authorship and integrity** over an eval or test result —
+not a proof that the number is *true* or that the evaluation was well designed. Hold these apart:
+- **It proves:** the payload was signed by the stated issuer (authorship), no byte changed since (integrity,
+  Ed25519 + RFC 6962), the model/dataset behind salted commitments, and — since v1.1 — the **assurance level**
+  is signed in — tamper-evident and bound to the issuer, so a third party cannot alter it. `show-eval`
+  displays the level, warns on the weakest combination (self_attested with no pre-registration), and shows
+  withheld SD-JWT fields + receipt age; the `verify_commitment` library call (the holder presents the
+  identifier + salt out of band) makes a model-swap visible.
+- **It does not prove:** that a *self-attested* issuer is honest. The level is issuer-DECLARED: a dishonest
+  issuer can sign `reproduced` on a self-run eval — the signature binds *who claimed it* to them, it does not
+  make the claim true (same as the score). The warning catches the honest self_attested case; a higher level
+  is only as trustworthy as the process behind it.
+- **Also not proven:** that a result was not cherry-picked from many runs without pre-registration, or that
+  the suite measures what it claims. Those need a pre-registered protocol or independent reproduction.
+| assurance_level | meaning |
+|---|---|
+| `self_attested` | issuer ran + signed it (default); trust rests on the issuer |
+| `third_party` | a third party checked before signing |
+| `reproduced` | independently re-run and matched |
+| `enclave_attested` | produced in an attested trusted execution environment |
+Full detail: **[THREAT_MODEL.md](THREAT_MODEL.md)** — what `verify` catches and what it structurally cannot.
 ## What it verifies
 A bundle is a single JSON document. `proofbundle` checks, offline:
@@ -363,8 +390,10 @@ attestation — see [SECURITY.md](SECURITY.md).
   a sharpened honesty guardrail (authenticity/integrity, not computation-correctness), and outreach drafts.
 - **v0.9** — the standards moat: a DSSE-signed in-toto `test-result` export, a C2SP tlog-checkpoint over
   the RFC 6962 root, an Every Eval Ever converter, and standards-native repositioning.
-- **v1.0 (current release)** — distribution: opt-in framework integrations that auto-emit a signed receipt
-  of an inspect_ai eval (end-of-task hook) or a pytest run (pytest11 plugin), plus a composite GitHub Action.
+- **v1.0** — distribution: opt-in framework integrations that auto-emit a signed receipt of an inspect_ai
+  eval (end-of-task hook) or a pytest run (pytest11 plugin), plus a composite GitHub Action.
+- **v1.1 (current release)** — trust hardening: a signed `assurance_level`, a THREAT_MODEL, a self_attested-
+  without-prereg warning, model-swap + replay + withheld-field checks, and an adversarial No-Fake-PASS suite.
 - **Deferred** (explicitly not yet built) — SD-JWT VC conformance + `vct` metadata,
   Key-Binding JWT, status lists / revocation, an official in-toto PR, DSSE / a full in-toto client.
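
To make the model-swap check mentioned in the README concrete, here is a hedged sketch of a salted-commitment comparison. The diff does not show how `salted_commit` derives its value, so the scheme below (SHA-256 over salt followed by the UTF-8 identifier, hex-encoded) is an assumption made purely for illustration, and both function names are hypothetical. Only the constant-time comparison via `hmac.compare_digest` is taken from the diff.

```python
import hashlib
import hmac


def sketch_salted_commit(identifier: str, salt: bytes) -> str:
    # ASSUMPTION: SHA-256(salt || identifier) in hex. The real salted_commit
    # in proofbundle may use a different construction.
    return hashlib.sha256(salt + identifier.encode("utf-8")).hexdigest()


def sketch_verify_commitment(identifier: str, salt: bytes, commitment: str) -> bool:
    """Recompute the commitment from the presented identifier and salt, then
    compare in constant time, as verify_commitment does in the diff."""
    expected = sketch_salted_commit(identifier, salt)
    return hmac.compare_digest(expected, str(commitment))


# Fixed illustrative salt; a real issuer would use random bytes and keep them out of the payload.
salt = bytes(range(16))
commitment = sketch_salted_commit("secret-model", salt)

print(sketch_verify_commitment("secret-model", salt, commitment))       # matching identifier and salt
print(sketch_verify_commitment("swapped-model", salt, commitment))      # different model
print(sketch_verify_commitment("secret-model", b"\x00" * 16, commitment))  # wrong salt
```

The check only works when the holder presents the identifier and salt out of band; the receipt alone reveals neither.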

{proofbundle-1.0.0 → proofbundle-1.1.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "proofbundle"
-version = "1.0.0"
+version = "1.1.0"
 description = "Emit and verify portable cryptographic evidence bundles, offline: Ed25519 + RFC 6962 Merkle + optional SD-JWT."
 readme = "README.md"
 requires-python = ">=3.9"

{proofbundle-1.0.0 → proofbundle-1.1.0}/src/proofbundle/__init__.py RENAMED

@@ -13,7 +13,7 @@ from __future__ import annotations
 from typing import TYPE_CHECKING
-__version__ = "1.0.0"
+__version__ = "1.1.0"
 __all__ = [
     "__version__",

{proofbundle-1.0.0 → proofbundle-1.1.0}/src/proofbundle/cli.py RENAMED

@@ -48,7 +48,9 @@ def _cmd_emit_eval(args: argparse.Namespace) -> int:
 def _cmd_show_eval(args: argparse.Namespace) -> int:
-    from .evalclaim import decode_eval_claim  # noqa: PLC0415
+    from .evalclaim import (  # noqa: PLC0415
+        DEFAULT_ASSURANCE, check_freshness, claim_warnings, decode_eval_claim, sd_jwt_hidden_count,
+    )
     claim = decode_eval_claim(args.receipt)
     if claim is None:
         print("=> FAILED: not a valid, issuer-bound eval receipt", file=sys.stderr)
@@ -56,10 +58,19 @@ def _cmd_show_eval(args: argparse.Namespace) -> int:
     print(f"suite      {claim['suite']} ({claim['suite_version']})")
     print(f"metric     {claim['metric']} {claim['comparator']} {claim['threshold']}")
     print(f"passed     {claim['passed']}   (n={claim['n']})")
+    print(f"assurance  {claim.get('assurance_level', DEFAULT_ASSURANCE)}")
     print(f"model      commit {claim['model_id_commit']}")
     print(f"dataset    commit {claim['dataset_id_commit']}")
     print(f"issuer     {claim['issuer']}")
     print(f"timestamp  {claim['timestamp']}")
+    hidden = sd_jwt_hidden_count(args.receipt)
+    if hidden is not None:
+        print(f"sd-jwt     {hidden} field(s) withheld (selective disclosure)")
+    fresh = check_freshness(claim)
+    if fresh["parsed"]:
+        print(f"age        {fresh['age_seconds']}s")
+    for w in claim_warnings(claim):
+        print(f"WARNING    {w}")
     print("=> OK")
     return 0

{proofbundle-1.0.0 → proofbundle-1.1.0}/src/proofbundle/evalclaim.py RENAMED

@@ -1,7 +1,7 @@
 """Eval receipts (v0.4): sign + Merkle-anchor a canonical eval CLAIM.
-A receipt proves exactly one thing — *suite S scored `comparator` threshold T,
-passed=…* — carrying only SALTED commitments to the model and dataset identifiers,
+A receipt is tamper-evident signed evidence of exactly one thing — *suite S scored `comparator` threshold
+T, passed=…* — carrying only SALTED commitments to the model and dataset identifiers,
 never the weights, the data, or the plaintext names. A third party verifies the
 threshold was met, offline, from one file, without ever seeing the model or dataset.
@@ -37,14 +37,22 @@ _COMPARATORS = {">=", ">", "<=", "<"}
 _MAX_SAFE_INT = 2 ** 53 - 1
 # The published eval-claim schema's decimal pattern for threshold/score (no exponent, no sign+, no spaces).
 _DECIMAL_RE = re.compile(r"^-?[0-9]+(\.[0-9]+)?$")
+# Assurance level (v1.1): how much a PASS is worth. Signed into the claim (tamper-evident + bound to the
+# issuer, so a third party cannot alter it) — but issuer-DECLARED: a dishonest issuer can sign a higher level,
+# the signature attributes that claim to them, it does not make it true. Ordered weakest→strongest. Default
+# self_attested — the 1.0 integrations emit self-attested, and claiming more would be dishonest.
+ASSURANCE_LEVELS = ("self_attested", "third_party", "reproduced", "enclave_attested")
+DEFAULT_ASSURANCE = "self_attested"
 # The exact key set of an eval claim; decode/validate reject anything else.
 _REQUIRED = {"schema", "suite", "suite_version", "metric", "comparator", "threshold",
-             "passed", "n", "model_id_commit", "dataset_id_commit", "commit_alg", "issuer", "timestamp"}
+             "passed", "n", "model_id_commit", "dataset_id_commit", "commit_alg", "issuer", "timestamp",
+             "assurance_level"}
 _OPTIONAL = {"context_binding", "ci95", "multiple_testing", "prereg_sha256", "provenance"}
 __all__ = [
-    "EVAL_CLAIM_SCHEMA", "COMMIT_ALG", "canonicalize", "build_eval_claim",
+    "EVAL_CLAIM_SCHEMA", "COMMIT_ALG", "ASSURANCE_LEVELS", "canonicalize", "build_eval_claim",
     "emit_eval_receipt", "decode_eval_claim", "salted_commit", "issuer_fingerprint",
+    "claim_warnings", "verify_commitment", "check_freshness", "sd_jwt_hidden_count",
 ]
@@ -135,6 +143,7 @@ def build_eval_claim(*, suite: str, suite_version: str, metric: str, comparator:
                      issuer: str, timestamp: str, context_binding: Optional[str] = None,
                      ci95: Optional[Sequence[str]] = None, multiple_testing: Optional[str] = None,
                      prereg_sha256: Optional[str] = None, provenance: Optional[dict] = None,
+                     assurance_level: str = DEFAULT_ASSURANCE,
                      model_salt: Optional[bytes] = None, dataset_salt: Optional[bytes] = None):
     """Build a valid eval claim from raw values. Computes `passed` ITSELF from the comparator
     (never trusts the caller), creates salted commitments, and returns (claim, salts) with the
@@ -145,6 +154,8 @@ def build_eval_claim(*, suite: str, suite_version: str, metric: str, comparator:
     """
     if comparator not in _COMPARATORS:
         raise EvalClaimError(f"comparator must be one of {sorted(_COMPARATORS)}")
+    if assurance_level not in ASSURANCE_LEVELS:
+        raise EvalClaimError(f"assurance_level must be one of {list(ASSURANCE_LEVELS)}")
     # threshold/score must match the PUBLISHED schema's decimal pattern exactly — reject "1e2",
     # "Infinity", "+5", " 5 " etc. that Decimal() would accept but jsonschema rejects (schema-conformance).
     for name, val in (("threshold", threshold), ("score", score)):
@@ -164,7 +175,7 @@ def build_eval_claim(*, suite: str, suite_version: str, metric: str, comparator:
         "metric": metric, "comparator": comparator, "threshold": threshold, "passed": passed,
         "n": n, "model_id_commit": salted_commit(model_id, m_salt),
         "dataset_id_commit": salted_commit(dataset_id, d_salt), "commit_alg": COMMIT_ALG,
-        "issuer": issuer, "timestamp": timestamp,
+        "issuer": issuer, "timestamp": timestamp, "assurance_level": assurance_level,
     }
     if context_binding is not None:
         claim["context_binding"] = context_binding
@@ -189,6 +200,11 @@ def emit_eval_receipt(claim: dict, signer: Ed25519PrivateKey, *, prior_leaves: S
     """
     claim = dict(claim)
     claim["issuer"] = issuer_fingerprint(signer)
+    # A claim without an explicit assurance_level is self_attested — the weakest, safest default; never
+    # silently elevate. (v1.1: keeps pre-1.1 claim JSONs emittable while binding the honest level.)
+    claim.setdefault("assurance_level", DEFAULT_ASSURANCE)
+    if claim["assurance_level"] not in ASSURANCE_LEVELS:
+        raise EvalClaimError(f"assurance_level must be one of {list(ASSURANCE_LEVELS)}")
     missing = _REQUIRED - set(claim)
     if missing:
         raise EvalClaimError(f"claim missing required fields: {sorted(missing)}")
@@ -223,3 +239,86 @@ def decode_eval_claim(bundle) -> Optional[dict]:
         return claim
     except (KeyError, ValueError, EvalClaimError):
         return None
+def claim_warnings(claim: dict) -> list:
+    """Honest trust warnings for an already-verified claim (v1.1). A verified signature proves authorship +
+    integrity, NOT that the number is true or the study was pre-registered. The weakest combination —
+    self_attested with no pre-registration — is where an issuer could publish the best of many runs; surface
+    it so a strong signature never masks a weak assurance. Returns a list of human-readable strings."""
+    out = []
+    level = claim.get("assurance_level", DEFAULT_ASSURANCE)
+    if level == "self_attested" and not claim.get("prereg_sha256"):
+        out.append("self_attested with no prereg_sha256 — the weakest assurance: trust rests entirely on the "
+                   "issuer, who could publish the best of many runs. Pre-register (prereg_sha256) or use a "
+                   "higher assurance_level (reproduced / enclave_attested) to strengthen it.")
+    return out
+def verify_commitment(identifier: str, salt: bytes, commitment: str) -> bool:
+    """Check that a PRESENTED identifier (+ its salt) matches a salted commitment in a claim
+    (``model_id_commit`` / ``dataset_id_commit``). Makes a model-swap visible: a claim that silently swapped
+    the model cannot produce a matching (identifier, salt). Constant-time compare; the salt stays outside the
+    payload (the holder presents it to a verifier out of band)."""
+    try:
+        expected = salted_commit(identifier, salt)
+    except EvalClaimError:
+        return False
+    import hmac  # noqa: PLC0415
+    return hmac.compare_digest(expected, str(commitment))
+def check_freshness(claim: dict, max_age_seconds: Optional[int] = None, now=None) -> dict:
+    """Replay check (v1.1): parse the claim's timestamp and report its age. A receipt carries a timestamp but
+    verify never judged it — an old receipt could be replayed as new. Returns
+    {"parsed": bool, "age_seconds": int|None, "fresh": bool|None, "reason": str}. ``fresh`` is None when no
+    ``max_age_seconds`` bound is given (age reported, not judged). 3.9-safe ISO parsing (normalizes a 'Z')."""
+    from datetime import datetime, timezone  # noqa: PLC0415
+    ts = claim.get("timestamp")
+    if not isinstance(ts, str):
+        return {"parsed": False, "age_seconds": None, "fresh": None, "reason": "no timestamp"}
+    raw = ts[:-1] + "+00:00" if ts.endswith("Z") else ts
+    try:
+        dt = datetime.fromisoformat(raw)
+    except ValueError:
+        return {"parsed": False, "age_seconds": None, "fresh": None, "reason": f"unparseable timestamp {ts!r}"}
+    if dt.tzinfo is None:
+        dt = dt.replace(tzinfo=timezone.utc)
+    ref = now or datetime.now(timezone.utc)
+    if ref.tzinfo is None:
+        ref = ref.replace(tzinfo=timezone.utc)
+    age = int((ref - dt).total_seconds())
+    if max_age_seconds is None:
+        return {"parsed": True, "age_seconds": age, "fresh": None, "reason": f"age {age}s (no bound given)"}
+    fresh = 0 <= age <= max_age_seconds
+    return {"parsed": True, "age_seconds": age, "fresh": fresh,
+            "reason": (f"age {age}s within {max_age_seconds}s" if fresh
+                       else f"age {age}s outside [0, {max_age_seconds}]s — possible replay or clock skew")}
+def sd_jwt_hidden_count(bundle) -> Optional[int]:
+    """Number of selectively-disclosable (currently withheld) SD-JWT fields in a bundle, so that OMISSION is
+    visible: a receipt can hide claims behind the SD-JWT ``_sd`` digests. Returns the count, or None if the
+    bundle carries no SD-JWT. Reads the issuer JWT payload's ``_sd`` array without verifying the SD-JWT
+    (that is the holder/verifier's job); purely a disclosure-transparency signal."""
+    if isinstance(bundle, str):
+        bundle = load_bundle(bundle)
+    sd = bundle.get("sd_jwt_vc") if isinstance(bundle, dict) else None
+    if not sd:
+        return None
+    # the canonical bundle form (the only one verify_bundle accepts) stores the compact SD-JWT under "compact";
+    # sd_jwt/token are accepted as fallbacks for a bare token dict/string.
+    token = sd if isinstance(sd, str) else (sd.get("compact") or sd.get("sd_jwt") or sd.get("token") or "")
+    if not isinstance(token, str) or "." not in token:
+        return None
+    try:
+        jwt = token.split("~", 1)[0]                     # issuer JWT, before any disclosures
+        payload_b64 = jwt.split(".")[1]
+        payload_b64 += "=" * (-len(payload_b64) % 4)     # restore base64url padding
+        payload = json.loads(base64.urlsafe_b64decode(payload_b64).decode("utf-8"))
+    except (ValueError, KeyError, IndexError):
+        return None
+    if not isinstance(payload, dict):                    # a valid-JSON non-object payload → nothing to count
+        return None
+    sd_arr = payload.get("_sd")
+    return len(sd_arr) if isinstance(sd_arr, list) else None
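
The timestamp handling in `check_freshness` can be exercised without the package. The sketch below re-states its parsing and bounds logic in two hypothetical helpers (`sketch_age_seconds`, `sketch_is_fresh`) and pins the reference time so the results do not depend on the wall clock. It is a simplified approximation under those assumptions, not a substitute for the function in the diff.

```python
from datetime import datetime, timezone
from typing import Optional


def sketch_age_seconds(timestamp: str, now: datetime) -> Optional[int]:
    """Age of an ISO-8601 timestamp relative to `now`, or None if it cannot be parsed."""
    # datetime.fromisoformat on Python 3.9 rejects a trailing 'Z', so normalize it first.
    raw = (timestamp[:-1] + "+00:00") if timestamp.endswith("Z") else timestamp
    try:
        issued = datetime.fromisoformat(raw)
    except ValueError:
        return None
    if issued.tzinfo is None:  # treat naive timestamps as UTC, as the diff does
        issued = issued.replace(tzinfo=timezone.utc)
    return int((now - issued).total_seconds())


def sketch_is_fresh(timestamp: str, max_age_seconds: int, now: datetime) -> Optional[bool]:
    age = sketch_age_seconds(timestamp, now)
    if age is None:
        return None
    # A negative age means the timestamp lies in the future; that also fails the bound.
    return 0 <= age <= max_age_seconds


# Fixed reference time so the example is deterministic.
now = datetime(2020, 1, 1, 1, 0, 0, tzinfo=timezone.utc)

print(sketch_age_seconds("2020-01-01T00:00:00Z", now))      # one hour before `now`
print(sketch_is_fresh("2020-01-01T00:00:00Z", 3600, now))   # exactly on the bound
print(sketch_is_fresh("2019-12-31T00:00:00Z", 3600, now))   # a day old
print(sketch_is_fresh("2020-01-01T02:00:00Z", 3600, now))   # future-dated
print(sketch_is_fresh("yesterday", 3600, now))              # unparseable
```

Rejecting future-dated timestamps follows the `0 <= age <= max_age_seconds` test in the diff, which reports such cases as possible replay or clock skew rather than silently accepting them.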

{proofbundle-1.0.0 → proofbundle-1.1.0/src/proofbundle.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: proofbundle
-Version: 1.0.0
+Version: 1.1.0
 Summary: Emit and verify portable cryptographic evidence bundles, offline: Ed25519 + RFC 6962 Merkle + optional SD-JWT.
 Author: Konrad Gruszka
 License: MIT
@@ -111,6 +111,33 @@ disclosable receipt. The verifier shipped first, small and correct, so it could
 be reviewed and trusted on its own; `emit_bundle` now creates bundles that
 `verify_bundle` accepts, fully offline on both sides.
+## What a receipt proves (and what it does not)
+A receipt is a **tamper-evident, signed statement of authorship and integrity** over an eval or test result —
+not a proof that the number is *true* or that the evaluation was well designed. Hold these apart:
+- **It proves:** the payload was signed by the stated issuer (authorship), no byte changed since (integrity,
+  Ed25519 + RFC 6962), the model/dataset behind salted commitments, and — since v1.1 — the **assurance level**
+  is signed in — tamper-evident and bound to the issuer, so a third party cannot alter it. `show-eval`
+  displays the level, warns on the weakest combination (self_attested with no pre-registration), and shows
+  withheld SD-JWT fields + receipt age; the `verify_commitment` library call (the holder presents the
+  identifier + salt out of band) makes a model-swap visible.
+- **It does not prove:** that a *self-attested* issuer is honest. The level is issuer-DECLARED: a dishonest
+  issuer can sign `reproduced` on a self-run eval — the signature binds *who claimed it* to them, it does not
+  make the claim true (same as the score). The warning catches the honest self_attested case; a higher level
+  is only as trustworthy as the process behind it.
+- **Also not proven:** that a result was not cherry-picked from many runs without pre-registration, or that
+  the suite measures what it claims. Those need a pre-registered protocol or independent reproduction.
+| assurance_level | meaning |
+|---|---|
+| `self_attested` | issuer ran + signed it (default); trust rests on the issuer |
+| `third_party` | a third party checked before signing |
+| `reproduced` | independently re-run and matched |
+| `enclave_attested` | produced in an attested trusted execution environment |
+Full detail: **[THREAT_MODEL.md](THREAT_MODEL.md)** — what `verify` catches and what it structurally cannot.
 ## What it verifies
 A bundle is a single JSON document. `proofbundle` checks, offline:
@@ -408,8 +435,10 @@ attestation — see [SECURITY.md](SECURITY.md).
   a sharpened honesty guardrail (authenticity/integrity, not computation-correctness), and outreach drafts.
 - **v0.9** — the standards moat: a DSSE-signed in-toto `test-result` export, a C2SP tlog-checkpoint over
   the RFC 6962 root, an Every Eval Ever converter, and standards-native repositioning.
-- **v1.0 (current release)** — distribution: opt-in framework integrations that auto-emit a signed receipt
-  of an inspect_ai eval (end-of-task hook) or a pytest run (pytest11 plugin), plus a composite GitHub Action.
+- **v1.0** — distribution: opt-in framework integrations that auto-emit a signed receipt of an inspect_ai
+  eval (end-of-task hook) or a pytest run (pytest11 plugin), plus a composite GitHub Action.
+- **v1.1 (current release)** — trust hardening: a signed `assurance_level`, a THREAT_MODEL, a self_attested-
+  without-prereg warning, model-swap + replay + withheld-field checks, and an adversarial No-Fake-PASS suite.
 - **Deferred** (explicitly not yet built) — SD-JWT VC conformance + `vct` metadata,
   Key-Binding JWT, status lists / revocation, an official in-toto PR, DSSE / a full in-toto client.

{proofbundle-1.0.0 → proofbundle-1.1.0}/src/proofbundle.egg-info/SOURCES.txt RENAMED

@@ -31,6 +31,7 @@ src/proofbundle/adapters/eee.py
 src/proofbundle/adapters/inspect_ai.py
 src/proofbundle/adapters/lm_eval.py
 tests/test_adapters.py
+tests/test_adversarial.py
 tests/test_bundle.py
 tests/test_bundle_robustness.py
 tests/test_checkpoint.py

proofbundle-1.1.0/tests/test_adversarial.py ADDED

@@ -0,0 +1,95 @@
+"""Adversarial No-Fake-PASS suite (v1.1): actively try to FORGE a passing receipt, and pin down exactly what
+verify catches and what it structurally cannot. Each test documents the honest boundary — a green here means
+the defence held OR the limitation is named, never a hidden false PASS.
+"""
+import base64
+import json
+import unittest
+from proofbundle import verify_bundle
+from proofbundle.evalclaim import (
+    build_eval_claim, check_freshness, claim_warnings, decode_eval_claim, emit_eval_receipt,
+    sd_jwt_hidden_count, verify_commitment,
+)
+from proofbundle.emit import generate_signer
+def _receipt(score="0.99", threshold="0.80", prereg=None, assurance="self_attested", ts="2020-01-01T00:00:00Z"):
+    signer = generate_signer()
+    claim, salts = build_eval_claim(
+        suite="mmlu", suite_version="1", metric="accuracy", comparator=">=", threshold=threshold,
+        score=score, n=1000, model_id="secret-model", dataset_id="secret-data", issuer="",
+        timestamp=ts, prereg_sha256=prereg, assurance_level=assurance)
+    return emit_eval_receipt(claim, signer), salts
+class TestAdversarial(unittest.TestCase):
+    def test_a_invented_numbers_with_valid_signature_pass_is_expected(self):
+        # A receipt binds AUTHORSHIP + INTEGRITY, not TRUTH. A signed but invented score verifies — this is
+        # EXPECTED and documented. The honesty gate is the self_attested-without-prereg WARNING.
+        bundle, _ = _receipt(score="0.99")
+        self.assertTrue(verify_bundle(bundle).ok)                 # signature/integrity hold
+        claim = decode_eval_claim(bundle)
+        self.assertIsNotNone(claim)
+        self.assertTrue(claim["passed"])                          # invented pass, cryptographically fine
+        self.assertTrue(claim_warnings(claim), "self_attested+no-prereg MUST warn")   # the honest counter
+    def test_a_prereg_or_higher_assurance_removes_the_warning(self):
+        self.assertFalse(claim_warnings(decode_eval_claim(_receipt(prereg="a" * 64)[0])))
+        self.assertFalse(claim_warnings(decode_eval_claim(_receipt(assurance="reproduced")[0])))
+    def test_b_tampered_payload_fails(self):
+        bundle, _ = _receipt()
+        tampered = json.loads(json.dumps(bundle))
+        payload = json.loads(base64.b64decode(tampered["payload_b64"]))
+        payload["passed"] = True
+        payload["threshold"] = "0.10"                             # forge an easier bar
+        tampered["payload_b64"] = base64.b64encode(
+            json.dumps(payload).encode("utf-8")).decode("ascii")
+        self.assertFalse(verify_bundle(tampered).ok)              # signature no longer matches
+        self.assertIsNone(decode_eval_claim(tampered))
+    def test_c_omitted_sd_jwt_fields_are_counted(self):
+        # Selective disclosure hides claims behind _sd digests; the count makes OMISSION visible. This exercises
+        # the CANONICAL bundle form (sd_jwt_vc = {"compact": ...}) — the only form verify_bundle accepts — not a
+        # bare string, so it would catch the "reads sd_jwt/token but the real key is compact" regression.
+        hdr = base64.urlsafe_b64encode(b'{"alg":"ES256"}').decode().rstrip("=")
+        pl = base64.urlsafe_b64encode(
+            json.dumps({"_sd": ["d1", "d2", "d3"], "iss": "x"}).encode()).decode().rstrip("=")
+        canonical = {"sd_jwt_vc": {"compact": f"{hdr}.{pl}.sig~", "issuer_public_key_b64": "AA=="}}
+        self.assertEqual(sd_jwt_hidden_count(canonical), 3)       # 3 withheld fields surfaced on the REAL form
+        self.assertIsNone(sd_jwt_hidden_count({"schema": "x"}))   # no sd-jwt → None (nothing hidden)
+        # a valid-JSON but non-object payload must return None, never crash (defensive contract)
+        bad = base64.urlsafe_b64encode(b'[1,2,3]').decode().rstrip("=")
+        self.assertIsNone(sd_jwt_hidden_count({"sd_jwt_vc": {"compact": f"{hdr}.{bad}.sig~"}}))
+        # and on the shipped REAL bundle (a verify-passing sd_jwt_vc) it must surface a positive count
+        from pathlib import Path
+        ex = Path(__file__).resolve().parent.parent / "examples" / "example_bundle.json"
+        if ex.is_file():
+            n = sd_jwt_hidden_count(json.loads(ex.read_text()))
+            self.assertIsNotNone(n)
+            self.assertGreater(n, 0)
+    def test_d_model_swap_against_commitment_is_a_mismatch(self):
+        bundle, salts = _receipt()
+        claim = decode_eval_claim(bundle)
+        self.assertTrue(verify_commitment("secret-model", salts["model_salt"], claim["model_id_commit"]))
+        self.assertFalse(verify_commitment("swapped-model", salts["model_salt"], claim["model_id_commit"]))
+        self.assertFalse(verify_commitment("secret-model", b"\x00" * 16, claim["model_id_commit"]))
+    def test_e_replay_of_old_receipt_is_detectable(self):
+        claim = decode_eval_claim(_receipt(ts="2020-01-01T00:00:00Z")[0])
+        fresh = check_freshness(claim, max_age_seconds=3600)
+        self.assertTrue(fresh["parsed"])
+        self.assertFalse(fresh["fresh"])                          # years old → not fresh (replay/skew)
+        self.assertGreater(fresh["age_seconds"], 3600)
+    def test_f_honest_receipt_still_verifies_end_to_end(self):
+        # The hardening must not break a legitimate receipt (guards against over-tightening).
+        bundle, _ = _receipt(prereg="b" * 64, assurance="reproduced")
+        self.assertTrue(verify_bundle(bundle).ok)
+        self.assertIsNotNone(decode_eval_claim(bundle))
+if __name__ == "__main__":
+    unittest.main()
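
The withheld-field count exercised by `test_c_omitted_sd_jwt_fields_are_counted` comes down to decoding the issuer JWT payload and counting its `_sd` digests. The sketch below reproduces that step on a hand-built token, in the same way the test does. `sketch_hidden_count` and `b64url` are hypothetical helper names, the token is not validly signed, and nothing here verifies the SD-JWT.

```python
import base64
import json
from typing import Optional


def b64url(data: bytes) -> str:
    """Unpadded base64url, the encoding used for JWT segments."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def sketch_hidden_count(compact: str) -> Optional[int]:
    """Count `_sd` digests in the issuer JWT payload of a compact SD-JWT, or None."""
    try:
        issuer_jwt = compact.split("~", 1)[0]          # part before any disclosures
        payload_b64 = issuer_jwt.split(".")[1]
        payload_b64 += "=" * (-len(payload_b64) % 4)   # restore base64url padding
        payload = json.loads(base64.urlsafe_b64decode(payload_b64).decode("utf-8"))
    except (ValueError, IndexError):                   # bad base64, bad UTF-8, bad JSON, no payload
        return None
    if not isinstance(payload, dict):
        return None
    digests = payload.get("_sd")
    return len(digests) if isinstance(digests, list) else None


# Illustrative token with three placeholder digests and a dummy signature segment.
header = b64url(b'{"alg":"ES256"}')
payload = b64url(json.dumps({"_sd": ["d1", "d2", "d3"], "iss": "x"}).encode("utf-8"))
token = f"{header}.{payload}.sig~"
non_object = f"{header}.{b64url(b'[1,2,3]')}.sig~"

print(sketch_hidden_count(token))
print(sketch_hidden_count(non_object))
print(sketch_hidden_count("not-a-jwt"))
```

Because the signature is never checked here, the count is only a transparency signal about omission, as the docstring of `sd_jwt_hidden_count` says; verifying the SD-JWT itself remains a separate step.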

Files without changes (49), each RENAMED under {proofbundle-1.0.0 → proofbundle-1.1.0}/:

LICENSE
setup.cfg
src/proofbundle/_inspect_registry.py
src/proofbundle/_integration.py
src/proofbundle/adapters/__init__.py
src/proofbundle/adapters/eee.py
src/proofbundle/adapters/inspect_ai.py
src/proofbundle/adapters/lm_eval.py
src/proofbundle/bundle.py
src/proofbundle/checkpoint.py
src/proofbundle/dsse.py
src/proofbundle/eee_eval_schema.json
src/proofbundle/emit.py
src/proofbundle/errors.py
src/proofbundle/inspect_hook.py
src/proofbundle/intoto.py
src/proofbundle/merkle.py
src/proofbundle/py.typed
src/proofbundle/pytest_plugin.py
src/proofbundle/sdjwt.py
src/proofbundle/sdjwt_issue.py
src/proofbundle/signature.py
src/proofbundle.egg-info/dependency_links.txt
src/proofbundle.egg-info/entry_points.txt
src/proofbundle.egg-info/requires.txt
src/proofbundle.egg-info/top_level.txt
tests/test_adapters.py
tests/test_bundle.py
tests/test_bundle_robustness.py
tests/test_checkpoint.py
tests/test_cli.py
tests/test_cli_eval.py
tests/test_eee.py
tests/test_emit.py
tests/test_eval_claim_schema.py
tests/test_evalclaim.py
tests/test_examples.py
tests/test_inspect_hook.py
tests/test_intoto.py
tests/test_intoto_dsse.py
tests/test_merkle.py
tests/test_merkle_property.py
tests/test_pytest_plugin.py
tests/test_rekor_interop.py
tests/test_rfc6962_external_vectors.py
tests/test_schema.py
tests/test_sdjwt_issue.py
tests/test_sdjwt_reference.py
tests/test_signature.py
