proofbundle 0.7.1 → 0.8.0 (PyPI, sdist .tar.gz)


Files changed (44)

{proofbundle-0.7.1/src/proofbundle.egg-info → proofbundle-0.8.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: proofbundle
-Version: 0.7.1
+Version: 0.8.0
 Summary: Emit and verify portable cryptographic evidence bundles, offline: Ed25519 + RFC 6962 Merkle + optional SD-JWT.
 Author: Konrad Gruszka
 License: MIT
@@ -70,7 +70,7 @@ selectively disclosable credential. Pure Python, no server, no daemon, one JSON
 **At a glance:** `proofbundle emit` signs and anchors a payload; `proofbundle
 verify` checks one self-contained `bundle.json` with three offline cryptographic
-checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 72 tests.
+checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 74 tests.
 ## Contents
@@ -79,6 +79,7 @@ checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 72 tests.
 - [How it fits together](#how-it-fits-together)
 - [Install](#install)
 - [Quickstart](#quickstart)
+- [Demo](#demo--a-real-eval-log-to-a-verified-receipt-offline)
 - [Interoperability](#interoperability)
 - [Bundle format](#bundle-format-proofbundlev01)
 - [Eval receipts](#eval-receipts)
@@ -209,6 +210,21 @@ from proofbundle import verify_consistency
 verify_consistency(first_size, second_size, proof, first_root, second_root)  # -> bool
 ```
+## Demo — a real eval log to a verified receipt, offline
+```bash
+pip install "proofbundle[eval,inspect]"
+make demo          # or: bash scripts/demo.sh
+```
+`make demo` runs end-to-end with **no network, no API key, no GPU**: it takes genuine eval logs — an
+inspect_ai `mockllm/model` `.eval` log and an lm-evaluation-harness `--model dummy` `results.json`
+(committed under `tests/fixtures/`, generated offline) — turns each into a signed, Merkle-anchored
+proofbundle receipt, and verifies it to `=> OK`. The scores are random (a dummy model); the point is
+that the *artifact* is signed and offline-verifiable, with model and dataset kept as salted commitments.
+See [`examples/inspect_receipt.py`](examples/inspect_receipt.py) and
+[`examples/lm_eval_receipt.py`](examples/lm_eval_receipt.py).
 ## Interoperability
 proofbundle uses the same RFC 6962 / RFC 9162 Merkle primitive as
@@ -285,11 +301,24 @@ proofbundle show-eval receipt.json       # verify + print the claim (issuer-boun
 ```
 The claim format is specified in [EVAL_CLAIM.md](EVAL_CLAIM.md); the emit path uses
-RFC 8785 JCS canonicalization, the verify path stays dependency-free. **Honest scope:**
-a receipt proves `passed` against `threshold` and hides the model/dataset via salted
-commitments — it does **not** prove the evaluation was well designed or that the score
-itself is correct. Those are human judgements; what it removes is the need to simply
-trust the number.
+RFC 8785 JCS canonicalization, the verify path stays dependency-free.
+**Honesty guardrail (the exact scope).** A receipt attests the **authenticity and integrity** of a
+*claimed* result and its context — these exact bytes, signed by this key, anchored under this root, with
+model/dataset kept as salted commitments. It does **not** attest the **correctness of the computation**,
+and it cannot detect **cherry-picking** of the eval. Whether the eval was well designed, whether the
+suite measures what it claims, and whether the number was computed honestly are separate questions.
+Trusted-execution approaches such as [Attestable Audits](https://arxiv.org/abs/2506.23706) target
+computation-correctness with a different (hardware) trust model; proofbundle is the lightweight,
+hardware-free path to a portable, tamper-evident, selectively disclosable *result artifact*.
+**How this differs from a bare hash or a TEE.** A plain SHA-256 of a log commits to bytes but carries no
+signature, no tamper-evident anchor, and no selective disclosure (an attestation-exporter idea along
+those lines,
+[inspect_evals PR #1610](https://github.com/UKGovernmentBEIS/inspect_evals/pull/1610), was closed as
+belonging *a layer above* the framework — which is exactly where proofbundle sits). A TEE proves the
+computation ran untampered but needs specific hardware. proofbundle adds Ed25519 + RFC 6962 Merkle +
+SD-JWT selective disclosure over one portable file, offline.
 ### A verification layer for trustworthy eval logs
@@ -328,8 +357,10 @@ attestation — see [SECURITY.md](SECURITY.md).
 - **v0.5** — inspect_ai adapter (stable API), in-toto Statement v1 view, SD-JWT **issuance** (RFC 9901).
 - **v0.6** — a second eval adapter (lm-evaluation-harness, real format + provenance), INTEROP.md,
   CITATION.cff, PEP 740 attestations documented.
-- **v0.7 (current release)** — citability polish: ORCID in CITATION.cff, a Zenodo DOI placeholder
-  (assigned on release), and a draft in-toto ML-eval predicate proposal.
+- **v0.7** — citability polish (ORCID, Zenodo DOI placeholder, in-toto proposal draft); v0.7.1 hardened
+  verifier robustness + CI on Python 3.9 after a holistic review.
+- **v0.8 (current release)** — an offline `make demo` (real eval log -> signed receipt -> verified),
+  a sharpened honesty guardrail (authenticity/integrity, not computation-correctness), and outreach drafts.
 - **Deferred** (explicitly not yet built) — SD-JWT VC conformance + `vct` metadata,
   Key-Binding JWT, status lists / revocation, an official in-toto PR, DSSE / a full in-toto client.
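
The README text embedded in this PKG-INFO refers to an RFC 6962 Merkle primitive and to inclusion/consistency verification. As a reading aid, here is a minimal sketch of the RFC 6962 tree hash and the RFC 9162 inclusion check, using only the standard library. It is not proofbundle's code: the names `leaf_hash`, `node_hash`, `merkle_root`, `audit_path`, and `check_inclusion` are hypothetical, and the package's own `verify_inclusion` signature is not shown in this diff.

```python
import hashlib
from typing import List, Sequence


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962: leaves are domain-separated with a 0x00 prefix.
    return _sha256(b"\x00" + entry)


def node_hash(left: bytes, right: bytes) -> bytes:
    # RFC 6962: interior nodes are domain-separated with a 0x01 prefix.
    return _sha256(b"\x01" + left + right)


def _split(n: int) -> int:
    # Largest power of two strictly smaller than n (n >= 2).
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(entries: Sequence[bytes]) -> bytes:
    n = len(entries)
    if n == 0:
        return _sha256(b"")
    if n == 1:
        return leaf_hash(entries[0])
    k = _split(n)
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


def audit_path(index: int, entries: Sequence[bytes]) -> List[bytes]:
    # RFC 6962 PATH(m, D[n]): sibling hashes from the leaf up to the root.
    n = len(entries)
    if n <= 1:
        return []
    k = _split(n)
    if index < k:
        return audit_path(index, entries[:k]) + [merkle_root(entries[k:])]
    return audit_path(index - k, entries[k:]) + [merkle_root(entries[:k])]


def check_inclusion(index: int, size: int, entry: bytes,
                    path: Sequence[bytes], root: bytes) -> bool:
    # RFC 9162 section 2.1.3.2, written out step by step.
    if index >= size:
        return False
    fn, sn = index, size - 1
    r = leaf_hash(entry)
    for p in path:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            if not fn & 1:
                while fn != 0 and not fn & 1:
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


entries = [f"entry-{i}".encode() for i in range(5)]
root = merkle_root(entries)
all_ok = all(
    check_inclusion(i, len(entries), e, audit_path(i, entries), root)
    for i, e in enumerate(entries)
)
```

The 0x00/0x01 prefixes are what stop a leaf from being reinterpreted as an interior node; a verifier only needs the entry, its index, the tree size, the audit path, and the root.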

{proofbundle-0.7.1 → proofbundle-0.8.0}/README.md RENAMED

@@ -27,7 +27,7 @@ selectively disclosable credential. Pure Python, no server, no daemon, one JSON
 **At a glance:** `proofbundle emit` signs and anchors a payload; `proofbundle
 verify` checks one self-contained `bundle.json` with three offline cryptographic
-checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 72 tests.
+checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 74 tests.
 ## Contents
@@ -36,6 +36,7 @@ checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 72 tests.
 - [How it fits together](#how-it-fits-together)
 - [Install](#install)
 - [Quickstart](#quickstart)
+- [Demo](#demo--a-real-eval-log-to-a-verified-receipt-offline)
 - [Interoperability](#interoperability)
 - [Bundle format](#bundle-format-proofbundlev01)
 - [Eval receipts](#eval-receipts)
@@ -166,6 +167,21 @@ from proofbundle import verify_consistency
 verify_consistency(first_size, second_size, proof, first_root, second_root)  # -> bool
 ```
+## Demo — a real eval log to a verified receipt, offline
+```bash
+pip install "proofbundle[eval,inspect]"
+make demo          # or: bash scripts/demo.sh
+```
+`make demo` runs end-to-end with **no network, no API key, no GPU**: it takes genuine eval logs — an
+inspect_ai `mockllm/model` `.eval` log and an lm-evaluation-harness `--model dummy` `results.json`
+(committed under `tests/fixtures/`, generated offline) — turns each into a signed, Merkle-anchored
+proofbundle receipt, and verifies it to `=> OK`. The scores are random (a dummy model); the point is
+that the *artifact* is signed and offline-verifiable, with model and dataset kept as salted commitments.
+See [`examples/inspect_receipt.py`](examples/inspect_receipt.py) and
+[`examples/lm_eval_receipt.py`](examples/lm_eval_receipt.py).
 ## Interoperability
 proofbundle uses the same RFC 6962 / RFC 9162 Merkle primitive as
@@ -242,11 +258,24 @@ proofbundle show-eval receipt.json       # verify + print the claim (issuer-boun
 ```
 The claim format is specified in [EVAL_CLAIM.md](EVAL_CLAIM.md); the emit path uses
-RFC 8785 JCS canonicalization, the verify path stays dependency-free. **Honest scope:**
-a receipt proves `passed` against `threshold` and hides the model/dataset via salted
-commitments — it does **not** prove the evaluation was well designed or that the score
-itself is correct. Those are human judgements; what it removes is the need to simply
-trust the number.
+RFC 8785 JCS canonicalization, the verify path stays dependency-free.
+**Honesty guardrail (the exact scope).** A receipt attests the **authenticity and integrity** of a
+*claimed* result and its context — these exact bytes, signed by this key, anchored under this root, with
+model/dataset kept as salted commitments. It does **not** attest the **correctness of the computation**,
+and it cannot detect **cherry-picking** of the eval. Whether the eval was well designed, whether the
+suite measures what it claims, and whether the number was computed honestly are separate questions.
+Trusted-execution approaches such as [Attestable Audits](https://arxiv.org/abs/2506.23706) target
+computation-correctness with a different (hardware) trust model; proofbundle is the lightweight,
+hardware-free path to a portable, tamper-evident, selectively disclosable *result artifact*.
+**How this differs from a bare hash or a TEE.** A plain SHA-256 of a log commits to bytes but carries no
+signature, no tamper-evident anchor, and no selective disclosure (an attestation-exporter idea along
+those lines,
+[inspect_evals PR #1610](https://github.com/UKGovernmentBEIS/inspect_evals/pull/1610), was closed as
+belonging *a layer above* the framework — which is exactly where proofbundle sits). A TEE proves the
+computation ran untampered but needs specific hardware. proofbundle adds Ed25519 + RFC 6962 Merkle +
+SD-JWT selective disclosure over one portable file, offline.
 ### A verification layer for trustworthy eval logs
@@ -285,8 +314,10 @@ attestation — see [SECURITY.md](SECURITY.md).
 - **v0.5** — inspect_ai adapter (stable API), in-toto Statement v1 view, SD-JWT **issuance** (RFC 9901).
 - **v0.6** — a second eval adapter (lm-evaluation-harness, real format + provenance), INTEROP.md,
   CITATION.cff, PEP 740 attestations documented.
-- **v0.7 (current release)** — citability polish: ORCID in CITATION.cff, a Zenodo DOI placeholder
-  (assigned on release), and a draft in-toto ML-eval predicate proposal.
+- **v0.7** — citability polish (ORCID, Zenodo DOI placeholder, in-toto proposal draft); v0.7.1 hardened
+  verifier robustness + CI on Python 3.9 after a holistic review.
+- **v0.8 (current release)** — an offline `make demo` (real eval log -> signed receipt -> verified),
+  a sharpened honesty guardrail (authenticity/integrity, not computation-correctness), and outreach drafts.
 - **Deferred** (explicitly not yet built) — SD-JWT VC conformance + `vct` metadata,
   Key-Binding JWT, status lists / revocation, an official in-toto PR, DSSE / a full in-toto client.
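
The new honesty-guardrail paragraph in this README says model and dataset are "kept as salted commitments". As an illustration of that general idea only, here is a generic salted-hash commitment using the standard library. The actual encoding proofbundle uses is defined in EVAL_CLAIM.md and is not visible in this diff, so the `commit` and `opens_to` helpers, the 16-byte salt, and the `salt || value` layout are all assumptions.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def commit(value: str, salt: Optional[bytes] = None) -> Tuple[bytes, str]:
    """Return (salt, digest). Publishing the digest hides the value until the salt is revealed."""
    # Assumption: a fixed-length random salt prepended to the UTF-8 value.
    salt = secrets.token_bytes(16) if salt is None else salt
    digest = hashlib.sha256(salt + value.encode("utf-8")).hexdigest()
    return salt, digest


def opens_to(value: str, salt: bytes, digest: str) -> bool:
    """Check that a disclosed (value, salt) pair matches a previously published digest."""
    expected = hashlib.sha256(salt + value.encode("utf-8")).hexdigest()
    return hmac.compare_digest(expected, digest)


salt, digest = commit("hypothetical-model-name")
matches = opens_to("hypothetical-model-name", salt, digest)
wrong_value = opens_to("some-other-model", salt, digest)
```

The random salt is what keeps a low-entropy value such as a well-known model name from being guessed by hashing candidates; without it, the commitment would hide very little.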

{proofbundle-0.7.1 → proofbundle-0.8.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "proofbundle"
-version = "0.7.1"
+version = "0.8.0"
 description = "Emit and verify portable cryptographic evidence bundles, offline: Ed25519 + RFC 6962 Merkle + optional SD-JWT."
 readme = "README.md"
 requires-python = ">=3.9"

{proofbundle-0.7.1 → proofbundle-0.8.0}/src/proofbundle/__init__.py RENAMED

@@ -13,7 +13,7 @@ from .emit import emit_bundle, generate_signer
 from .errors import Check, ProofBundleError, VerificationResult
 from .merkle import verify_consistency, verify_inclusion
-__version__ = "0.7.1"
+__version__ = "0.8.0"
 __all__ = [
     "__version__",

{proofbundle-0.7.1 → proofbundle-0.8.0/src/proofbundle.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: proofbundle
-Version: 0.7.1
+Version: 0.8.0
 Summary: Emit and verify portable cryptographic evidence bundles, offline: Ed25519 + RFC 6962 Merkle + optional SD-JWT.
 Author: Konrad Gruszka
 License: MIT
@@ -70,7 +70,7 @@ selectively disclosable credential. Pure Python, no server, no daemon, one JSON
 **At a glance:** `proofbundle emit` signs and anchors a payload; `proofbundle
 verify` checks one self-contained `bundle.json` with three offline cryptographic
-checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 72 tests.
+checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 74 tests.
 ## Contents
@@ -79,6 +79,7 @@ checks → `OK` or `FAILED`. No network, no daemon, no own crypto. 72 tests.
 - [How it fits together](#how-it-fits-together)
 - [Install](#install)
 - [Quickstart](#quickstart)
+- [Demo](#demo--a-real-eval-log-to-a-verified-receipt-offline)
 - [Interoperability](#interoperability)
 - [Bundle format](#bundle-format-proofbundlev01)
 - [Eval receipts](#eval-receipts)
@@ -209,6 +210,21 @@ from proofbundle import verify_consistency
 verify_consistency(first_size, second_size, proof, first_root, second_root)  # -> bool
 ```
+## Demo — a real eval log to a verified receipt, offline
+```bash
+pip install "proofbundle[eval,inspect]"
+make demo          # or: bash scripts/demo.sh
+```
+`make demo` runs end-to-end with **no network, no API key, no GPU**: it takes genuine eval logs — an
+inspect_ai `mockllm/model` `.eval` log and an lm-evaluation-harness `--model dummy` `results.json`
+(committed under `tests/fixtures/`, generated offline) — turns each into a signed, Merkle-anchored
+proofbundle receipt, and verifies it to `=> OK`. The scores are random (a dummy model); the point is
+that the *artifact* is signed and offline-verifiable, with model and dataset kept as salted commitments.
+See [`examples/inspect_receipt.py`](examples/inspect_receipt.py) and
+[`examples/lm_eval_receipt.py`](examples/lm_eval_receipt.py).
 ## Interoperability
 proofbundle uses the same RFC 6962 / RFC 9162 Merkle primitive as
@@ -285,11 +301,24 @@ proofbundle show-eval receipt.json       # verify + print the claim (issuer-boun
 ```
 The claim format is specified in [EVAL_CLAIM.md](EVAL_CLAIM.md); the emit path uses
-RFC 8785 JCS canonicalization, the verify path stays dependency-free. **Honest scope:**
-a receipt proves `passed` against `threshold` and hides the model/dataset via salted
-commitments — it does **not** prove the evaluation was well designed or that the score
-itself is correct. Those are human judgements; what it removes is the need to simply
-trust the number.
+RFC 8785 JCS canonicalization, the verify path stays dependency-free.
+**Honesty guardrail (the exact scope).** A receipt attests the **authenticity and integrity** of a
+*claimed* result and its context — these exact bytes, signed by this key, anchored under this root, with
+model/dataset kept as salted commitments. It does **not** attest the **correctness of the computation**,
+and it cannot detect **cherry-picking** of the eval. Whether the eval was well designed, whether the
+suite measures what it claims, and whether the number was computed honestly are separate questions.
+Trusted-execution approaches such as [Attestable Audits](https://arxiv.org/abs/2506.23706) target
+computation-correctness with a different (hardware) trust model; proofbundle is the lightweight,
+hardware-free path to a portable, tamper-evident, selectively disclosable *result artifact*.
+**How this differs from a bare hash or a TEE.** A plain SHA-256 of a log commits to bytes but carries no
+signature, no tamper-evident anchor, and no selective disclosure (an attestation-exporter idea along
+those lines,
+[inspect_evals PR #1610](https://github.com/UKGovernmentBEIS/inspect_evals/pull/1610), was closed as
+belonging *a layer above* the framework — which is exactly where proofbundle sits). A TEE proves the
+computation ran untampered but needs specific hardware. proofbundle adds Ed25519 + RFC 6962 Merkle +
+SD-JWT selective disclosure over one portable file, offline.
 ### A verification layer for trustworthy eval logs
@@ -328,8 +357,10 @@ attestation — see [SECURITY.md](SECURITY.md).
 - **v0.5** — inspect_ai adapter (stable API), in-toto Statement v1 view, SD-JWT **issuance** (RFC 9901).
 - **v0.6** — a second eval adapter (lm-evaluation-harness, real format + provenance), INTEROP.md,
   CITATION.cff, PEP 740 attestations documented.
-- **v0.7 (current release)** — citability polish: ORCID in CITATION.cff, a Zenodo DOI placeholder
-  (assigned on release), and a draft in-toto ML-eval predicate proposal.
+- **v0.7** — citability polish (ORCID, Zenodo DOI placeholder, in-toto proposal draft); v0.7.1 hardened
+  verifier robustness + CI on Python 3.9 after a holistic review.
+- **v0.8 (current release)** — an offline `make demo` (real eval log -> signed receipt -> verified),
+  a sharpened honesty guardrail (authenticity/integrity, not computation-correctness), and outreach drafts.
 - **Deferred** (explicitly not yet built) — SD-JWT VC conformance + `vct` metadata,
   Key-Binding JWT, status lists / revocation, an official in-toto PR, DSSE / a full in-toto client.

{proofbundle-0.7.1 → proofbundle-0.8.0}/src/proofbundle.egg-info/SOURCES.txt RENAMED

@@ -30,6 +30,7 @@ tests/test_cli_eval.py
 tests/test_emit.py
 tests/test_eval_claim_schema.py
 tests/test_evalclaim.py
+tests/test_examples.py
 tests/test_intoto.py
 tests/test_merkle.py
 tests/test_merkle_property.py

proofbundle-0.8.0/tests/test_examples.py ADDED

@@ -0,0 +1,28 @@
+"""The demo examples run end-to-end (real fixtures -> receipt -> verify). Covers `make demo` (Phase B)."""
+import importlib.util
+import sys
+import unittest
+from pathlib import Path
+
+REPO = Path(__file__).resolve().parents[1]
+
+
+def _run_example(name):
+    try:
+        import inspect_ai.log  # noqa: F401  (inspect example needs it)
+    except ImportError:
+        if name == "inspect_receipt":
+            raise unittest.SkipTest("inspect_ai not installed")
+    spec = importlib.util.spec_from_file_location(name, REPO / "examples" / f"{name}.py")
+    m = importlib.util.module_from_spec(spec)
+    sys.modules[name] = m
+    spec.loader.exec_module(m)
+    return m.main()
+
+
+class TestExamples(unittest.TestCase):
+    def test_lm_eval_receipt_example(self):
+        self.assertEqual(_run_example("lm_eval_receipt"), 0)
+
+    def test_inspect_receipt_example(self):
+        self.assertEqual(_run_example("inspect_receipt"), 0)
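
The added test loads each example script by file path and calls its `main()`. The same loading pattern can be tried in isolation with the sketch below, which writes a throwaway script to a temporary directory instead of touching the repository. `load_and_run` and `toy_example` are hypothetical names; only the `importlib.util` calls mirror the test.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_and_run(path: Path, name: str) -> int:
    """Import a script from an explicit file path and return the result of its main()."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, so code inside the script can find itself by name.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
        return module.main()
    finally:
        # Unlike the test above, clean up so repeated runs start fresh.
        sys.modules.pop(name, None)


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "toy_example.py"
    script.write_text("def main():\n    return 0\n", encoding="utf-8")
    exit_code = load_and_run(script, "toy_example")
```

Loading by path avoids needing `examples/` to be an importable package; the trade-off is that the caller has to manage `sys.modules` itself.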

Files renamed without content changes (37), each as {proofbundle-0.7.1 → proofbundle-0.8.0}/<path>:

LICENSE
setup.cfg
src/proofbundle/adapters/__init__.py
src/proofbundle/adapters/inspect_ai.py
src/proofbundle/adapters/lm_eval.py
src/proofbundle/bundle.py
src/proofbundle/cli.py
src/proofbundle/emit.py
src/proofbundle/errors.py
src/proofbundle/evalclaim.py
src/proofbundle/intoto.py
src/proofbundle/merkle.py
src/proofbundle/py.typed
src/proofbundle/sdjwt.py
src/proofbundle/sdjwt_issue.py
src/proofbundle/signature.py
src/proofbundle.egg-info/dependency_links.txt
src/proofbundle.egg-info/entry_points.txt
src/proofbundle.egg-info/requires.txt
src/proofbundle.egg-info/top_level.txt
tests/test_adapters.py
tests/test_bundle.py
tests/test_bundle_robustness.py
tests/test_cli.py
tests/test_cli_eval.py
tests/test_emit.py
tests/test_eval_claim_schema.py
tests/test_evalclaim.py
tests/test_intoto.py
tests/test_merkle.py
tests/test_merkle_property.py
tests/test_rekor_interop.py
tests/test_rfc6962_external_vectors.py
tests/test_schema.py
tests/test_sdjwt_issue.py
tests/test_sdjwt_reference.py
tests/test_signature.py
