actproof 0.2.0__py3-none-any.whl

actproof/catalogue.py ADDED
@@ -0,0 +1,1031 @@
+ # SPDX-FileCopyrightText: 2026 Deyan Paroushev
+ # SPDX-License-Identifier: MIT
+ """
+ Load and query the actproof-events catalogue (v2 and v3 entries). Validate
+ manifests against entries.
+
+ A catalogue is a directory tree of JSON files; each file describes one regulated
+ or governance act type (NIS2 Article 20 management body approval, EUDR DDS
+ preparation, software release, board resolution, etc.). The catalogue is
+ maintained in a separate repository (actproof-events) under a permissive
+ license. This module reads it, indexes the entries by act_type_id, computes
+ content hashes for catalogue binding, and validates whether a manifest's claim
+ satisfies its entry's required fields and evidence labels.
+
+ Schema versions
+ ---------------
+
+ Entries may carry one of two schema discriminators:
+
+ - ``"actproof.act_catalogue_entry.v2"`` (introduced in actproof-events
+   v1.4-rc1): fifteen wire-schema fields covering claim shape, evidence,
+   signature policy, regulatory citation, and provenance.
+ - ``"actproof.act_catalogue_entry.v3"`` (introduced in actproof-events
+   v1.5-rc1): strict additive superset of v2. Adds four optional sub-objects
+   for richer act-type semantics: ``regulated_context_profile``,
+   ``prior_receipts_profile``, ``reliance_context``, ``disclosure_profile``.
+
+ Both are accepted by the loader. v2 entries leave the four v3 fields on
+ ``CatalogueEntry`` at ``None``. v3 entries populate them where the JSON
+ declares the corresponding blocks; absent blocks remain ``None``.
+
+ Path resolution
+ ---------------
+
+ The catalogue is located on disk. How the bytes get there (git submodule,
+ vendored copy, volume mount, fresh clone in CI) is a deployment decision the
+ library does not constrain. Resolution order:
+
+ 1. The ``acts_path`` argument to ``load_catalogue`` if provided.
+ 2. ``$ACTPROOF_CATALOGUE_PATH`` environment variable.
+ 3. ``./actproof-events/catalogue/acts/`` relative to the current working dir.
+ 4. ``./vendor/actproof-events/catalogue/acts/`` relative to the current working dir.
+
+ The schema file is located by default relative to the acts path at
+ ``../../spec/schemas/``. Resolution tries v3 first
+ (``act_catalogue_entry.v3.json``), falls back to v2
+ (``act_catalogue_entry.v2.json``). Whichever file is found is hashed into
+ ``Catalogue.schema_hash`` for receipt binding.
+
+ What gets loaded
+ ----------------
+
+ The loader walks the acts directory tree, reading every ``*.json`` file. Files
+ are filtered:
+
+ - Files in any ``_deprecated`` subdirectory are skipped. v1 entries archived
+   there are preserved for historical reference but are not loaded for new
+   issuance.
+ - Files matching ``*.test_vectors.json`` are skipped. Those are test fixtures,
+   not entries.
+ - Files whose top-level ``schema`` field is not one of the recognised
+   discriminators (``SCHEMA_DISCRIMINATORS``) are silently skipped (could be
+   schema files, READMEs in JSON, or other unrelated artifacts).
+ - Duplicate ``act_type_id`` raises ``CatalogueLoadError``.
+
+ What gets validated
+ -------------------
+
+ ``validate_manifest(manifest, catalogue)`` checks:
+
+ 1. The manifest's ``act_type_id`` exists in the loaded catalogue.
+ 2. The manifest's ``entry_version`` matches the loaded entry's version.
+ 3. The manifest's ``entry_hash`` matches the loaded entry's content hash.
+ 4. Every ``required_claim_field`` is present and non-empty in the manifest's claim.
+ 5. Every ``required_evidence_label`` has at least one ``Evidence`` in the manifest.
+ 6. Every ``Evidence.label`` is declared by the entry (in required or optional).
+ 7. The ``schema`` discriminator on the entry is one of
+    ``SCHEMA_DISCRIMINATORS``.
+
+ Returns a ``list[ValidationIssue]``. An empty list means valid. The caller
+ decides how to act on issues (treat all as errors, distinguish by ``code``, etc.).
+
+ Catalogue binding helpers
+ -------------------------
+
+ ``hash_entry_file(path) -> str`` returns ``"sha256:..."`` of the raw entry
+ JSON bytes. This is what gets stored in ``CatalogueBinding.entry_hash`` at
+ manifest issue time. Hashing raw file bytes (not canonical bytes) matches the
+ git-pinned model: anyone with the catalogue at the pinned commit can recompute
+ the hash with ``sha256sum entry.json`` and confirm.
+
+ ``hash_schema_file(path) -> str`` returns ``"sha256:..."`` of the schema file
+ bytes, for ``CatalogueBinding.schema_hash``.
+
+ API
+ ---
+
+ * ``load_catalogue(...) -> Catalogue``
+ * ``validate_manifest(manifest, catalogue) -> list[ValidationIssue]``
+ * ``hash_entry_file(path) -> str``
+ * ``hash_schema_file(path) -> str``
+
+ Data classes
+ ~~~~~~~~~~~~
+
+ * ``CatalogueEntry`` - one entry from the catalogue. v2 entries populate
+   fifteen v2 wire-schema fields plus two derived fields (``source_path``,
+   ``entry_hash``). v3 entries additionally populate up to four optional
+   sub-objects (see below); they default to ``None`` on v2 entries and on
+   v3 entries that do not declare them.
+ * ``RegulatoryCitation`` - optional regulatory anchor (v2+).
+ * ``SignaturePolicy`` - signature evidence requirements (v2+).
+ * ``RegulatedContextProfile`` - optional v3 block constraining the receipt
+   envelope ``regulated_context`` shape for this act type.
+ * ``PriorReceiptsProfile`` - optional v3 block declaring bilateral
+   lifecycle expectations (which ``prior_receipts`` roles MUST or MAY appear).
+ * ``RelianceContext`` - optional v3 block naming who issues, who relies,
+   and who later verifies.
+ * ``DisclosureProfile`` - optional v3 block declaring per-field disclosure
+   tier (public, commitment, private) and bilateral back-propagation scope.
+ * ``Catalogue`` - the loaded collection (entries dict + provenance metadata).
+ * ``ValidationIssue`` - one finding from manifest validation.
+
+ Exceptions
+ ~~~~~~~~~~
+
+ * ``CatalogueLoadError`` - raised on filesystem or parse problems at load.
+ """
+
+ from __future__ import annotations
+
+ import hashlib
+ import json
+ import logging
+ import os
+ from dataclasses import dataclass, field
+ from pathlib import Path
+ from typing import Mapping, Optional
+
+ from actproof.manifest import Manifest
+
+ logger = logging.getLogger(__name__)
+
+ __all__ = [
+     "Catalogue",
+     "CatalogueEntry",
+     "RegulatoryCitation",
+     "SignaturePolicy",
+     "RegulatedContextProfile",
+     "PriorReceiptsProfile",
+     "RelianceContext",
+     "DisclosureProfile",
+     "ValidationIssue",
+     "CatalogueLoadError",
+     "load_catalogue",
+     "validate_manifest",
+     "hash_entry_file",
+     "hash_schema_file",
+     "SCHEMA_DISCRIMINATOR",
+     "SCHEMA_DISCRIMINATOR_V2",
+     "SCHEMA_DISCRIMINATOR_V3",
+     "SCHEMA_DISCRIMINATORS",
+     "ENV_CATALOGUE_PATH",
+ ]
+
+
+ # ─────────────────────────────────────────────────────────────────
+ # CONSTANTS
+ # ─────────────────────────────────────────────────────────────────
+
+ SCHEMA_DISCRIMINATOR_V2: str = "actproof.act_catalogue_entry.v2"
+ """Schema discriminator for v2 catalogue entries (fifteen wire-schema fields,
+ no v3 sub-objects)."""
+
+ SCHEMA_DISCRIMINATOR_V3: str = "actproof.act_catalogue_entry.v3"
+ """Schema discriminator for v3 catalogue entries (fifteen v2 fields plus four
+ optional sub-objects: ``regulated_context_profile``, ``prior_receipts_profile``,
+ ``reliance_context``, ``disclosure_profile``)."""
+
+ SCHEMA_DISCRIMINATORS: frozenset[str] = frozenset({
+     SCHEMA_DISCRIMINATOR_V2,
+     SCHEMA_DISCRIMINATOR_V3,
+ })
+ """Set of all schema discriminator strings recognised by this loader.
+ Membership check: ``entry_data["schema"] in SCHEMA_DISCRIMINATORS``."""
+
+ SCHEMA_DISCRIMINATOR: str = SCHEMA_DISCRIMINATOR_V2
+ """Backward-compatible alias for ``SCHEMA_DISCRIMINATOR_V2``. Retained so that
+ external consumers that imported this name from actproof v0.1.0 continue to
+ work. New code should use ``SCHEMA_DISCRIMINATOR_V2`` and
+ ``SCHEMA_DISCRIMINATOR_V3`` directly, and ``SCHEMA_DISCRIMINATORS`` for
+ membership checks."""
+
+ ENV_CATALOGUE_PATH: str = "ACTPROOF_CATALOGUE_PATH"
+ """Environment variable consulted for the catalogue acts path."""
+
+ _FALLBACK_ACTS_PATHS: tuple[str, ...] = (
+     "actproof-events/catalogue/acts",
+     "vendor/actproof-events/catalogue/acts",
+ )
+ """Filesystem locations to try if no path is given and the env var is unset."""
+
+ _SCHEMA_RELATIVE_PATH_V3: tuple[str, ...] = (
+     "..", "..", "spec", "schemas", "act_catalogue_entry.v3.json",
+ )
+ """Default v3 schema path relative to the acts directory."""
+
+ _SCHEMA_RELATIVE_PATH_V2: tuple[str, ...] = (
+     "..", "..", "spec", "schemas", "act_catalogue_entry.v2.json",
+ )
+ """Default v2 schema path relative to the acts directory."""
+
+
+ # ─────────────────────────────────────────────────────────────────
+ # EXCEPTIONS
+ # ─────────────────────────────────────────────────────────────────
+
+ class CatalogueLoadError(RuntimeError):
+     """Raised when the catalogue cannot be loaded (path missing, parse error,
+     duplicate act_type_id, invalid entry shape).
+     """
+
+
+ # ─────────────────────────────────────────────────────────────────
+ # DATA CLASSES
+ # ─────────────────────────────────────────────────────────────────
+
+ @dataclass(frozen=True)
+ class RegulatoryCitation:
+     """Regulatory anchor for an act type (may be ``None`` on the parent entry).
+
+     Attributes:
+         instrument: Name of the regulatory instrument, e.g.
+             ``"Directive (EU) 2022/2555"``.
+         article: Article, section, or paragraph reference.
+         jurisdiction: ISO 3166-1 alpha-2, ISO 3166-2 subdivision, or
+             supranational tag (``"EU"``, ``"UN"``).
+         in_force_from: ISO 8601 date when the instrument took effect.
+     """
+     instrument: str
+     article: str
+     jurisdiction: str
+     in_force_from: str
+
+
+ @dataclass(frozen=True)
+ class SignaturePolicy:
+     """Signature evidence requirements for an act type.
+
+     Attributes:
+         minimum: One of ``"issuer_record"``, ``"external_signature"``, or
+             ``"either"``. ``issuer_record`` is a platform-recorded commit
+             action with metadata (email, IP, UA, token hash, timestamp); it
+             is evidence, NOT an Advanced Electronic Signature under
+             eIDAS Regulation 910/2014. ``external_signature`` requires an
+             externally produced signature artifact (QES, AES, signed PDF).
+             ``either`` accepts both.
+         supports: Tuple of evidence labels for externally produced signature
+             artifacts the act type recognises.
+     """
+     minimum: str
+     supports: tuple[str, ...]
+
+
+ # ─────────────────────────────────────────────────────────────────
+ # v3 OPTIONAL SUB-OBJECTS
+ #
+ # These four dataclasses correspond to the four optional blocks added in
+ # the act_catalogue_entry.v3 schema. They are absent on v2 entries and
+ # optional on v3 entries. Where present in JSON, they are parsed into
+ # these dataclasses and exposed via the corresponding CatalogueEntry
+ # fields. Where absent, the CatalogueEntry field stays ``None``.
+ # ─────────────────────────────────────────────────────────────────
+
+ @dataclass(frozen=True)
+ class RegulatedContextProfile:
+     """Constrains the receipt envelope ``regulated_context`` shape for an act type.
+
+     Introduced in act_catalogue_entry.v3. Validators MAY use this to reject
+     receipts whose ``regulated_context.context_type`` or ``submission_stage``
+     are not permitted by the issuing act type.
+
+     Attributes:
+         allowed_context_types: Tuple of permitted ``context_type`` values.
+             A non-empty subset of ``{"transaction_or_shipment",
+             "incident_lifecycle", "reporting_period", "assurance_handoff",
+             "release_or_publication"}``.
+         allowed_submission_stages: Tuple of permitted ``submission_stage``
+             values. Each stage belongs grammatically to one of the
+             ``allowed_context_types`` per the receipt envelope spec. Empty
+             tuple means no stage restriction beyond what the envelope
+             schema enforces.
+         default_context_type: ``context_type`` assumed when an issuer does
+             not supply one. ``None`` means no default. If present, MUST be
+             a member of ``allowed_context_types``.
+     """
+     allowed_context_types: tuple[str, ...]
+     allowed_submission_stages: tuple[str, ...] = ()
+     default_context_type: Optional[str] = None
+
+
+ @dataclass(frozen=True)
+ class PriorReceiptsProfile:
+     """Declares bilateral lifecycle expectations for receipts under an act type.
+
+     Introduced in act_catalogue_entry.v3. Validators MAY use this to reject
+     ``prior_receipts`` entries whose role is not declared, or to require
+     that certain roles be present.
+
+     actproof-events v1.5-rc1 entries leave this block absent (or empty)
+     because the May 19 dogfood does not exercise bilateral propagation.
+     The first exercised pair lands in v1.5-rc2 by May 30.
+
+     Attributes:
+         required_roles: Tuple of role identifiers that MUST appear in a
+             receipt's ``prior_receipts`` array. Empty tuple means no
+             prior receipts required.
+         optional_roles: Tuple of role identifiers that MAY appear in a
+             receipt's ``prior_receipts`` array.
+     """
+     required_roles: tuple[str, ...] = ()
+     optional_roles: tuple[str, ...] = ()
+
+
+ @dataclass(frozen=True)
+ class RelianceContext:
+     """Names who issues, who relies, and who later verifies receipts under this act type.
+
+     Introduced in act_catalogue_entry.v3. Conveys the act type's intended
+     reliance pattern in a structured form, so downstream tooling can
+     surface it without parsing prose documentation.
+
+     Attributes:
+         issuer_role: Functional description of the entity issuing receipts
+             under this act type. Example: ``"open-source maintainer
+             issuing a release attestation"``.
+         counterparty_action: What the counterparty does on receiving the
+             receipt. Example: ``"competent authority supervisor records
+             the attestation in the supervisory file"``.
+         later_verifiers: Tuple of categories of third parties expected to
+             verify the receipt independently at later points in time.
+             Example: ``("regulator", "external_auditor", "public")``.
+         reliance_statement: One-sentence statement of what the receipt
+             asserts that the counterparty and later verifiers may rely
+             upon. ``None`` if not declared.
+     """
+     issuer_role: str
+     counterparty_action: str
+     later_verifiers: tuple[str, ...]
+     reliance_statement: Optional[str] = None
+
+
+ @dataclass(frozen=True)
+ class DisclosureProfile:
+     """Per-field disclosure tier and bilateral back-propagation scope.
+
+     Introduced in act_catalogue_entry.v3. Declares which manifest fields
+     are stored cleartext, which are stored as salted SHA-256 commitments,
+     and which are omitted from the manifest entirely. Also declares which
+     fields are surfaced back to each prior-receipt role in the automatic
+     disclosure receipt generated at settlement.
+
+     Three tiers:
+
+     - ``public``: cleartext in the canonical manifest.
+     - ``commitment``: salted SHA-256 hash in the manifest; cleartext lives
+       only in the issuer's holder receipt. Commitment construction is
+       ``sha256(salt || ":" || qualified_field_name || ":" || canonical_value)``.
+     - ``private``: omitted from the manifest entirely; lives only in the
+       holder receipt.
+
+     Catalogue releases MAY restrict the use of the private tier.
+     actproof-events v1.5-rc1 entries MUST have ``private_fields = ()``
+     (a constraint enforced by the catalogue validator, not by this
+     dataclass).
+
+     Field references in the three tier tuples MAY be simple claim field
+     names (e.g. ``"supplier_name"``) or dotted manifest paths
+     (e.g. ``"manifest.title"``, ``"manifest.issuer.legal_name"``,
+     ``"manifest.evidence[].label"``, ``"manifest.recipients[].org_name"``).
+     Fields not named in any of the three tier tuples default to public.
+
+     Attributes:
+         public_fields: Tuple of field references stored cleartext in the
+             canonical manifest.
+         commitment_fields: Tuple of field references stored as salted
+             SHA-256 commitments in the canonical manifest.
+         private_fields: Tuple of field references omitted from the
+             manifest entirely.
+         back_propagation_scope: Mapping from a ``prior_receipts`` role to
+             a tuple of field references visible to that prior receipt's
+             issuer in the automatic disclosure receipt generated at
+             settlement. Empty mapping means no automatic back-propagation
+             declared. Multiple roles MAY be declared even if a given
+             catalogue release exercises only one.
+     """
+     public_fields: tuple[str, ...]
+     commitment_fields: tuple[str, ...]
+     private_fields: tuple[str, ...]
+     back_propagation_scope: Mapping[str, tuple[str, ...]] = field(
+         default_factory=dict
+     )
+
+
+ @dataclass(frozen=True)
+ class CatalogueEntry:
+     """A single catalogue entry. Fifteen v2 wire-schema fields, two derived
+     fields, and four optional v3 sub-objects.
+
+     v2 entries populate the fifteen v2 wire-schema fields. The four v3
+     sub-object fields default to ``None`` and remain ``None`` on v2 entries.
+     v3 entries additionally populate up to four of the optional sub-objects;
+     fields not declared in JSON remain ``None``.
+
+     Attributes:
+         schema: Schema discriminator. ``"actproof.act_catalogue_entry.v2"``
+             for v2 entries, ``"actproof.act_catalogue_entry.v3"`` for v3
+             entries.
+         act_type_id: Canonical identifier under ``op:`` namespace.
+         claim_type: snake_case semantic shape identifier.
+         display_name: Human-readable display name.
+         regulatory_citation: Optional regulatory anchor.
+         required_claim_fields: Tuple of claim field names that MUST be present
+             and non-empty in a manifest's claim.
+         optional_claim_fields: Tuple of claim field names a manifest MAY use.
+         required_evidence_labels: Tuple of evidence labels that MUST each be
+             covered by at least one ``Evidence`` in a manifest.
+         eligible_issuer_roles: Tuple of issuer authority labels (e.g.
+             ``"essential_entity"``) that can issue this act type.
+         recommended_witness_roles: Tuple of recipient roles typically
+             designated as witnesses for this act type.
+         signature_policy: Signature requirements.
+         version: Integer version of this entry.
+         supersedes: act_type_id of the entry this one supersedes, or ``None``.
+         maintainer: Identifier of the entry's maintainer.
+         test_vector_reference: Path or URI of the test vectors file.
+         source_path: Local filesystem path the entry was loaded from. Derived,
+             not part of the wire schema. Useful for debugging.
+         entry_hash: ``"sha256:..."`` of the raw entry JSON file bytes. Derived
+             at load time. Goes into ``CatalogueBinding.entry_hash`` at manifest
+             issue time.
+         regulated_context_profile: Optional v3 block constraining the receipt
+             envelope ``regulated_context`` shape for this act type. ``None``
+             for v2 entries and for v3 entries that do not declare the block.
+         prior_receipts_profile: Optional v3 block declaring bilateral
+             lifecycle expectations. ``None`` for v2 entries and for v3
+             entries that do not declare the block.
+         reliance_context: Optional v3 block naming the issuer role,
+             counterparty action, and later verifiers. ``None`` for v2
+             entries and for v3 entries that do not declare the block.
+         disclosure_profile: Optional v3 block declaring per-field disclosure
+             tier and back-propagation scope. ``None`` for v2 entries and for
+             v3 entries that do not declare the block.
+     """
+     schema: str
+     act_type_id: str
+     claim_type: str
+     display_name: str
+     regulatory_citation: Optional[RegulatoryCitation]
+     required_claim_fields: tuple[str, ...]
+     optional_claim_fields: tuple[str, ...]
+     required_evidence_labels: tuple[str, ...]
+     eligible_issuer_roles: tuple[str, ...]
+     recommended_witness_roles: tuple[str, ...]
+     signature_policy: SignaturePolicy
+     version: int
+     supersedes: Optional[str]
+     maintainer: str
+     test_vector_reference: str
+     # Derived fields:
+     source_path: str = ""
+     entry_hash: str = ""
+     # v3 optional sub-objects (None for v2 entries; populated from JSON on v3
+     # entries that declare them; remain None on v3 entries that do not).
+     # Placed after derived fields to preserve positional construction for any
+     # caller relying on the v2 field order (source_path, entry_hash).
+     regulated_context_profile: Optional[RegulatedContextProfile] = None
+     prior_receipts_profile: Optional[PriorReceiptsProfile] = None
+     reliance_context: Optional[RelianceContext] = None
+     disclosure_profile: Optional[DisclosureProfile] = None
+
+
+ @dataclass(frozen=True)
+ class Catalogue:
+     """The loaded catalogue: indexed entries plus provenance metadata.
+
+     Attributes:
+         entries: Mapping from ``act_type_id`` to ``CatalogueEntry``.
+         source_root: Filesystem path the catalogue was loaded from (the
+             ``catalogue/acts/`` directory).
+         source_uri: External URI of the catalogue source repository (e.g.
+             GitHub URL), if known. May be ``None`` if not provided.
+         git_commit: 40-character git SHA-1 of the catalogue source at load
+             time, if known. May be ``None`` if the catalogue is not in a
+             git working tree or the caller did not provide it.
+         schema_hash: ``"sha256:..."`` of the catalogue schema file bytes.
+             Empty string if the schema file could not be located.
+     """
+     entries: Mapping[str, CatalogueEntry]
+     source_root: str
+     source_uri: Optional[str]
+     git_commit: Optional[str]
+     schema_hash: str
+
+     def get(self, act_type_id: str) -> Optional[CatalogueEntry]:
+         """Look up an entry by ``act_type_id``. Returns ``None`` if absent."""
+         return self.entries.get(act_type_id)
+
+     def list_entries(self) -> list[CatalogueEntry]:
+         """Return all entries sorted by display_name."""
+         return sorted(self.entries.values(), key=lambda e: e.display_name)
+
+     def __len__(self) -> int:
+         return len(self.entries)
+
+     def __contains__(self, act_type_id: object) -> bool:
+         return act_type_id in self.entries
+
+
+ @dataclass(frozen=True)
+ class ValidationIssue:
+     """One finding from manifest validation.
+
+     Attributes:
+         code: Machine-readable code, one of:
+
+             - ``UNKNOWN_ACT_TYPE``: manifest's act_type_id not in catalogue.
+             - ``ENTRY_VERSION_MISMATCH``: manifest cites a different version.
+             - ``ENTRY_HASH_MISMATCH``: manifest's entry_hash differs from
+               what's in the loaded catalogue.
+             - ``SCHEMA_HASH_MISMATCH``: manifest's schema_hash differs from
+               what's in the loaded catalogue.
+             - ``MISSING_REQUIRED_CLAIM_FIELD``: a required claim field is
+               absent or empty in the manifest's claim.
+             - ``MISSING_REQUIRED_EVIDENCE_LABEL``: a required evidence label
+               has no covering ``Evidence`` in the manifest.
+             - ``UNKNOWN_EVIDENCE_LABEL``: an ``Evidence`` carries a label
+               not declared by the entry (neither required nor optional).
+         message: Human-readable description.
+         field: JSON-Path-style location (e.g. ``"claim.decision_date"``,
+             ``"evidence[2].label"``), or ``None`` for top-level issues.
+     """
+     code: str
+     message: str
+     field: Optional[str] = None
+
+
+ # ─────────────────────────────────────────────────────────────────
+ # PATH RESOLUTION
+ # ─────────────────────────────────────────────────────────────────
+
+ def _resolve_acts_path(explicit: Optional[Path]) -> Path:
+     """Resolve the catalogue acts path from arg, env var, or fallbacks."""
+     if explicit is not None:
+         path = explicit.expanduser().resolve()
+         if not path.is_dir():
+             raise CatalogueLoadError(
+                 f"acts_path {explicit!r} is not a directory (resolved to {path})"
+             )
+         return path
+
+     env_value = os.environ.get(ENV_CATALOGUE_PATH)
+     if env_value:
+         path = Path(env_value).expanduser().resolve()
+         if not path.is_dir():
+             raise CatalogueLoadError(
+                 f"{ENV_CATALOGUE_PATH}={env_value!r} is not a directory "
+                 f"(resolved to {path})"
+             )
+         return path
+
+     for fallback in _FALLBACK_ACTS_PATHS:
+         candidate = Path(fallback).resolve()
+         if candidate.is_dir():
+             return candidate
+
+     raise CatalogueLoadError(
+         f"Could not locate the catalogue acts directory. "
+         f"Pass acts_path explicitly, set {ENV_CATALOGUE_PATH}, "
+         f"or place the catalogue at one of: {', '.join(_FALLBACK_ACTS_PATHS)}."
+     )
+
+
+ def _resolve_schema_path(acts_path: Path) -> Optional[Path]:
+     """Find the schema file relative to the acts directory.
+
+     Tries v3 first (``act_catalogue_entry.v3.json``), falls back to v2
+     (``act_catalogue_entry.v2.json``). Returns ``None`` if neither file
+     exists. Catalogues that ship v3 entries should have the v3 schema file
+     present; catalogues that ship only v2 entries may have only the v2
+     schema file. Whichever file is found is what gets hashed into
+     ``Catalogue.schema_hash``.
+     """
+     for relative_parts in (_SCHEMA_RELATIVE_PATH_V3, _SCHEMA_RELATIVE_PATH_V2):
+         candidate = acts_path.joinpath(*relative_parts).resolve()
+         if candidate.is_file():
+             return candidate
+     return None
+
+
+ # ─────────────────────────────────────────────────────────────────
+ # PARSING
+ # ─────────────────────────────────────────────────────────────────
+
+ def _parse_entry(data: dict, source_path: str, entry_hash: str) -> CatalogueEntry:
+     """Build a ``CatalogueEntry`` from a parsed JSON dict.
+
+     Accepts entries with either the v2 or v3 schema discriminator. v3 entries
+     additionally parse the four optional sub-objects (``regulated_context_profile``,
+     ``prior_receipts_profile``, ``reliance_context``, ``disclosure_profile``)
+     into their respective dataclasses where the corresponding JSON blocks
+     are present; absent blocks leave the field at ``None``. v2 entries
+     never populate the v3 fields, even if the JSON happens to carry them
+     (this matches the loader's general tolerance for extra dict keys; the
+     v2 JSON schema file enforces strict ``additionalProperties: false`` for
+     consumers that validate at the schema-file level).
+
+     Raises:
+         CatalogueLoadError: If the schema discriminator is not recognised,
+             or a required field on the entry or on a present v3 sub-object
+             is missing or wrong-typed.
+     """
+     schema_value = data.get("schema")
+     if schema_value not in SCHEMA_DISCRIMINATORS:
+         raise CatalogueLoadError(
+             f"Not a recognised catalogue entry: schema={schema_value!r} at "
+             f"{source_path}. Expected one of: {sorted(SCHEMA_DISCRIMINATORS)}"
+         )
+
+     try:
+         citation_data = data.get("regulatory_citation")
+         citation: Optional[RegulatoryCitation] = None
+         if citation_data is not None:
+             citation = RegulatoryCitation(
+                 instrument=citation_data["instrument"],
+                 article=citation_data["article"],
+                 jurisdiction=citation_data["jurisdiction"],
+                 in_force_from=citation_data["in_force_from"],
+             )
+
+         sig_data = data["signature_policy"]
+         sig_policy = SignaturePolicy(
+             minimum=sig_data["minimum"],
+             supports=tuple(sig_data.get("supports", [])),
+         )
+
+         # v3 optional sub-objects. Populated only when the entry is v3 AND
+         # the corresponding block is present in JSON. v2 entries always
+         # yield None for all four (extra dict keys on v2 entries are
+         # tolerated by the Python loader but ignored here; the v2 JSON
+         # schema file enforces strict additionalProperties:false at
+         # schema-validation time for consumers that run that check).
+         regulated_context_profile: Optional[RegulatedContextProfile] = None
+         prior_receipts_profile: Optional[PriorReceiptsProfile] = None
+         reliance_context: Optional[RelianceContext] = None
+         disclosure_profile: Optional[DisclosureProfile] = None
+
+         if schema_value == SCHEMA_DISCRIMINATOR_V3:
+             rcp_data = data.get("regulated_context_profile")
+             if rcp_data is not None:
+                 regulated_context_profile = RegulatedContextProfile(
+                     allowed_context_types=tuple(rcp_data["allowed_context_types"]),
+                     allowed_submission_stages=tuple(
+                         rcp_data.get("allowed_submission_stages", [])
+                     ),
+                     default_context_type=rcp_data.get("default_context_type"),
+                 )
+
+             prp_data = data.get("prior_receipts_profile")
+             if prp_data is not None:
+                 prior_receipts_profile = PriorReceiptsProfile(
+                     required_roles=tuple(prp_data.get("required_roles", [])),
+                     optional_roles=tuple(prp_data.get("optional_roles", [])),
+                 )
+
+             rc_data = data.get("reliance_context")
+             if rc_data is not None:
+                 reliance_context = RelianceContext(
+                     issuer_role=rc_data["issuer_role"],
+                     counterparty_action=rc_data["counterparty_action"],
+                     later_verifiers=tuple(rc_data["later_verifiers"]),
+                     reliance_statement=rc_data.get("reliance_statement"),
+                 )
+
+             dp_data = data.get("disclosure_profile")
+             if dp_data is not None:
+                 bps_raw = dp_data.get("back_propagation_scope") or {}
+                 back_prop = {
+                     role: tuple(field_refs)
+                     for role, field_refs in bps_raw.items()
+                 }
+                 disclosure_profile = DisclosureProfile(
+                     public_fields=tuple(dp_data["public_fields"]),
+                     commitment_fields=tuple(dp_data["commitment_fields"]),
+                     private_fields=tuple(dp_data["private_fields"]),
+                     back_propagation_scope=back_prop,
+                 )
+
+         return CatalogueEntry(
+             schema=schema_value,
+             act_type_id=data["act_type_id"],
+             claim_type=data["claim_type"],
+             display_name=data["display_name"],
+             regulatory_citation=citation,
+             required_claim_fields=tuple(data["required_claim_fields"]),
+             optional_claim_fields=tuple(data.get("optional_claim_fields", [])),
+             required_evidence_labels=tuple(data["required_evidence_labels"]),
+             eligible_issuer_roles=tuple(data["eligible_issuer_roles"]),
+             recommended_witness_roles=tuple(data["recommended_witness_roles"]),
+             signature_policy=sig_policy,
+             version=int(data["version"]),
+             supersedes=data.get("supersedes"),
+             maintainer=data["maintainer"],
+             test_vector_reference=data["test_vector_reference"],
+             source_path=source_path,
+             entry_hash=entry_hash,
+             regulated_context_profile=regulated_context_profile,
+             prior_receipts_profile=prior_receipts_profile,
+             reliance_context=reliance_context,
+             disclosure_profile=disclosure_profile,
+         )
+     except (KeyError, TypeError, ValueError) as exc:
+         raise CatalogueLoadError(
+             f"Cannot parse catalogue entry at {source_path}: {exc}"
+         ) from exc
+
727
+
+ def _scan_acts_directory(acts_path: Path) -> dict[str, CatalogueEntry]:
+     """Walk the acts directory and load all v2 and v3 entries.
+
+     Skipped:
+     - Files in any ``_deprecated`` subdirectory.
+     - Files matching ``*.test_vectors.json``.
+     - Files whose top-level ``schema`` is not in ``SCHEMA_DISCRIMINATORS``
+       (silently, as a permissive allowance for other JSON files in tree).
+
+     Raises:
+         CatalogueLoadError: If two entries share an ``act_type_id``.
+     """
+     entries: dict[str, CatalogueEntry] = {}
+
+     for json_path in sorted(acts_path.rglob("*.json")):
+         if "_deprecated" in json_path.parts:
+             continue
+         if json_path.name.endswith(".test_vectors.json"):
+             continue
+
+         try:
+             raw_bytes = json_path.read_bytes()
+         except OSError as exc:
+             logger.warning("Could not read %s: %s", json_path, exc)
+             continue
+
+         try:
+             data = json.loads(raw_bytes)
+         except json.JSONDecodeError as exc:
+             logger.warning("Skipping malformed JSON at %s: %s", json_path, exc)
+             continue
+
+         if not isinstance(data, dict) or data.get("schema") not in SCHEMA_DISCRIMINATORS:
+             # Could be a schema file, a README in JSON, etc. Not an error.
+             continue
+
+         entry_hash = "sha256:" + hashlib.sha256(raw_bytes).hexdigest()
+         entry = _parse_entry(data, str(json_path), entry_hash)
+
+         if entry.act_type_id in entries:
+             existing = entries[entry.act_type_id]
+             raise CatalogueLoadError(
+                 f"Duplicate act_type_id {entry.act_type_id!r}: "
+                 f"{json_path} conflicts with {existing.source_path}"
+             )
+
+         entries[entry.act_type_id] = entry
+
+     return entries
+
+
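The skip rules in `_scan_acts_directory` can be tried in isolation. The block below is a minimal sketch, not the module's code: `SCHEMAS` is a hypothetical stand-in for `SCHEMA_DISCRIMINATORS` (assumed here to hold the two discriminator strings named in the module docstring), and `candidate_entry_files` and `demo` are illustrative names that do not exist in actproof.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the module's SCHEMA_DISCRIMINATORS constant.
SCHEMAS = {
    "actproof.act_catalogue_entry.v2",
    "actproof.act_catalogue_entry.v3",
}


def candidate_entry_files(acts_path: Path) -> list[Path]:
    """Return the JSON files that survive the three skip rules."""
    kept = []
    for json_path in sorted(acts_path.rglob("*.json")):
        # Rule 1: anything under a _deprecated directory is ignored.
        if "_deprecated" in json_path.parts:
            continue
        # Rule 2: test vector files are not entries.
        if json_path.name.endswith(".test_vectors.json"):
            continue
        try:
            data = json.loads(json_path.read_bytes())
        except (OSError, json.JSONDecodeError):
            continue
        # Rule 3: only files carrying a known schema discriminator count.
        if isinstance(data, dict) and data.get("schema") in SCHEMAS:
            kept.append(json_path)
    return kept


def demo() -> list[str]:
    """Build a throwaway tree and report which file names are kept."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "_deprecated").mkdir()
        entry = json.dumps({"schema": "actproof.act_catalogue_entry.v2"})
        (root / "release.json").write_text(entry)
        (root / "release.test_vectors.json").write_text(entry)
        (root / "_deprecated" / "old.json").write_text(entry)
        (root / "readme.json").write_text(json.dumps({"title": "notes"}))
        return [p.name for p in candidate_entry_files(root)]
```

Of the four files written by `demo()`, only `release.json` passes all three rules.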
+ # ─────────────────────────────────────────────────────────────────
+ # PUBLIC LOADER
+ # ─────────────────────────────────────────────────────────────────
+
+ def load_catalogue(
+     acts_path: Optional[Path] = None,
+     *,
+     schema_path: Optional[Path] = None,
+     source_uri: Optional[str] = None,
+     git_commit: Optional[str] = None,
+ ) -> Catalogue:
+     """Load the catalogue from disk.
+
+     Args:
+         acts_path: Optional path to the ``catalogue/acts/`` directory. If
+             ``None``, resolves from ``$ACTPROOF_CATALOGUE_PATH`` or fallback
+             locations.
+         schema_path: Optional path to the schema JSON file. If ``None``,
+             looks for ``../../spec/schemas/act_catalogue_entry.v3.json``
+             relative to the acts path first, then falls back to
+             ``act_catalogue_entry.v2.json``.
+         source_uri: Optional external URI of the catalogue repository.
+             Stored on the resulting ``Catalogue`` for receipt provenance.
+         git_commit: Optional 40-character git SHA-1 of the catalogue at
+             load time. Stored on the resulting ``Catalogue`` for receipt
+             provenance.
+
+     Returns:
+         A ``Catalogue`` with all v2 and v3 entries indexed by ``act_type_id``.
+
+     Raises:
+         CatalogueLoadError: If the path cannot be resolved or contains
+             structural problems (duplicate act_type_ids, malformed entries).
+     """
+     resolved_acts = _resolve_acts_path(acts_path)
+     entries = _scan_acts_directory(resolved_acts)
+
+     # Compute schema_hash if we can find the schema file.
+     schema_hash = ""
+     actual_schema_path = schema_path or _resolve_schema_path(resolved_acts)
+     if actual_schema_path is not None and actual_schema_path.is_file():
+         schema_hash = hash_schema_file(actual_schema_path)
+     elif schema_path is not None:
+         # Caller passed a path explicitly; if it doesn't exist, that's an error.
+         raise CatalogueLoadError(
+             f"schema_path {schema_path!r} is not a file"
+         )
+     # If we couldn't find the schema, schema_hash stays empty. Callers
+     # populating CatalogueBinding will need to provide it some other way.
+
+     logger.info(
+         "Loaded %d catalogue entries from %s: %s",
+         len(entries), resolved_acts, sorted(entries.keys()),
+     )
+
+     return Catalogue(
+         entries=entries,
+         source_root=str(resolved_acts),
+         source_uri=source_uri,
+         git_commit=git_commit,
+         schema_hash=schema_hash,
+     )
+
+
+ # ─────────────────────────────────────────────────────────────────
+ # HASH HELPERS
+ # ─────────────────────────────────────────────────────────────────
+
+ def hash_entry_file(path: Path) -> str:
+     """Compute ``"sha256:..."`` of a catalogue entry file's raw bytes.
+
+     Hashes the raw file bytes, not canonical bytes. This matches the
+     git-pinned model: anyone with the catalogue at the pinned commit can
+     recompute the hash with ``sha256sum`` and get the same value. Sensitive
+     to file formatting (line endings, trailing newline), but stable when
+     the source of truth is a git tree.
+
+     Args:
+         path: Path to the entry JSON file.
+
+     Returns:
+         ``"sha256:"`` followed by 64 lowercase hex characters.
+
+     Raises:
+         OSError: If the file cannot be read.
+     """
+     return "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()
+
+
+ def hash_schema_file(path: Path) -> str:
+     """Compute ``"sha256:..."`` of the catalogue schema file's raw bytes.
+
+     Args:
+         path: Path to the schema JSON file.
+
+     Returns:
+         ``"sha256:"`` followed by 64 lowercase hex characters.
+
+     Raises:
+         OSError: If the file cannot be read.
+     """
+     return "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()
+
+
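The `hash_entry_file` docstring notes that raw-byte hashing is sensitive to file formatting. A minimal sketch of that property follows, using only `hashlib` and `json`; the helper name `sha256_tag` and the sample payloads are illustrative assumptions, not part of actproof.

```python
import hashlib
import json


def sha256_tag(raw: bytes) -> str:
    """Same "sha256:<hex>" shape as the hash helpers, applied to in-memory bytes."""
    return "sha256:" + hashlib.sha256(raw).hexdigest()


# Two serialisations of the same JSON value (hypothetical entry content).
compact = b'{"act_type_id":"example.act","version":1}'
pretty = b'{\n  "act_type_id": "example.act",\n  "version": 1\n}\n'

# The parsed values are equal, but the raw bytes differ, so the hashes differ.
same_json = json.loads(compact) == json.loads(pretty)
same_hash = sha256_tag(compact) == sha256_tag(pretty)
```

Here `same_json` is true while `same_hash` is false, which is why a manifest's `entry_hash` only matches when the verifier reads the exact bytes from the pinned commit.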
+ # ─────────────────────────────────────────────────────────────────
+ # MANIFEST VALIDATION
+ # ─────────────────────────────────────────────────────────────────
+
+ def validate_manifest(
+     manifest: Manifest, catalogue: Catalogue
+ ) -> list[ValidationIssue]:
+     """Validate a manifest's claim and evidence against its catalogue entry.
+
+     Performs seven checks (see module docstring for the full list).
+     Returns a list of issues; an empty list means the manifest is valid
+     against the loaded catalogue.
+
+     Catalogue conformance is one of several validation layers a complete
+     verifier runs. The others are: ``validate_manifest_shape`` (structural
+     well-formedness), and the receipt verifier (chain anchor, RFC 3161
+     token, evidence file hashes). This module covers the catalogue layer
+     only.
+
+     Args:
+         manifest: The manifest to validate.
+         catalogue: The loaded catalogue to validate against.
+
+     Returns:
+         A list of ``ValidationIssue``. Empty list means valid.
+     """
+     issues: list[ValidationIssue] = []
+
+     # 1. Act type known?
+     entry = catalogue.get(manifest.catalogue.act_type_id)
+     if entry is None:
+         issues.append(
+             ValidationIssue(
+                 code="UNKNOWN_ACT_TYPE",
+                 message=(
+                     f"act_type_id {manifest.catalogue.act_type_id!r} is not "
+                     f"in the loaded catalogue. Known act_type_ids: "
+                     f"{sorted(catalogue.entries.keys())}"
+                 ),
+                 field="catalogue.act_type_id",
+             )
+         )
+         # Can't continue any other checks without an entry.
+         return issues
+
+     # 2. Entry version matches?
+     if entry.version != manifest.catalogue.entry_version:
+         issues.append(
+             ValidationIssue(
+                 code="ENTRY_VERSION_MISMATCH",
+                 message=(
+                     f"manifest cites entry_version {manifest.catalogue.entry_version} "
+                     f"but loaded catalogue has version {entry.version} for "
+                     f"{manifest.catalogue.act_type_id}"
+                 ),
+                 field="catalogue.entry_version",
+             )
+         )
+
+     # 3. Entry hash matches?
+     if entry.entry_hash and entry.entry_hash != manifest.catalogue.entry_hash:
+         issues.append(
+             ValidationIssue(
+                 code="ENTRY_HASH_MISMATCH",
+                 message=(
+                     f"manifest cites entry_hash {manifest.catalogue.entry_hash!r} "
+                     f"but loaded catalogue entry hashes to {entry.entry_hash!r}. "
+                     f"The manifest may have been issued against a different "
+                     f"version of the catalogue (pinned at a different commit)."
+                 ),
+                 field="catalogue.entry_hash",
+             )
+         )
+
+     # 4. Schema hash matches?
+     if catalogue.schema_hash and catalogue.schema_hash != manifest.catalogue.schema_hash:
+         issues.append(
+             ValidationIssue(
+                 code="SCHEMA_HASH_MISMATCH",
+                 message=(
+                     f"manifest cites schema_hash {manifest.catalogue.schema_hash!r} "
+                     f"but loaded catalogue schema hashes to {catalogue.schema_hash!r}."
+                 ),
+                 field="catalogue.schema_hash",
+             )
+         )
+
+     # 5. Required claim fields all present and non-empty?
+     for required_field in entry.required_claim_fields:
+         value = manifest.claim.get(required_field)
+         if value is None:
+             issues.append(
+                 ValidationIssue(
+                     code="MISSING_REQUIRED_CLAIM_FIELD",
+                     message=(
+                         f"required claim field {required_field!r} is "
+                         f"missing from manifest.claim"
+                     ),
+                     field=f"claim.{required_field}",
+                 )
+             )
+         elif isinstance(value, str) and not value.strip():
+             issues.append(
+                 ValidationIssue(
+                     code="MISSING_REQUIRED_CLAIM_FIELD",
+                     message=(
+                         f"required claim field {required_field!r} is "
+                         f"empty in manifest.claim"
+                     ),
+                     field=f"claim.{required_field}",
+                 )
+             )
+
+     # 6. Required evidence labels all covered by at least one Evidence?
+     attached_labels = {e.label for e in manifest.evidence}
+     for required_label in entry.required_evidence_labels:
+         if required_label not in attached_labels:
+             issues.append(
+                 ValidationIssue(
+                     code="MISSING_REQUIRED_EVIDENCE_LABEL",
+                     message=(
+                         f"required evidence label {required_label!r} has no "
+                         f"covering file in manifest.evidence. Attached labels: "
+                         f"{sorted(attached_labels)}"
+                     ),
+                     field="evidence",
+                 )
+             )
+
+     # 7. Each evidence label is recognised by the entry?
+     known_evidence_labels = set(entry.required_evidence_labels)
+     # The signature_policy.supports list contains evidence labels for
+     # externally produced signature artifacts; those are also recognised.
+     known_evidence_labels.update(entry.signature_policy.supports)
+     for i, ev in enumerate(manifest.evidence):
+         if ev.label not in known_evidence_labels:
+             issues.append(
+                 ValidationIssue(
+                     code="UNKNOWN_EVIDENCE_LABEL",
+                     message=(
+                         f"evidence[{i}].label {ev.label!r} is not declared "
+                         f"by entry {entry.act_type_id}. Known labels: "
+                         f"{sorted(known_evidence_labels)}"
+                     ),
+                     field=f"evidence[{i}].label",
+                 )
+             )
+
+     return issues
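Checks 6 and 7 of `validate_manifest` reduce to two set comparisons over evidence labels. The sketch below restates that logic on plain lists so it can be run without the package; `check_evidence_labels` and all label values are hypothetical and are not part of the actproof API.

```python
def check_evidence_labels(
    attached: list[str],
    required: list[str],
    signature_supports: list[str],
) -> tuple[list[str], list[tuple[int, str]]]:
    """Return (missing required labels, unknown attached labels with index)."""
    attached_set = set(attached)
    # Check 6: every required label needs at least one attached evidence item.
    missing = [label for label in required if label not in attached_set]
    # Check 7: an attached label is recognised if it is required or listed
    # among the signature policy's supported artifact labels.
    known = set(required) | set(signature_supports)
    unknown = [
        (i, label) for i, label in enumerate(attached) if label not in known
    ]
    return missing, unknown


# Illustrative label names only.
missing, unknown = check_evidence_labels(
    attached=["board_minutes", "cover_letter"],
    required=["board_minutes", "risk_assessment"],
    signature_supports=["detached_signature"],
)
```

With these inputs `missing` holds `"risk_assessment"` and `unknown` holds `(1, "cover_letter")`, which correspond to one `MISSING_REQUIRED_EVIDENCE_LABEL` issue and one `UNKNOWN_EVIDENCE_LABEL` issue in the real function.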