PyPI - sum-engine - Versions diffs - 0.4.1__tar.gz → 0.6.0__tar.gz - Mend

sum-engine 0.4.1__tar.gz → 0.6.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (119)

{sum_engine-0.4.1 → sum_engine-0.6.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: sum-engine
-Version: 0.4.1
+Version: 0.6.0
 Summary: SUM — bidirectional knowledge distillation with optional cryptographic attestation. Pipe prose, get a CanonicalBundle (HMAC / Ed25519 / W3C VC 2.0), verify anywhere.
 Author: ototao
 License: Apache-2.0
@@ -25,9 +25,11 @@ Requires-Dist: cryptography>=41.0.0
 Requires-Dist: sympy>=1.12
 Provides-Extra: sieve
 Requires-Dist: spacy>=3.7.0; extra == "sieve"
+Provides-Extra: openai
+Requires-Dist: openai<3.0.0,>=1.40.0; extra == "openai"
+Requires-Dist: pydantic>=2.0.0; extra == "openai"
 Provides-Extra: llm
-Requires-Dist: openai<3.0.0,>=1.40.0; extra == "llm"
-Requires-Dist: pydantic>=2.0.0; extra == "llm"
+Requires-Dist: sum-engine[openai]; extra == "llm"
 Provides-Extra: anthropic
 Requires-Dist: anthropic>=0.97.0; extra == "anthropic"
 Requires-Dist: pydantic>=2.0.0; extra == "anthropic"
@@ -35,6 +37,9 @@ Provides-Extra: receipt-verify
 Requires-Dist: joserfc>=1.0.0; extra == "receipt-verify"
 Provides-Extra: mcp
 Requires-Dist: mcp>=1.0.0; extra == "mcp"
+Provides-Extra: research
+Requires-Dist: numpy>=1.24.0; extra == "research"
+Requires-Dist: scipy>=1.10.0; extra == "research"
 Provides-Extra: omni-format
 Requires-Dist: markitdown==0.1.5; extra == "omni-format"
 Provides-Extra: dev
@@ -48,7 +53,7 @@ Requires-Dist: build>=1.0.0; extra == "dev"
 Requires-Dist: hypothesis>=6.0.0; extra == "dev"
 Provides-Extra: all
 Requires-Dist: sum-engine[sieve]; extra == "all"
-Requires-Dist: sum-engine[llm]; extra == "all"
+Requires-Dist: sum-engine[openai]; extra == "all"
 Requires-Dist: sum-engine[anthropic]; extra == "all"
 Requires-Dist: sum-engine[receipt-verify]; extra == "all"
 Requires-Dist: sum-engine[mcp]; extra == "all"
@@ -106,7 +111,7 @@ A minimal Node verifier using `jose` + `canonicalize` is in [`docs/RENDER_RECEIP
 | Surface | Status | Verifies |
 |---|---|---|
-| `pip install 'sum-engine[sieve]'` — `sum attest` / `sum verify` / `sum render` / `sum resolve` / `sum ledger` / `sum inspect` / `sum schema` | shipped on `main` (PyPI 0.3.0 stale; 0.4.0 cut on the operator queue) | structural reconstruction; HMAC-SHA256 + Ed25519 signatures (W3C VC 2.0 `eddsa-jcs-2022`); bidirectional `sum attest` ↔ `sum render` symmetry from the shell |
+| `pip install 'sum-engine[sieve]'` — `sum attest` / `sum verify` / `sum render` / `sum resolve` / `sum ledger` / `sum inspect` / `sum schema` | shipped on PyPI ≥ 0.4.1 | structural reconstruction; HMAC-SHA256 + Ed25519 signatures (W3C VC 2.0 `eddsa-jcs-2022`); bidirectional `sum attest` ↔ `sum render` symmetry from the shell |
 | Cloudflare Worker at `sum-demo.ototao.workers.dev` | shipped | `/api/render` → tome + `render_receipt`; `/.well-known/jwks.json` → JWKS; `/api/qid` → Wikidata resolver |
 | Single-file browser demo (`single_file_demo/index.html`) | shipped | paste prose → in-browser attest → CanonicalBundle JSON; same bytes verify under `node standalone_verifier/verify.js` (Chrome / Firefox / Safari with WebCrypto Ed25519 support) |
 | Cross-runtime trust triangle | locked by CI (`make xruntime`) | K1 / K1-mw / K2 / K3 / K4 — Python ↔ Node ↔ Browser agree byte-for-byte on valid bundles. `make xruntime-adversarial` adds A1–A6 rejection-class equivalence. |
@@ -162,7 +167,7 @@ The reverse direction also runs under explicit slider control. The local path ac
 sum render --density 0.5 < bundle.json
 # → keeps the lex-prefix half of the axioms; @sliders header records what was requested
-sum render --length 0.9 --use-worker https://sum.ototao.com --json < bundle.json
+sum render --length 0.9 --use-worker https://sum-demo.ototao.workers.dev --json < bundle.json
 # → LLM-conditioned tome + signed render_receipt (sum.render_receipt.v1) on stdout
 ```
@@ -189,7 +194,7 @@ pip install 'sum-engine[mcp,sieve]'
 ### Calling SUM over HTTP
-The hosted Worker at `https://sum.ototao.com` exposes `/api/render`, `/api/complete`, `/api/qid`, and the `/.well-known/{jwks,revoked-kids}.json` verification surfaces. [`docs/API_REFERENCE.md`](docs/API_REFERENCE.md) is the wire spec — request/response shapes, error codes, the six-step receipt-verification flow, working Node + Python examples. Use this when the caller is a web app, mobile app, or server-side service; use the MCP server when the caller is a local LLM client.
+The hosted Worker at `https://sum-demo.ototao.workers.dev` exposes `/api/render`, `/api/complete`, `/api/qid`, and the `/.well-known/{jwks,revoked-kids}.json` verification surfaces. [`docs/API_REFERENCE.md`](docs/API_REFERENCE.md) is the wire spec — request/response shapes, error codes, the six-step receipt-verification flow, working Node + Python examples. Use this when the caller is a web app, mobile app, or server-side service; use the MCP server when the caller is a local LLM client.
 ---

{sum_engine-0.4.1 → sum_engine-0.6.0}/README.md RENAMED

@@ -48,7 +48,7 @@ A minimal Node verifier using `jose` + `canonicalize` is in [`docs/RENDER_RECEIP
 | Surface | Status | Verifies |
 |---|---|---|
-| `pip install 'sum-engine[sieve]'` — `sum attest` / `sum verify` / `sum render` / `sum resolve` / `sum ledger` / `sum inspect` / `sum schema` | shipped on `main` (PyPI 0.3.0 stale; 0.4.0 cut on the operator queue) | structural reconstruction; HMAC-SHA256 + Ed25519 signatures (W3C VC 2.0 `eddsa-jcs-2022`); bidirectional `sum attest` ↔ `sum render` symmetry from the shell |
+| `pip install 'sum-engine[sieve]'` — `sum attest` / `sum verify` / `sum render` / `sum resolve` / `sum ledger` / `sum inspect` / `sum schema` | shipped on PyPI ≥ 0.4.1 | structural reconstruction; HMAC-SHA256 + Ed25519 signatures (W3C VC 2.0 `eddsa-jcs-2022`); bidirectional `sum attest` ↔ `sum render` symmetry from the shell |
 | Cloudflare Worker at `sum-demo.ototao.workers.dev` | shipped | `/api/render` → tome + `render_receipt`; `/.well-known/jwks.json` → JWKS; `/api/qid` → Wikidata resolver |
 | Single-file browser demo (`single_file_demo/index.html`) | shipped | paste prose → in-browser attest → CanonicalBundle JSON; same bytes verify under `node standalone_verifier/verify.js` (Chrome / Firefox / Safari with WebCrypto Ed25519 support) |
 | Cross-runtime trust triangle | locked by CI (`make xruntime`) | K1 / K1-mw / K2 / K3 / K4 — Python ↔ Node ↔ Browser agree byte-for-byte on valid bundles. `make xruntime-adversarial` adds A1–A6 rejection-class equivalence. |
@@ -104,7 +104,7 @@ The reverse direction also runs under explicit slider control. The local path ac
 sum render --density 0.5 < bundle.json
 # → keeps the lex-prefix half of the axioms; @sliders header records what was requested
-sum render --length 0.9 --use-worker https://sum.ototao.com --json < bundle.json
+sum render --length 0.9 --use-worker https://sum-demo.ototao.workers.dev --json < bundle.json
 # → LLM-conditioned tome + signed render_receipt (sum.render_receipt.v1) on stdout
 ```
@@ -131,7 +131,7 @@ pip install 'sum-engine[mcp,sieve]'
 ### Calling SUM over HTTP
-The hosted Worker at `https://sum.ototao.com` exposes `/api/render`, `/api/complete`, `/api/qid`, and the `/.well-known/{jwks,revoked-kids}.json` verification surfaces. [`docs/API_REFERENCE.md`](docs/API_REFERENCE.md) is the wire spec — request/response shapes, error codes, the six-step receipt-verification flow, working Node + Python examples. Use this when the caller is a web app, mobile app, or server-side service; use the MCP server when the caller is a local LLM client.
+The hosted Worker at `https://sum-demo.ototao.workers.dev` exposes `/api/render`, `/api/complete`, `/api/qid`, and the `/.well-known/{jwks,revoked-kids}.json` verification surfaces. [`docs/API_REFERENCE.md`](docs/API_REFERENCE.md) is the wire spec — request/response shapes, error codes, the six-step receipt-verification flow, working Node + Python examples. Use this when the caller is a web app, mobile app, or server-side service; use the MCP server when the caller is a local LLM client.
 ---
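
The endpoint paths named in the README can be exercised with nothing beyond the standard library. The following is a minimal sketch, not the project's client: it only builds a GET request for the JWKS path and sets an explicit User-Agent in the style the 0.6.0 CLI uses for its render calls. `build_jwks_request` and its `client_version` parameter are hypothetical names introduced here, and whether the JWKS route needs the header at all is an assumption.

```python
import urllib.request

# Host taken from the README diff; adjust for a self-hosted Worker.
WORKER_URL = "https://sum-demo.ototao.workers.dev"


def build_jwks_request(
    base_url: str = WORKER_URL,
    client_version: str = "0.6.0",  # hypothetical parameter, for illustration
) -> urllib.request.Request:
    """Build (but do not send) a GET request for the Worker's JWKS document."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/.well-known/jwks.json",
        method="GET",
        headers={
            "accept": "application/json",
            # Same shape as the User-Agent added to the CLI in 0.6.0.
            "user-agent": f"sum-cli/{client_version} (+https://github.com/OtotaO/SUM)",
        },
    )


req = build_jwks_request()
print(req.get_method(), req.full_url)
# To actually fetch the keys (network access required):
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       jwks = resp.read()
```

Building the request separately from sending it keeps the header logic checkable without network access.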

{sum_engine-0.4.1 → sum_engine-0.6.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "sum-engine"
-version = "0.4.1"
+version = "0.6.0"
 description = "SUM — bidirectional knowledge distillation with optional cryptographic attestation. Pipe prose, get a CanonicalBundle (HMAC / Ed25519 / W3C VC 2.0), verify anywhere."
 readme = "README.md"
 license = { text = "Apache-2.0" }
@@ -39,10 +39,19 @@ dependencies = [
 [project.optional-dependencies]
 # Dependency-injection extras. Users pick the extractor they want:
 #   pip install sum-engine[sieve]   # deterministic, offline spaCy path
-#   pip install sum-engine[llm]     # OpenAI structured-output path
-#   pip install sum-engine[all]     # both, plus dev tooling
+#   pip install sum-engine[openai]  # OpenAI structured-output path
+#   pip install sum-engine[llm]     # alias for [openai] (legacy name)
+#   pip install sum-engine[all]     # everything, plus dev tooling
 sieve = ["spacy>=3.7.0"]
-llm = ["openai>=1.40.0,<3.0.0", "pydantic>=2.0.0"]
+# `[openai]` is the canonical, vendor-named extra; `[llm]` is kept as a
+# back-compat alias because it predates the multi-provider dispatcher
+# (Anthropic and OpenAI now have their own named extras). Both install
+# identical dependencies — the openai SDK plus pydantic for structured
+# outputs. The vendor adapter is at
+# sum_engine_internal/ensemble/llm_dispatch.py::OpenAIAdapter; the
+# render-path TS companion is worker/src/routes/render.ts::callOpenAI.
+openai = ["openai>=1.40.0,<3.0.0", "pydantic>=2.0.0"]
+llm = ["sum-engine[openai]"]
 # Anthropic Messages-API adapter (Claude family). Installed alongside
 # [llm] for the §2.5 frontier-LLM benchmark runs that compare
 # generator-side interventions across providers. The runner picks
@@ -60,6 +69,16 @@ receipt-verify = ["joserfc>=1.0.0"]
 # `mcp` package is the official Python SDK; FastMCP is its
 # decorator-style high-level API. See docs/MCP_INTEGRATION.md.
 mcp = ["mcp>=1.0.0"]
+# Research-grade modules under sum_engine_internal/research/. NOT on
+# the production install path; APIs may change between minor releases
+# without backwards-compatibility guarantees. Currently provides the
+# v1 sheaf-Laplacian hallucination detector grounded in Gebhart,
+# Hansen & Schrater (2023, AISTATS, arXiv:2110.03789) and the
+# sheaf-Laplacian theory of Hansen & Ghrist (2019). See
+# docs/SHEAF_HALLUCINATION_DETECTOR.md for the spec, including
+# verified blindspots (predicate-flip, off-graph fact-fabrication,
+# empty-render false negative).
+research = ["numpy>=1.24.0", "scipy>=1.10.0"]
 # Omni-format adapter. Markdown is the canonical pivot for the
 # attest pipeline: any input format -> markdown -> existing
 # extract/state/bundle path. Source URI anchors to the original
@@ -97,7 +116,7 @@ dev = [
 ]
 all = [
     "sum-engine[sieve]",
-    "sum-engine[llm]",
+    "sum-engine[openai]",
     "sum-engine[anthropic]",
     "sum-engine[receipt-verify]",
     "sum-engine[mcp]",

sum_engine-0.6.0/sum_cli/audit_log.py ADDED

@@ -0,0 +1,171 @@
+"""sum_cli.audit_log — universal audit-log streaming for compliance.
+When the ``SUM_AUDIT_LOG`` environment variable is set to a file path,
+every ``sum attest`` / ``sum verify`` / ``sum render`` operation
+appends a single JSONL row to that file describing what was done.
+This is the regime-agnostic foundation of the compliance primitives
+direction (Path 3). A specific regime (GDPR-Art-30, HIPAA-164.514,
+EU AI Act Annex IV, etc.) can be implemented as a downstream
+consumer of this audit log: tail the file, validate per-regime
+required fields, raise on policy violation. The audit log itself
+makes no regime-specific assumptions — it records *what happened*
+verbatim so any auditor can reconstruct the operation.
+Schema: ``sum.audit_log.v1`` — additive; new optional fields may
+appear in future minor versions; consumers should ignore unknown
+keys.
+Required fields per row:
+  - ``schema`` = ``"sum.audit_log.v1"``
+  - ``timestamp`` (ISO 8601 UTC, e.g. ``"2026-05-01T18:35:14.123Z"``)
+  - ``operation`` ∈ ``{"attest", "verify", "render"}``
+  - ``cli_version`` (the ``sum-cli`` version that produced the row)
+Operation-specific optional fields:
+  - attest:  ``source_uri``, ``state_integer_digits``, ``axiom_count``,
+             ``extractor``, ``signed`` (bool — Ed25519 attached?),
+             ``hmac`` (bool), ``branch``
+  - verify:  ``state_integer_digits``, ``axiom_count``, ``signatures``
+             ``{ed25519, hmac}``, ``branch``, ``ok`` (bool)
+  - render:  ``axiom_count_input``, ``mode`` (``"local-deterministic"``
+             or ``"worker"``), ``sliders``, ``render_receipt_kid`` (if
+             worker), ``worker_url`` (if worker)
+Failures (exit code != 0) emit a row with ``error`` field describing
+the failure class. Audit-log writes themselves never fail loudly —
+if ``SUM_AUDIT_LOG`` points at a non-writable path, the operation
+proceeds and the failure is reported on stderr only when ``--verbose``
+is set; the audit semantics fail-open by design (a non-functional
+audit destination should not break the trust loop).
+Env var precedence:
+  1. ``SUM_AUDIT_LOG`` env var (path to JSONL file; appended to)
+  2. unset → no audit logging
+The path may be ``-`` for stdout (rare; mostly useful for piping
+into another tool); otherwise treated as a file path with append-mode
+open.
+Concurrency: writes are O_APPEND on POSIX, so multiple sum
+processes writing to the same audit log produce a serialised
+ordering (atomic per write() up to PIPE_BUF on most systems —
+single-line JSONL records well under that bound). We write each
+row in one ``f.write()`` call to maximise atomicity.
+"""
+from __future__ import annotations
+import json
+import os
+import sys
+from datetime import datetime, timezone
+from typing import Any
+_AUDIT_LOG_SCHEMA = "sum.audit_log.v1"
+def _audit_log_path() -> str | None:
+    """Return the configured audit-log destination, or None if unset."""
+    p = os.environ.get("SUM_AUDIT_LOG")
+    if p is None or p == "":
+        return None
+    return p
+def _identity_fields() -> dict[str, Any]:
+    """Optional identity fields populated from env vars.
+    Sprint 4 of the intensification path to arXiv (PR #140). Closes
+    the PCI DSS Req 10.2.2 user-identification gap named in
+    ``docs/COMPLIANCE_PCI_DSS_4_REQ_10.md``. Three env vars are
+    consulted; each populated value becomes an optional field on
+    the audit-log row:
+      - ``SUM_AUDIT_USER_ID`` → ``user_id`` field. Per PCI Req 10.2.2,
+        "user identification" is the FIRST required field for each
+        audit-log event. Operators running SUM behind an
+        authenticating proxy populate this from the proxy's session
+        identity at process start.
+      - ``SUM_AUDIT_HOST_ID`` → ``host_id`` field. Multi-host
+        deployments (clusters, k8s pods, container fleets) use this
+        to attribute events to specific compute units.
+      - ``SUM_AUDIT_IP_ADDRESS`` → ``ip_address`` field. Network-
+        layer origination, useful for incident-response / forensic
+        analysis under PCI Req 10.2.2's "origination of event"
+        requirement.
+    All three are *optional*. An unset env var produces an absent
+    field (not a null value). The audit-log schema stays at
+    ``sum.audit_log.v1``; these are additive optional fields under
+    the existing schema's "consumers should ignore unknown keys"
+    convention. Backward compat: rows without these fields still
+    pass every existing validator.
+    PCI DSS validator R7 (``pci-dss-4-req-10.user-identification-
+    recommended``, added in the same PR as this function) treats
+    a missing ``user_id`` as a Req 10.2.2 violation in compliance-
+    mode runs.
+    """
+    out: dict[str, Any] = {}
+    for env_var, field in (
+        ("SUM_AUDIT_USER_ID", "user_id"),
+        ("SUM_AUDIT_HOST_ID", "host_id"),
+        ("SUM_AUDIT_IP_ADDRESS", "ip_address"),
+    ):
+        value = os.environ.get(env_var)
+        if value:  # non-empty (skips both None and empty string)
+            out[field] = value
+    return out
+def emit_audit_event(operation: str, payload: dict[str, Any]) -> None:
+    """Append a single JSONL row to the audit log if configured.
+    Fail-open: if the audit destination is unwritable, the operation
+    proceeds normally; the failure is silent unless verbose stderr
+    is enabled by the caller. Audit logging is *advisory* — the
+    canonical bundle / receipt still carries the load-bearing trust
+    properties; the audit log is for downstream compliance tools
+    to ingest.
+    Identity fields (``user_id`` / ``host_id`` / ``ip_address``) are
+    populated from env vars by :func:`_identity_fields`. They appear
+    on the row only when the corresponding env var is set; absent
+    env vars produce absent fields (not null values). The
+    ``payload`` argument takes precedence over identity fields if
+    a caller passes overlapping keys — useful for tests that want
+    to pin specific identity values without touching the
+    environment.
+    """
+    path = _audit_log_path()
+    if path is None:
+        return
+    from sum_cli import __version__
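+    # Take one clock snapshot so the seconds and milliseconds parts of
+    # the timestamp cannot straddle a second boundary.
+    now = datetime.now(timezone.utc)
+    row = {
+        "schema": _AUDIT_LOG_SCHEMA,
+        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%S.")
+                     + f"{now.microsecond // 1000:03d}Z",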
+        "operation": operation,
+        "cli_version": __version__,
+        # Identity fields from env, then payload last so the caller
+        # wins on overlap (intentional for test seams).
+        **_identity_fields(),
+        **payload,
+    }
+    line = json.dumps(row, separators=(",", ":")) + "\n"
+    try:
+        if path == "-":
+            sys.stdout.write(line)
+            sys.stdout.flush()
+        else:
+            with open(path, "a", encoding="utf-8") as f:
+                f.write(line)
+    except OSError:
+        # Fail-open: audit destination unavailable; do not block
+        # the operation. The canonical bundle / receipt remains the
+        # load-bearing trust artifact.
+        pass
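
The module docstring describes the audit log as a stream that downstream tools tail and validate. A minimal sketch of such a consumer follows, assuming only the four required fields and the timestamp shape the docstring lists. `format_timestamp`, `check_row`, and `scan_audit_log` are hypothetical helpers written for this illustration and are not part of `sum_cli`.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone

# Required fields and operations as listed in the audit_log.py docstring.
REQUIRED_FIELDS = ("schema", "timestamp", "operation", "cli_version")
KNOWN_OPERATIONS = {"attest", "verify", "render"}


def format_timestamp(now: datetime) -> str:
    """Render one aware datetime as ISO 8601 UTC with millisecond precision."""
    now = now.astimezone(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"


def check_row(row: dict) -> list[str]:
    """Return the problems found in one audit-log row (empty list means OK)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if name not in row]
    if "schema" in row and row["schema"] != "sum.audit_log.v1":
        problems.append(f"unexpected schema {row['schema']!r}")
    if "operation" in row and row["operation"] not in KNOWN_OPERATIONS:
        problems.append(f"unknown operation {row['operation']!r}")
    if "timestamp" in row:
        try:
            datetime.strptime(row["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
        except (TypeError, ValueError):
            problems.append("timestamp is not ISO 8601 UTC with a trailing Z")
    return problems


def scan_audit_log(text: str) -> dict[int, list[str]]:
    """Map line index to problems for every non-empty line that has any."""
    findings: dict[int, list[str]] = {}
    for index, line in enumerate(text.splitlines()):
        if not line.strip():
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            findings[index] = [f"invalid JSON: {exc.msg}"]
            continue
        if not isinstance(row, dict):
            findings[index] = ["row is not a JSON object"]
            continue
        problems = check_row(row)
        if problems:
            findings[index] = problems
    return findings


good_row = {
    "schema": "sum.audit_log.v1",
    "timestamp": format_timestamp(
        datetime(2026, 5, 1, 18, 35, 14, 123000, tzinfo=timezone.utc)
    ),
    "operation": "attest",
    "cli_version": "0.6.0",
    "axiom_count": 3,  # unknown or optional keys are ignored by design
}
log_text = json.dumps(good_row) + "\n" + '{"operation": "attest"}\n'
print(scan_audit_log(log_text))
```

Unknown keys are deliberately not flagged, which matches the schema's stated convention that consumers ignore fields they do not recognise.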

{sum_engine-0.4.1 → sum_engine-0.6.0}/sum_cli/main.py RENAMED

@@ -420,6 +420,18 @@ def cmd_attest(args: argparse.Namespace) -> int:
             f"{len(bundle['state_integer'])} digits",
             file=sys.stderr,
         )
+    from sum_cli.audit_log import emit_audit_event
+    emit_audit_event("attest", {
+        "source_uri": source_uri,
+        "axiom_count": len(triples),
+        "state_integer_digits": len(bundle["state_integer"]),
+        "extractor": extractor,
+        "branch": args.branch,
+        "signed": "public_signature" in bundle,
+        "hmac": "signature" in bundle,
+        "input_format": format_sidecar.get("input_format"),
+    })
     return 0
@@ -844,6 +856,16 @@ def cmd_verify(args: argparse.Namespace) -> int:
         f"sum: ✓ verified {axioms} axiom(s), state integer matches ({marks})",
         file=sys.stderr,
     )
+    from sum_cli.audit_log import emit_audit_event
+    emit_audit_event("verify", {
+        "ok": True,
+        "axiom_count": axioms,
+        "state_integer_digits": len(claimed_state_str),
+        "branch": bundle.get("branch", "main"),
+        "signatures": {"hmac": hmac_status, "ed25519": ed25519_status},
+        "extraction": extraction,
+    })
     return 0
@@ -926,7 +948,15 @@ def _post_render_to_worker(
         endpoint,
         data=payload,
         method="POST",
-        headers={"content-type": "application/json"},
+        headers={
+            "content-type": "application/json",
+            # Cloudflare in front of hosted Workers (incl. the
+            # default sum-demo.ototao.workers.dev) rejects the
+            # default Python urllib User-Agent with HTTP 403 /
+            # error 1010. Identify ourselves as a known scripted
+            # client so the request passes the bot-detection layer.
+            "user-agent": f"sum-cli/{__version__} (+https://github.com/OtotaO/SUM)",
+        },
     )
     try:
         with urllib.request.urlopen(req, timeout=60) as resp:
@@ -1130,6 +1160,20 @@ def _emit_render_output(envelope: dict, args: argparse.Namespace) -> int:
             kid = envelope["render_receipt"].get("kid", "?")
             msg += f", receipt kid={kid}"
         print(msg, file=sys.stderr)
+    from sum_cli.audit_log import emit_audit_event
+    audit_payload: dict = {
+        "mode": envelope.get("mode"),
+        "axiom_count_input": envelope.get("axiom_count_input"),
+        "tome_chars": len(tome_text),
+        "sliders": envelope.get("sliders"),
+    }
+    if "render_receipt" in envelope:
+        receipt = envelope["render_receipt"]
+        audit_payload["render_receipt_kid"] = receipt.get("kid")
+        audit_payload["render_receipt_schema"] = receipt.get("schema")
+        audit_payload["worker_url"] = envelope.get("worker_url")
+    emit_audit_event("render", audit_payload)
     return 0
@@ -1499,6 +1543,172 @@ def cmd_schema(args: argparse.Namespace) -> int:
     return 0
+# ─── compliance ──────────────────────────────────────────────────────
+#
+# Per-regime validators consuming sum.audit_log.v1. The audit log is
+# regime-agnostic substrate; ``sum compliance check`` is the actionable
+# layer that turns it into a pass/fail verdict for a specific regime.
+# Exit code 0 when ok=true, 1 otherwise — pipe-friendly for CI gates.
+_COMPLIANCE_REGIMES: dict[str, str] = {
+    "eu-ai-act-article-12": (
+        "EU AI Act (Regulation (EU) 2024/1689) Article 12 — record-"
+        "keeping for high-risk AI systems. Pins per-row traceability "
+        "fields (timestamp, operation, cli_version), schema, and "
+        "operation-specific anchors (source_uri / axiom_count / "
+        "state_integer_digits / mode)."
+    ),
+    "gdpr-article-30": (
+        "GDPR (Regulation (EU) 2016/679) Article 30 — Records of "
+        "Processing Activities. Pins the per-row floor enabling Art "
+        "30 reporting (schema, timestamp, ISO-8601-UTC parseability, "
+        "processing-category indicator, processor identity). The "
+        "controller separately maintains record-set-scope metadata "
+        "(Art 30(1)(a)–(g) controller name, purposes, categories, "
+        "recipients, transfers, retention, security measures) "
+        "out-of-band; this validator does not pin those."
+    ),
+    "hipaa-164-312-b": (
+        "HIPAA Security Rule 45 CFR § 164.312(b) — Audit Controls "
+        "(Technical Safeguards). Pins the per-row form requirements "
+        "for an audit recording that supports examination of "
+        "activity (schema, timestamp, ISO-8601-UTC, activity type, "
+        "system component identification, per-operation examination "
+        "anchors). Deployment-scope obligations — auditor function, "
+        "retention, ePHI inventory — live outside this validator."
+    ),
+    "iso-27001-8-15": (
+        "ISO/IEC 27001:2022 Annex A.8.15 — Logging. Pins the per-"
+        "row form floor an audit log must satisfy for the recording "
+        "to count as a 'produced' log under A.8.15 (schema, "
+        "timestamp, ISO-8601-UTC, activity, system component). The "
+        "'stored', 'protected', 'analysed' verbs map to deployment-"
+        "scope obligations (file-system policy, access control, "
+        "SIEM integration) outside this validator."
+    ),
+    "soc-2-cc-7-2": (
+        "SOC 2 Trust Services Criteria CC7.2 — System Operations. "
+        "Pins the per-row form floor required to enable the "
+        "monitoring criterion (schema, timestamp, ISO-8601-UTC, "
+        "activity classification, system component identification). "
+        "The detection / monitoring / analysis activities themselves "
+        "(SIEM rules, alert routing, oncall rotations) live at "
+        "deployment scope outside this validator."
+    ),
+    "pci-dss-4-req-10": (
+        "PCI DSS v4.0 Requirement 10 — Log and Monitor All Access "
+        "to System Components and Cardholder Data. Pins per-row "
+        "content visible in audit_log.v1 against Req 10.2.2 (event "
+        "content) plus 10.6 (consistent time): schema, timestamp, "
+        "ISO-8601-UTC, event type, origination, event-content "
+        "completeness. NOT pinned: 10.1 organisational policies; "
+        "10.2.1.* specific event-type coverage; 10.2.2 user "
+        "identification (audit_log.v1 has no user_id field); 10.3 "
+        "log file protection; 10.4 log review process; 10.5 12-"
+        "month retention; 10.7 failure detection / alerting."
+    ),
+}
+def _compliance_validators():
+    """Return the regime → validate-callable dispatch table.
+    Built lazily to avoid importing compliance modules at CLI
+    startup (each module imports the dataclass infrastructure;
+    deferring keeps `sum --help` cold-start fast). Registered
+    regimes must match :data:`_COMPLIANCE_REGIMES` keys exactly —
+    a mismatch surfaces as a KeyError at dispatch, intentional
+    (better than silent fallthrough)."""
+    from sum_engine_internal.compliance import (  # local import — see docstring
+        eu_ai_act_article_12,
+        gdpr_article_30,
+        hipaa_164_312_b,
+        iso_27001_8_15,
+        pci_dss_4_req_10,
+        soc_2_cc_7_2,
+    )
+    return {
+        "eu-ai-act-article-12": eu_ai_act_article_12.validate,
+        "gdpr-article-30": gdpr_article_30.validate,
+        "hipaa-164-312-b": hipaa_164_312_b.validate,
+        "iso-27001-8-15": iso_27001_8_15.validate,
+        "soc-2-cc-7-2": soc_2_cc_7_2.validate,
+        "pci-dss-4-req-10": pci_dss_4_req_10.validate,
+    }
+def cmd_compliance_check(args: argparse.Namespace) -> int:
+    if args.regime not in _COMPLIANCE_REGIMES:
+        print(
+            f"sum: unknown regime {args.regime!r}. Known: "
+            f"{sorted(_COMPLIANCE_REGIMES)}",
+            file=sys.stderr,
+        )
+        return 2
+    if args.audit_log == "-":
+        text = sys.stdin.read()
+    else:
+        try:
+            with open(args.audit_log, "r", encoding="utf-8") as f:
+                text = f.read()
+        except OSError as e:
+            print(f"sum: cannot read --audit-log {args.audit_log!r}: {e}", file=sys.stderr)
+            return 2
+    rows: list[dict] = []
+    parse_errors: list[tuple[int, str]] = []
+    for i, line in enumerate(text.splitlines()):
+        line = line.strip()
+        if not line:
+            continue
+        try:
+            rows.append(json.loads(line))
+        except json.JSONDecodeError as e:
+            parse_errors.append((i, str(e)))
+    validators = _compliance_validators()
+    try:
+        validate = validators[args.regime]
+    except KeyError:
+        # Defensive — _COMPLIANCE_REGIMES gate above should have caught this.
+        # Reaching here means a regime is registered in _COMPLIANCE_REGIMES
+        # but missing from _compliance_validators() — a wiring drift.
+        print(
+            f"sum: regime {args.regime!r} listed but not wired "
+            f"(internal: missing from _compliance_validators dispatch)",
+            file=sys.stderr,
+        )
+        return 2
+    report = validate(rows)
+    out = report.to_dict()
+    if parse_errors:
+        # Surface parse errors alongside rule violations — both are
+        # compliance-relevant. A malformed JSONL line is itself a
+        # traceability defect.
+        out["parse_errors"] = [
+            {"line_index": idx, "error": msg} for idx, msg in parse_errors
+        ]
+    json.dump(out, sys.stdout, indent=2 if args.pretty else None)
+    sys.stdout.write("\n")
+    return 0 if (report.ok and not parse_errors) else 1
+def cmd_compliance_regimes(args: argparse.Namespace) -> int:
+    out = {
+        "schema": "sum.compliance_regimes.v1",
+        "regimes": [
+            {"id": rid, "description": desc}
+            for rid, desc in sorted(_COMPLIANCE_REGIMES.items())
+        ],
+    }
+    json.dump(out, sys.stdout, indent=2)
+    sys.stdout.write("\n")
+    return 0
 # ─── Argparse wiring ─────────────────────────────────────────────────
 def build_parser() -> argparse.ArgumentParser:
@@ -1878,6 +2088,60 @@ def build_parser() -> argparse.ArgumentParser:
     )
     p_schema.set_defaults(func=cmd_schema)
+    # compliance — regime validators consuming sum.audit_log.v1.
+    p_compliance = subparsers.add_parser(
+        "compliance",
+        help="Validate a sum.audit_log.v1 stream against a compliance regime.",
+        description=(
+            "Apply a per-regime validator to a sum.audit_log.v1 JSONL stream "
+            "and emit a sum.compliance_report.v1 verdict. The audit log is "
+            "regime-agnostic substrate; this verb is the actionable layer "
+            "that turns it into a compliance-grade pass/fail."
+        ),
+    )
+    p_compliance_sub = p_compliance.add_subparsers(
+        dest="compliance_cmd", required=True, metavar="<compliance-cmd>",
+    )
+    p_comp_check = p_compliance_sub.add_parser(
+        "check",
+        help="Validate an audit-log stream against a regime; emit a JSON report.",
+        description=(
+            "Read a sum.audit_log.v1 JSONL stream from --audit-log (or stdin "
+            "if '-'), validate against the named regime, emit a "
+            "sum.compliance_report.v1 JSON object on stdout. Exit code is "
+            "0 when ok=true, 1 otherwise — pipe-friendly for CI gates."
+        ),
+    )
+    p_comp_check.add_argument(
+        "--regime",
+        required=True,
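+        # Offer every registered regime; a single hard-coded choice would
+        # make the other validators unreachable from the command line.
+        choices=sorted(_COMPLIANCE_REGIMES),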
+        help="Compliance regime to validate against.",
+    )
+    p_comp_check.add_argument(
+        "--audit-log",
+        required=True,
+        help="Path to a sum.audit_log.v1 JSONL file ('-' for stdin).",
+    )
+    p_comp_check.add_argument(
+        "--pretty",
+        action="store_true",
+        help="Pretty-print the report JSON.",
+    )
+    p_comp_check.set_defaults(func=cmd_compliance_check)
+    p_comp_regimes = p_compliance_sub.add_parser(
+        "regimes",
+        help="List available compliance regimes.",
+        description=(
+            "Emit the set of regime identifiers this CLI can validate "
+            "against. Adding a new regime appends to this list; "
+            "existing identifiers are stable."
+        ),
+    )
+    p_comp_regimes.set_defaults(func=cmd_compliance_regimes)
     return parser
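
`cmd_compliance_check` relies on each regime module exposing a `validate(rows)` callable whose result has an `ok` attribute and a `to_dict()` method. The `sum_engine_internal.compliance` modules are not part of this excerpt, so the sketch below is only an assumption about what such a validator could look like. `Violation`, `Report`, the `example-regime` identifier, the rule names, and the report field layout are all hypothetical; only the `sum.compliance_report.v1` schema name and the exit-code rule come from the diff.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field

# Per-row fields the audit-log docstring marks as required.
REQUIRED_FIELDS = ("schema", "timestamp", "operation", "cli_version")


@dataclass
class Violation:
    row_index: int
    rule: str
    detail: str


@dataclass
class Report:
    regime: str
    rows_checked: int
    violations: list[Violation] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        """A report passes only when no rule was violated."""
        return not self.violations

    def to_dict(self) -> dict:
        # Field layout is illustrative, not the project's actual report shape.
        return {
            "schema": "sum.compliance_report.v1",
            "regime": self.regime,
            "ok": self.ok,
            "rows_checked": self.rows_checked,
            "violations": [asdict(v) for v in self.violations],
        }


def validate(rows: list[dict]) -> Report:
    """Hypothetical validator: flag rows lacking any required field."""
    report = Report(regime="example-regime", rows_checked=len(rows))
    for index, row in enumerate(rows):
        for name in REQUIRED_FIELDS:
            if name not in row:
                report.violations.append(
                    Violation(index, f"required-field.{name}", f"row lacks {name!r}")
                )
    return report


def exit_code(report: Report, parse_errors: list) -> int:
    """Mirror the CLI rule: 0 only when the report is ok and every line parsed."""
    return 0 if (report.ok and not parse_errors) else 1


rows = [
    {"schema": "sum.audit_log.v1", "timestamp": "2026-05-01T18:35:14.123Z",
     "operation": "verify", "cli_version": "0.6.0"},
    {"schema": "sum.audit_log.v1", "operation": "render"},
]
report = validate(rows)
print(report.to_dict())
print("exit code:", exit_code(report, parse_errors=[]))
```

In the CLI, usage errors such as an unknown regime or an unreadable file return exit code 2, which lets a CI gate tell a failed check apart from a misconfigured one.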
