npm - @occasiolabs/occasio - Versions diffs - 0.8.1 - Mend

@occasiolabs/occasio 0.8.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (92)

package/LICENSE +202 -0
package/NOTICE +10 -0
package/README.md +216 -0
package/bin/occasio-mcp.js +5 -0
package/bin/occasio.js +2 -0
package/bin/supervisor/README.md +90 -0
package/bin/supervisor/com.occasio.proxy.plist.template +36 -0
package/bin/supervisor/install-windows-task.ps1 +48 -0
package/bin/supervisor/occasio.service +18 -0
package/docs/AUDIT.md +120 -0
package/docs/attest_verify.py +283 -0
package/docs/audit_walker.py +65 -0
package/docs/canonicalize.py +99 -0
package/docs/compliance-mapping.md +93 -0
package/docs/demos/mcp-block.md +148 -0
package/docs/edr-calibration.md +73 -0
package/docs/edr-demo.md +83 -0
package/docs/python-verifier.md +74 -0
package/docs/reference-pipeline.md +140 -0
package/package.json +69 -0
package/policy-templates/dev-default.yml +84 -0
package/policy-templates/finance.yml +61 -0
package/policy-templates/strict.yml +49 -0
package/schemas/agent-attestation-v1.json +190 -0
package/schemas/occasio-policy.schema.json +99 -0
package/spec/agent-attestation/v1/README.md +137 -0
package/src/adapters/claude-code.js +518 -0
package/src/adapters/cline.js +161 -0
package/src/adapters/computer-use-cli.js +198 -0
package/src/adapters/computer-use.js +227 -0
package/src/analyzer.js +170 -0
package/src/anomaly/cli.js +143 -0
package/src/anomaly/detectors/deny-rate.js +84 -0
package/src/anomaly/detectors/file-read-volume.js +109 -0
package/src/anomaly/detectors/secret-redact-rate.js +107 -0
package/src/anomaly/detectors/unknown-tool-input.js +83 -0
package/src/anomaly/index.js +169 -0
package/src/attest/canonicalize.js +97 -0
package/src/attest/index.js +355 -0
package/src/attest/run-slice.js +57 -0
package/src/attest/sign.js +186 -0
package/src/attest/verify.js +192 -0
package/src/audit/errors.js +21 -0
package/src/audit/input-normalizer.js +121 -0
package/src/audit/jsonl-auditor.js +178 -0
package/src/audit/verifier.js +152 -0
package/src/baseline.js +507 -0
package/src/boundary.js +238 -0
package/src/budget.js +42 -0
package/src/classifier.js +115 -0
package/src/context-budget.js +77 -0
package/src/core/boundary-event.js +75 -0
package/src/core/decision.js +61 -0
package/src/core/pipeline.js +66 -0
package/src/core/tool-names.js +105 -0
package/src/dashboard.js +892 -0
package/src/demo/README.md +31 -0
package/src/demo/anomalies-demo.js +211 -0
package/src/demo/attest-demo.js +198 -0
package/src/distiller.js +155 -0
package/src/embeddings.json +72 -0
package/src/executor/dispatcher.js +230 -0
package/src/harness.js +817 -0
package/src/index.js +1711 -0
package/src/inspect.js +329 -0
package/src/interceptor.js +1198 -0
package/src/lao.js +185 -0
package/src/lao_prep.py +119 -0
package/src/ledger.js +209 -0
package/src/mcp-experiment.js +140 -0
package/src/mcp-normalize.js +139 -0
package/src/mcp-server.js +320 -0
package/src/outbound-policy.js +433 -0
package/src/policy/built-in-classifiers.js +78 -0
package/src/policy/doctor.js +226 -0
package/src/policy/engine.js +339 -0
package/src/policy/init.js +153 -0
package/src/policy/loader.js +448 -0
package/src/policy/rules-default.js +36 -0
package/src/policy/shell-path.js +135 -0
package/src/policy/show.js +196 -0
package/src/policy/validate.js +310 -0
package/src/preflight/cli.js +164 -0
package/src/preflight/miner.js +329 -0
package/src/proxy/agent-router.js +93 -0
package/src/redteam.js +428 -0
package/src/replay.js +446 -0
package/src/report/index.js +224 -0
package/src/runtime.js +595 -0
package/src/scanner/index.js +49 -0
package/src/selftest.js +192 -0
package/src/session.js +36 -0

package/docs/AUDIT.md ADDED

@@ -0,0 +1,120 @@
+# Occasio — Audit Log Format and Independent Verification
+**Audience.** Security / compliance reviewers and platform engineers who need to verify Occasio's audit trail without trusting Occasio's own verifier.
+**Promise.** Every governed tool call writes one row to `~/.occasio/pipeline-events.jsonl`. Each row is hash-chained to the previous row using SHA-256, starting from a fixed genesis sentinel. Any post-hoc edit, reordering, or deletion within the chain is detectable by re-walking the file. This document specifies the row format and the canonical-serialization rules precisely enough that an independent walker (a small Python script, included) reproduces the verification end-to-end.
+---
+## 1. Row format
+Each line of `pipeline-events.jsonl` is a UTF-8 JSON object. A row is built in **this exact field order**:
+```
+ts, event_id, session_id, run_id,
+agent, protocol, direction, kind,
+tool_name, tool_inputs,
+action, reason, policy_source, executor, transform,
+result_kind, exit_code, secrets_redacted, distilled, tokens_saved,
+prev_hash, hash
+```
+Field semantics:
+| Field | Type | Notes |
+|---|---|---|
+| `ts` | string (ISO-8601) | Event timestamp from the boundary event. |
+| `event_id` | string (UUID) | Unique per event. |
+| `session_id` | string | Stable per Occasio session. |
+| `run_id` | string | Stable per agent run. |
+| `agent` | string | Canonical agent id (e.g. `claude-code`). |
+| `protocol` | string | Wire protocol (e.g. `anthropic-http`). |
+| `direction` | string | `inbound` (agent → cloud) or `outbound`. |
+| `kind` | string | `tool_call`, `request`, etc. |
+| `tool_name` | string | Canonical tool name (e.g. `read_file`). |
+| `tool_inputs` | object \| absent | Normalized inputs (see `src/audit/input-normalizer.js`). Absent means the tool's inputs are intentionally not logged. |
+| `action` | string | `LOCAL`, `PASS`, `BLOCK`, or `TRANSFORM` on `tool_call` rows; `INFO` on `policy_loaded` rows. |
+| `reason` | string | Reason code from the policy engine. |
+| `policy_source` | string | `default` or `user`. |
+| `executor` | string \| absent | Where the action ran (e.g. `native`). |
+| `transform` | string \| absent | Transform applied, if any. |
+| `result_kind` | string \| absent | `local`, `pass`, `block`, `transform`, or `unknown`. Omitted on `policy_loaded` rows. |
+| `exit_code` | number \| absent | Non-zero on local execution failure. |
+| `secrets_redacted` | number \| absent | Count of secrets redacted in the result. |
+| `distilled` | bool \| absent | Whether output was distilled. |
+| `tokens_saved` | number \| absent | Tokens saved by distillation. |
+| `prev_hash` | string (64-hex) | Hash of the previous row, or genesis on the first row. |
+| `hash` | string (64-hex) | SHA-256 of the row's canonical serialization with `hash` removed. |
+Fields whose value would be `undefined` (in JS) or `None` (in Python) are **omitted** from the serialized row, not emitted with a null value. This matches V8's `JSON.stringify` default behavior.
+### Row kinds
+`kind` distinguishes what an audit row records. As of v0.6.6 there are two:
+| `kind` | When it fires | Semantics |
+|---|---|---|
+| `tool_call` | Every governed tool call (Claude Code or MCP) | `tool_inputs` is per-tool (file path, glob, count). `action` is one of `LOCAL`/`PASS`/`BLOCK`/`TRANSFORM`. `result_kind` is `local`/`pass`/`block`/`transform`. |
+| `policy_loaded` | Process startup, and on every policy-file edit (hot-reload) | `tool_inputs` is `{ policy_hash, policy_path, version }`. `tool_name` is the placeholder string `"policy_loaded"`. `action` is `"INFO"`. `reason` is `"policy-loaded"`. **`result_kind` is omitted** — a policy-load event has no dispatcher Result. |
+The `policy_loaded` row binds the audit chain to a specific policy file's bytes: a buyer can prove not just "what was blocked" but "under which exact `policy.yml` the block was decided." Because the hash is over the raw file bytes (not the normalized policy object), comments and whitespace count — the hash matches whatever a reviewer reads in source control.
+## 2. Genesis sentinel
+The `prev_hash` of the first row in a chain is:
+```
+0000000000000000000000000000000000000000000000000000000000000000
+```
+(64 zero hex digits.)
+## 3. Hash algorithm
+For each row:
+1. Take the row object.
+2. Remove the `hash` field.
+3. Serialize **in insertion order** to a UTF-8 string with no whitespace between tokens, no key sorting, and non-ASCII characters emitted literally. (V8 `JSON.stringify` default; equivalent to Python `json.dumps(d, separators=(",", ":"), ensure_ascii=False)` over a Python 3.7+ dict.)
+4. Compute the lowercase hex SHA-256 of the resulting bytes.
+That is the value of `hash`. The `prev_hash` of the next row equals this `hash`.
+## 4. Independent walker
+A standalone Python script, [`audit_walker.py`](audit_walker.py), implements the verification with no Occasio dependencies — only `hashlib`, `json`, `sys` from the standard library. To run it:
+```sh
+python3 docs/audit_walker.py ~/.occasio/pipeline-events.jsonl
+```
+Expected output for an intact chain:
+```
+OK: 31 rows verified
+```
+If any row's `prev_hash` does not match the previous row's `hash`, or any row's recomputed hash does not match its stored `hash`, the script exits non-zero with a `MISMATCH at line N: …` message identifying the first inconsistency.
+## 5. Parity with Occasio's own verifier
+Occasio ships its own verifier (`occasio audit verify`). For audit credibility, both must agree on the same file. Parity is checked at every release; v0.6.4 is verified to agree on the maintainer's 31-row reference log.
+If you find a row where `audit_walker.py` and `occasio audit verify` disagree, that is a bug. Open an issue with the row line number and we will treat it as audit-credibility-critical (i.e. fix-before-next-release).
+## 6. What this proves and does not prove
+**Proves.** No row in the chain has been edited after the fact. No row has been removed from the middle of the chain. No row has been reordered.
+**Does not prove.**
+- That no rows were *omitted* — i.e. that the proxy was running and recording during every session in which it should have been. Gaps in time are visible in the `ts` field, but proving "no governed action escaped the log" requires comparing the audit log against an external record of agent activity. For pilots that need this guarantee, ship the audit rows offsite (SIEM, S3, append-only file) on a tail cadence.
+- ~~That the proxy was running with the policy file you expected.~~ **(Resolved in v0.6.6.)** Every process startup and every hot-reload appends a `policy_loaded` row carrying the SHA-256 of the active policy file's bytes; subsequent tool-call rows are bound to the most recent `policy_loaded` row by chain position. To verify "this BLOCK happened under this exact policy file": (1) find the BLOCK row, (2) walk backward to the most recent `policy_loaded` row, (3) compare its `tool_inputs.policy_hash` to a SHA-256 of the file you intend to compare against. The walker in `audit_walker.py` will accept both `kind` values without modification.
+- That **multiple processes** writing to the same audit file did not interleave. The Claude Code proxy and the MCP server each emit their own `policy_loaded` rows, which is correct, but they share `pipeline-events.jsonl` under a single-writer assumption documented in v0.6.5's CHANGELOG. Concurrent writers on Windows can interleave; the chain detects the corruption but cannot repair it.
+- That a row written *during* a write outage was not lost. v0.6.4 aborts the proxy with exit code 1 when an audit append fails, so a successful tool dispatch cannot coexist with a missing row in steady state. The combination of (a) fail-fatal audit writes and (b) a supervisor that restarts the proxy is the operational guarantee.
+## 7. Stability commitment
+The audit row schema and field-order list in §1 are part of Occasio's stable surface. They will not change incompatibly across v0.6.x. Any future field will be added in a way that does not invalidate existing rows or re-walks of the chain.
+`audit_walker.py` in this repository is the canonical reference. If your verifier produces different bytes on the same input, your verifier is wrong, not the spec.
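
Reviewer sketch (not part of the package): the three-step policy check described in §6 of AUDIT.md above can be written in a few lines of Python. The helper names `policy_hash_for_row` and `decided_under` are hypothetical. The code assumes `tool_inputs.policy_hash` is stored as a lowercase hex SHA-256 digest, which the document implies but does not state, and it expects rows already parsed from a chain that has passed verification.

```python
import hashlib
from typing import Optional


def policy_hash_for_row(rows: list, index: int) -> Optional[str]:
    """Return policy_hash from the nearest policy_loaded row at or before index."""
    for row in reversed(rows[: index + 1]):
        if row.get("kind") == "policy_loaded":
            return (row.get("tool_inputs") or {}).get("policy_hash")
    return None  # no policy_loaded row precedes this position


def decided_under(rows: list, index: int, policy_bytes: bytes) -> bool:
    """True when the governing policy_loaded row matches the given file bytes."""
    # Assumption: the stored digest is lowercase hex SHA-256 of the raw bytes.
    expected = hashlib.sha256(policy_bytes).hexdigest()
    return policy_hash_for_row(rows, index) == expected
```

Read the candidate policy file in binary mode before passing it in, since the hash covers raw bytes and a text-mode read may alter line endings.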

package/docs/attest_verify.py ADDED

@@ -0,0 +1,283 @@
+#!/usr/bin/env python3
+"""
+attest_verify.py — independent Python verifier for Occasio's
+AI-Agent Behavioral Attestation v1 predicate.
+Mirrors src/attest/verify.js, but written for an auditor whose
+environment is Python-only and who refuses to trust Occasio's
+own verifier to certify Occasio's own output. Three independent
+checks, in order, each must pass:
+    1. Sigstore signature  (Fulcio cert chain + Rekor inclusion)
+    2. DSSE payload ↔ attestation predicate canonical-byte equivalence
+    3. Audit-chain integrity end-to-end + first/last hash containment
+Step 1 is delegated to ``sigstore-python`` when available. When the
+library is not installed the step is marked "skipped (install
+sigstore-python)" rather than silently passing; the auditor decides
+whether to install it or to accept a partial verification.
+Usage::
+    python3 attest_verify.py <attestation.json> [--bundle <path>]
+                             [--chain <path>]
+Exit code 0 only when every check passes, 1 otherwise. A skipped
+check counts as not passed (see ``verify``).
+Companion files in this directory:
+    canonicalize.py   RFC 8785 subset (kept in lockstep with the
+                      Node and browser implementations)
+    audit_walker.py   the audit-chain walker, reused for step 3
+"""
+from __future__ import annotations
+import argparse
+import base64
+import json
+import os
+import sys
+from typing import Any
+# Allow `import canonicalize` / `import audit_walker` when invoked
+# directly from this folder.
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+from canonicalize import canonicalize  # noqa: E402
+import audit_walker  # noqa: E402
+PREDICATE_TYPE = (
+    "https://github.com/occasiolabs/occasio/spec/agent-attestation/v1"
+)
+DSSE_PAYLOAD_TYPE = "application/vnd.in-toto+json"
+def _read_json(path: str) -> Any:
+    with open(path, "r", encoding="utf-8") as fh:
+        return json.load(fh)
+def _check_sigstore(bundle: dict) -> tuple[bool, str | None, str]:
+    """Try to verify the Sigstore bundle. Returns (passed, detail, note).
+    ``note`` is one of: 'verified', 'skipped', or 'error' (``detail`` then carries the exception text).
+    The auditor can decide whether to treat a 'skipped' as a failure.
+    """
+    try:
+        from sigstore.verify import Verifier, policy  # type: ignore
+        from sigstore.models import Bundle  # type: ignore
+    except ImportError:
+        return (
+            False,
+            "sigstore-python not installed (pip install sigstore)",
+            "skipped",
+        )
+    try:
+        bundle_obj = Bundle.from_json(json.dumps(bundle))
+        verifier = Verifier.production()
+        # We do not pin the identity here — the auditor's policy may
+        # require workflow-ref pinning. For the reference verifier we
+        # accept any Fulcio cert and let the audit-chain step bind
+        # the predicate to a concrete run_id.
+        verifier.verify_dsse(bundle_obj, policy.UnsafeNoOp())
+        return True, None, "verified"
+    except Exception as exc:  # noqa: BLE001
+        return False, str(exc), "error"
+def _check_payload_equivalence(
+    attestation: dict, bundle: dict
+) -> tuple[bool, str | None]:
+    env = bundle.get("dsseEnvelope") or bundle.get("dsse_envelope")
+    if not env or not env.get("payload"):
+        return False, "bundle missing dsseEnvelope.payload"
+    if env.get("payloadType") != DSSE_PAYLOAD_TYPE:
+        return False, f"unexpected payloadType: {env.get('payloadType')!r}"
+    try:
+        payload_bytes = base64.b64decode(env["payload"])
+        statement = json.loads(payload_bytes)
+    except Exception as exc:  # noqa: BLE001
+        return False, f"cannot decode DSSE payload: {exc}"
+    if statement.get("predicateType") != PREDICATE_TYPE:
+        return False, f"unexpected predicateType: {statement.get('predicateType')!r}"
+    expected = {k: v for k, v in attestation.items() if k != "signature"}
+    if canonicalize(statement.get("predicate")) != canonicalize(expected):
+        return False, "predicate differs from DSSE payload predicate"
+    return True, None
+def _check_audit_chain(
+    attestation: dict, chain_path: str | None
+) -> tuple[bool, str | None]:
+    chain_file = chain_path or attestation.get("audit_chain", {}).get("chain_file")
+    if not chain_file:
+        return False, "no chain_file in attestation and --chain not provided"
+    if not os.path.exists(chain_file):
+        return False, f"chain file not found: {chain_file}"
+    # audit_walker exits 0/1 based on integrity; reuse its internals
+    # to also capture the first/last hash positions.
+    first_target = attestation.get("audit_chain", {}).get("first_hash")
+    last_target = attestation.get("audit_chain", {}).get("last_hash")
+    if not first_target or not last_target:
+        return False, "attestation missing first_hash/last_hash"
+    prev = audit_walker.GENESIS
+    chained = 0
+    first_idx = -1
+    last_idx = -1
+    with open(chain_file, "r", encoding="utf-8") as fh:
+        for lineno, raw in enumerate(fh, 1):
+            line = raw.rstrip("\n")
+            if not line:
+                continue
+            row = json.loads(line)
+            stored = row.pop("hash", None)
+            if not isinstance(stored, str) or len(stored) != 64:
+                continue
+            if row.get("prev_hash") != prev:
+                return False, f"chain broken at line {lineno}"
+            recomputed = audit_walker.hashlib.sha256(
+                audit_walker.canonical_serialize(row)
+            ).hexdigest()
+            if recomputed != stored:
+                return False, f"hash mismatch at line {lineno}"
+            prev = stored
+            chained += 1
+            if stored == first_target and first_idx == -1:
+                first_idx = lineno
+            if stored == last_target:
+                last_idx = lineno
+    if first_idx == -1:
+        return False, "first_hash not found in chain"
+    if last_idx == -1:
+        return False, "last_hash not found in chain"
+    if last_idx < first_idx:
+        return False, "last_hash precedes first_hash in chain"
+    return True, f"chain_length={chained}, slice rows {first_idx}..{last_idx}"
+def verify(
+    attestation_path: str,
+    bundle_path: str | None = None,
+    chain_path: str | None = None,
+) -> dict:
+    """Run all three checks and return a structured result."""
+    if not bundle_path:
+        if attestation_path.endswith(".json"):
+            bundle_path = attestation_path[:-5] + ".sigstore.json"
+        else:
+            bundle_path = attestation_path + ".sigstore.json"
+    attestation = _read_json(attestation_path)
+    if not os.path.exists(bundle_path):
+        # An unsigned attestation is still verifiable for the chain
+        # step; surface steps 1 and 2 as 'skipped' rather than
+        # erroring out.
+        bundle = None
+    else:
+        bundle = _read_json(bundle_path)
+    checks = []
+    if bundle is None:
+        checks.append({
+            "name": "sigstore signature",
+            "ok": False,
+            "note": "skipped",
+            "detail": "no Sigstore bundle file alongside attestation",
+        })
+    else:
+        ok, detail, note = _check_sigstore(bundle)
+        checks.append({
+            "name": "sigstore signature",
+            "ok": ok,
+            "note": note,
+            "detail": detail,
+        })
+    if bundle is None:
+        checks.append({
+            "name": "bundle payload matches attestation",
+            "ok": False,
+            "note": "skipped",
+            "detail": "no bundle to compare against",
+        })
+    else:
+        ok, detail = _check_payload_equivalence(attestation, bundle)
+        checks.append({
+            "name": "bundle payload matches attestation",
+            "ok": ok,
+            "note": "verified" if ok else "failed",
+            "detail": detail,
+        })
+    ok, detail = _check_audit_chain(attestation, chain_path)
+    checks.append({
+        "name": "audit chain integrity",
+        "ok": ok,
+        "note": "verified" if ok else "failed",
+        "detail": detail,
+    })
+    # Overall pass: every check is ok=True. Skipped counts as not-ok
+    # so the caller cannot pretend a partial verification was a full
+    # one. The detail line tells the auditor what to install to lift
+    # the skip.
+    overall = all(c["ok"] for c in checks)
+    return {"ok": overall, "checks": checks}
+def _render(result: dict) -> int:
+    for c in result["checks"]:
+        mark = "OK" if c["ok"] else ("SKIP" if c["note"] == "skipped" else "FAIL")
+        line = f"  [{mark:>4}] {c['name']}"
+        if c.get("detail"):
+            line += f"  ({c['detail']})"
+        print(line)
+    print()
+    print("PASS" if result["ok"] else "FAIL")
+    return 0 if result["ok"] else 1
+def main() -> int:
+    parser = argparse.ArgumentParser(
+        description=(
+            "Independent Python verifier for Occasio Agent Attestation v1."
+        )
+    )
+    parser.add_argument("attestation", help="Path to attestation.json")
+    parser.add_argument(
+        "--bundle",
+        help=(
+            "Path to Sigstore bundle (default: <attestation>.sigstore.json)"
+        ),
+    )
+    parser.add_argument(
+        "--chain",
+        help=(
+            "Path to audit chain file. Default: read chain_file from the "
+            "attestation."
+        ),
+    )
+    parser.add_argument(
+        "--json",
+        action="store_true",
+        help="Emit JSON result instead of human-readable lines.",
+    )
+    args = parser.parse_args()
+    result = verify(args.attestation, args.bundle, args.chain)
+    if args.json:
+        print(json.dumps(result, indent=2))
+        return 0 if result["ok"] else 1
+    return _render(result)
+if __name__ == "__main__":
+    sys.exit(main())
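
Reviewer sketch (not part of the package): `verify()` counts a skipped check as a failure, so an auditor who chooses to accept a partial verification has to say so explicitly. The helper below shows one way to do that against the result shape returned above (`ok`, `checks[].name`, `checks[].ok`, `checks[].note`). The name `gate` and the `allow_skipped` parameter are hypothetical.

```python
def gate(result: dict, allow_skipped=frozenset()) -> bool:
    """Accept a verify() result, tolerating only explicitly allowed skips."""
    for check in result["checks"]:
        if check["ok"]:
            continue
        # A skip is tolerated only when the caller named that check;
        # a 'failed' or 'error' note is never tolerated.
        if check.get("note") == "skipped" and check["name"] in allow_skipped:
            continue
        return False
    return True
```

The input can come from `json.loads` applied to the output of `python3 attest_verify.py attestation.json --json`. Keeping the allow-list empty by default preserves the script's own behavior.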

package/docs/audit_walker.py ADDED

@@ -0,0 +1,65 @@
+#!/usr/bin/env python3
+"""
+Independent walker for Occasio's pipeline-events.jsonl audit log.
+Re-walks the SHA-256 hash chain without using any Occasio code, so the
+audit-trail integrity claim does not depend on trusting Occasio's own
+verifier. See docs/AUDIT.md for the row schema and the canonical-
+serialization rules this script implements.
+Usage:
+    python3 audit_walker.py ~/.occasio/pipeline-events.jsonl
+Exit code 0 on success, 1 on first inconsistency or I/O error.
+"""
+import hashlib
+import json
+import sys
+GENESIS = "0" * 64
+def canonical_serialize(row_without_hash: dict) -> bytes:
+    # Mirrors V8's JSON.stringify with default options:
+    #   - no whitespace between tokens
+    #   - non-ASCII characters emitted literally (ensure_ascii=False)
+    #   - keys in insertion order (Python 3.7+ dict guarantees this)
+    return json.dumps(
+        row_without_hash,
+        separators=(",", ":"),
+        ensure_ascii=False,
+    ).encode("utf-8")
+def walk(path: str) -> int:
+    prev_hash = GENESIS
+    chained = 0
+    with open(path, "r", encoding="utf-8") as fh:
+        for lineno, raw in enumerate(fh, 1):
+            line = raw.rstrip("\n")
+            if not line:
+                continue
+            row = json.loads(line)
+            stored_hash = row.pop("hash", None)
+            # Legacy rows (pre-hash-chain) have no hash field — skip silently.
+            if not isinstance(stored_hash, str) or len(stored_hash) != 64:
+                continue
+            if row.get("prev_hash") != prev_hash:
+                print(f"MISMATCH at line {lineno}: prev_hash chain broken", file=sys.stderr)
+                return 1
+            recomputed = hashlib.sha256(canonical_serialize(row)).hexdigest()
+            if recomputed != stored_hash:
+                print(f"MISMATCH at line {lineno}: stored hash {stored_hash} != recomputed {recomputed}", file=sys.stderr)
+                return 1
+            prev_hash = stored_hash
+            chained += 1
+    print(f"OK: {chained} rows verified")
+    return 0
+if __name__ == "__main__":
+    if len(sys.argv) != 2:
+        print("usage: audit_walker.py <pipeline-events.jsonl>", file=sys.stderr)
+        sys.exit(2)
+    sys.exit(walk(sys.argv[1]))
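
Reviewer sketch (not part of the package): to see what the walker detects, it helps to build a tiny chain under the same serialization rules and then damage it. The functions `seal` and `first_bad_index` are hypothetical names, and the two-field rows are illustrative rather than real Occasio rows.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(row_without_hash: dict) -> str:
    # Same rules as canonical_serialize above: compact separators,
    # literal non-ASCII, keys in insertion order.
    data = json.dumps(row_without_hash, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def seal(row: dict, prev_hash: str) -> dict:
    """Return a copy of row with prev_hash and hash appended, in that order."""
    sealed = {k: v for k, v in row.items() if k not in ("prev_hash", "hash")}
    sealed["prev_hash"] = prev_hash
    sealed["hash"] = _digest(sealed)  # digest is computed before 'hash' is set
    return sealed


def first_bad_index(rows: list):
    """Return the index of the first inconsistent row, or None if intact."""
    prev = GENESIS
    for i, row in enumerate(rows):
        body = {k: v for k, v in row.items() if k != "hash"}
        if body.get("prev_hash") != prev or _digest(body) != row.get("hash"):
            return i
        prev = row["hash"]
    return None
```

Editing a field or dropping a middle row makes `first_bad_index` return that position. Dropping rows from the tail does not, which is consistent with AUDIT.md limiting its deletion claim to the middle of the chain.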

package/docs/canonicalize.py ADDED

@@ -0,0 +1,99 @@
+"""
+canonicalize.py — RFC 8785 subset for byte-stable JSON serialisation.
+Companion to docs/audit_walker.py and docs/attest_verify.py: lets an
+auditor in a Python-only environment re-verify an Occasio
+attestation against the producer's canonical form without trusting
+the producer's code.
+Must stay byte-identical to src/attest/canonicalize.js and the
+inline copy in integrations/attest-view/viewer.js. The three
+implementations exist so the schema is provably language-independent;
+diverging them defeats the point.
+Cross-language invariant (load-bearing):
+    JavaScript has a single ``number`` type. ``JSON.parse('1.0')``
+    yields the integer 1; ``JSON.stringify(1)`` emits ``'1'``.
+    Python distinguishes int from float: ``json.loads('1.0')`` yields
+    ``float(1.0)``; ``json.dumps(1.0)`` emits ``'1.0'``. If we silently
+    accepted floats, the JS verifier and the Python verifier would
+    canonicalize the same JSON file to different bytes — silent
+    byte-equivalence breakage. This module:
+      - rejects non-integer floats (e.g. 1.5) with a clear error
+      - coerces integer-valued floats (e.g. 1.0) to the integer
+        representation so that a Python parse of ``"1.0"`` and a JS
+        parse of ``"1.0"`` canonicalize identically
+    If a future schema requires decimal precision, encode it as a
+    string. The canonicalize boundary stays integer-only.
+Deviations from strict RFC 8785 (documented, intentional):
+    - Float rejection above (instead of RFC 8785's prescribed form).
+      Load-bearing for cross-language byte-equivalence.
+    - Lone surrogates are passed through json.dumps(ensure_ascii=False)
+      unescaped rather than rejected. RFC 8785 requires an error here.
+"""
+from __future__ import annotations
+import json
+from typing import Any
+def canonicalize(value: Any) -> str:
+    """Return the canonical-JSON string for ``value``.
+    Rules:
+      - object keys sorted lexicographically by UTF-16 code unit
+        (Python's default ``sorted`` on strs is UTF-16-equivalent for
+        the BMP, which covers every key in the v1 schema)
+      - ``None`` ``True`` ``False`` map to ``null``/``true``/``false``
+      - object members whose value is ``None`` are kept (they encode
+        explicit nullable fields like ``policy.version``); members
+        absent from the dict are not invented
+      - arrays preserve order
+      - rejects ``float('nan')``/``inf``, callables, types,
+        non-string keys
+    Raises ``ValueError`` on rejected inputs.
+    """
+    if value is None:
+        return "null"
+    if isinstance(value, bool):
+        return "true" if value else "false"
+    if isinstance(value, int):
+        return str(value)
+    if isinstance(value, float):
+        if value != value or value in (float("inf"), float("-inf")):
+            raise ValueError("canonicalize: non-finite number")
+        # Cross-language invariant: a JSON literal like "1.0" parses to
+        # int(1) in JavaScript but float(1.0) in Python. Coerce the
+        # integer-valued case so both implementations canonicalize to
+        # the same bytes. Reject genuine non-integer floats — see the
+        # module docstring for the schema-design rationale.
+        if not value.is_integer():
+            raise ValueError(
+                f"canonicalize: non-integer number {value} — "
+                "cross-language byte-equivalence requires schema fields "
+                "be integers or strings. Encode decimal values as strings."
+            )
+        return str(int(value))
+    if isinstance(value, str):
+        # json.dumps emits a fully-escaped RFC 8259 string. Matches
+        # what V8's JSON.stringify does for ASCII + most Unicode.
+        return json.dumps(value, ensure_ascii=False)
+    if isinstance(value, (list, tuple)):
+        return "[" + ",".join(canonicalize(v) for v in value) + "]"
+    if isinstance(value, dict):
+        for k in value.keys():
+            if not isinstance(k, str):
+                raise ValueError(
+                    f"canonicalize: non-string key {k!r}"
+                )
+        items = sorted(value.items(), key=lambda kv: kv[0])
+        return "{" + ",".join(
+            json.dumps(k, ensure_ascii=False) + ":" + canonicalize(v)
+            for k, v in items
+        ) + "}"
+    raise ValueError(f"canonicalize: unsupported type {type(value).__name__}")
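
Reviewer sketch (not part of the package): the docstring notes that Python's default string ordering matches UTF-16 code-unit ordering only for BMP keys. If a future schema ever used keys outside the BMP, a sort key along these lines would restore the RFC 8785 ordering. The names `utf16_key` and `sort_keys_utf16` are hypothetical.

```python
def utf16_key(key: str) -> bytes:
    """Sort key that orders strings by UTF-16 code unit."""
    # Big-endian encoding makes bytewise comparison equal to comparing
    # the 16-bit code units numerically, one unit at a time.
    return key.encode("utf-16-be")


def sort_keys_utf16(keys):
    """Return keys in UTF-16 code-unit order."""
    return sorted(keys, key=utf16_key)
```

For example, U+FFFD sorts before U+1F600 by code point, but after it by code unit, because the emoji's leading surrogate 0xD83D is smaller than 0xFFFD. Keys containing lone surrogates would raise `UnicodeEncodeError` under this key function.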

package/docs/compliance-mapping.md ADDED

@@ -0,0 +1,93 @@
+# Occasio — SOC 2 Control Mapping (DRAFT)
+**Status.** Draft. Conservative scope. The mappings below are limited to stanzas where the link between the policy and the SOC 2 control is **direct and provable from the audit log** — i.e. every claimed control evidences itself as an actual row, not as a vendor assertion. Mappings that would require interpretive bridging are intentionally absent. Before relying on this document for an audit, have it reviewed by a compliance practitioner familiar with your environment; it is published as a starting point, not a substitute for that review.
+**Scope.**
+- Framework: SOC 2 Trust Services Criteria, 2017, Common Criteria series only.
+- Template: `policy-templates/finance.yml`.
+- Evidence: rows in `~/.occasio/pipeline-events.jsonl`, verifiable by `occasio audit verify` and the independent walker at [`audit_walker.py`](audit_walker.py).
+What this document deliberately does not do:
+- Map ISO 27001, HIPAA, PCI-DSS, FedRAMP, or NIST 800-53. (Single-framework discipline; per-framework mapping is a separate effort.)
+- Claim coverage of any availability, processing integrity, confidentiality, or privacy criteria beyond what `finance.yml` directly produces evidence for.
+- Imply that Occasio alone is sufficient for SOC 2 attestation — it is one signal in a control set that includes IAM, endpoint controls, network controls, and HR processes.
+---
+## CC6.1 — Logical and Physical Access Controls (Restrict)
+> *"The entity implements logical access security software, infrastructure, and architectures over protected information assets to protect them from security events to meet the entity's objectives."*
+**Mapped stanza.** `deny_paths` in `finance.yml`.
+```yaml
+deny_paths:
+  - ~/.ssh
+  - ~/.aws
+  - ~/.config/gcloud
+  - ~/.gnupg
+```
+**Why this maps.** A `deny_paths` entry blocks any read of a path under the listed prefix by an AI agent's tool call, regardless of which agent is calling and regardless of the routing that would otherwise apply. The control point is enforced at the Occasio boundary; the agent receives a `(blocked by policy)` synthetic refusal and the underlying file is never opened.
+**Evidence in the audit log.** Every blocked attempt produces a row of this exact shape:
+```json
+{
+  "kind":         "tool_call",
+  "tool_name":    "read_file",
+  "tool_inputs":  { "path": "<resolved absolute path>" },
+  "action":       "BLOCK",
+  "reason":       "path-denied",
+  "result_kind":  "block",
+  "prev_hash":    "...",
+  "hash":         "..."
+}
+```
+A reviewer asks: *"show me every time the agent attempted to read protected credentials in the period."* The answer is `occasio report --days N`'s `blocked_accesses[]` array filtered by `reason: "path-denied"`, with row-level evidence verifiable via the hash chain.
+**Limitations.**
+- Coverage is bounded by what is in the `deny_paths` list. A path not listed is not blocked.
+- A developer with write access to `~/.occasio/policy.yml` can edit the list; that edit produces a `policy_loaded` row in the audit log (under v0.6.6+) carrying the SHA-256 of the new file, so the change is detectable, but it is not prevented.
+- Concurrent multi-process audit writes are an unmitigated risk in v0.6.5 and v0.6.6 — if two processes are appending to the same `pipeline-events.jsonl`, an interleaved write on Windows can corrupt the chain. Document the single-writer discipline alongside this control.
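As an illustration of how a policy edit becomes detectable, the sketch below recomputes the SHA-256 of a policy file so it can be compared with the digest carried by a `policy_loaded` row. It assumes the digest is taken over the raw file bytes and is hex-encoded, which this document does not confirm. The name of the field holding the digest is not shown here either, so the recorded value is passed in as a plain string. Both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def policy_sha256(policy_path):
    """Return the hex SHA-256 of the policy file's raw bytes (assumed scheme)."""
    return hashlib.sha256(Path(policy_path).read_bytes()).hexdigest()


def policy_matches(policy_path, recorded_digest):
    """Compare the file on disk with a digest copied from a policy_loaded row."""
    return policy_sha256(policy_path) == recorded_digest.strip().lower()
```

A mismatch means the file on disk is no longer the policy under which the logged decisions were made. It says nothing about who changed the file or why.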
+---
+## CC7.2 — System Monitoring (Detection of Security Events)
+> *"The entity monitors system components and the operation of those components for anomalies that are indicative of malicious acts, natural disasters, and errors affecting the entity's ability to meet its objectives; anomalies are analyzed to determine whether they represent security events."*
+**Mapped stanza.** The audit log itself, as produced by the Occasio proxy and MCP server.
+**Why this maps.** Every governed tool call produces an immutable audit row. The hash chain detects post-hoc edits, and the `policy_loaded` synthetic event (v0.6.6+) binds tool-call rows to the specific policy file under which they were decided. A `BLOCK` row with `reason: "secret in tool result: <label>"` or `reason: "path-denied"` is the security event a CC7.2 program would treat as anomalous.
+**Evidence in the audit log.** The full `pipeline-events.jsonl` file, plus the integrity statement from `occasio audit verify` (or the equivalent independent walker output). The `occasio report` command summarises these into `summary.paths_blocked`, `summary.secrets_detected`, and the corresponding `blocked_accesses[]` and `secret_events[]` arrays.
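For intuition about what a chain walker checks, here is a minimal linkage check. It uses only the `prev_hash` and `hash` fields shown in the row example earlier and is not a substitute for `occasio audit verify`. It does not recompute each row's hash, because the canonicalisation and hashing scheme is not described in this document. The function name `first_broken_link` is hypothetical.

```python
import json


def first_broken_link(lines):
    """Return the 1-based line number of the first row whose prev_hash
    differs from the preceding row's hash, or None if every link holds.

    Linkage only: a row edited in place without updating its hash is not
    caught here; that requires recomputing hashes with the real scheme.
    """
    prev_hash = None
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        if prev_hash is not None and row.get("prev_hash") != prev_hash:
            return number
        prev_hash = row.get("hash")
    return None
```

A deleted or reordered row shows up as a broken link at the row that follows it, which is the kind of post-hoc edit the chain is meant to expose.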
+**Limitations.**
+- The audit log is local. CC7.2 typically expects centralised log aggregation; the log must be shipped to a SIEM or equivalent for organisation-wide monitoring. v0.6.6 does not ship a built-in shipper; this is a separate operational integration.
+- The control covers detection, not response. Action on detected events (notification, ticketing, remediation) is out of scope for the policy file alone.
+- Absence of rows is not a control signal in v0.6.6: a gap may simply mean the proxy was not running, not that no tool calls occurred. Pair with a supervisor template (see `bin/supervisor/`) and external uptime monitoring to close this gap.
+---
+## Mappings deliberately not included in this draft
+The following criteria are sometimes claimed by AI-tooling vendors but are **not mapped here** because the link to a `finance.yml` stanza is not directly evidenced by an audit row:
+- **CC6.6 — Encryption.** Occasio does not encrypt data at rest or in transit on its own; it relies on the underlying filesystem and HTTPS to Anthropic. No stanza in `finance.yml` produces evidence relevant to a CC6.6 review.
+- **CC6.7 — Information classification.** `deny_patterns` partially address this for credential-shaped strings, but classification systems (DLP labels, sensitivity tags) are an organisational concern, not a regex-pattern concern. Mapping `deny_patterns` to CC6.7 would overstate what the audit log proves.
+- **CC8.1 — Change management.** The `policy_loaded` row records *that* a policy changed and *what hash* it changed to, but not who changed it or whether the change was approved. Layering an MDM/dotfiles-with-PR-review process on top of `policy.yml` is what actually addresses CC8.1.
+- **A series (Availability), PI (Processing Integrity), C (Confidentiality), P (Privacy).** Out of scope for a Common-Criteria-only mapping. Add per-framework mapping documents if a customer needs them.
+---
+## How to use this document
+1. **Pre-pilot review.** Hand this document to the customer's compliance contact alongside `GOVERNANCE.md` and `docs/AUDIT.md`. Ask them to flag any mapping that is too aggressive (we'd rather narrow the document than overclaim) and any criterion they expected to see addressed (which becomes a roadmap input, not something that ships in v0.6.6).
+2. **At pilot end.** Run `occasio report --days <pilot-length>` and walk through the output with the compliance contact, pointing at the rows that evidence each mapped control.
+3. **Re-review.** This document is versioned with the policy schema. If `finance.yml` gains a new stanza, this mapping must be revisited at that point — a stanza without explicit mapping or explicit "not mapped" treatment is documentation drift.
+---
+*Last reviewed: pending. To request a review, see the issues link in `package.json`.*