@occasiolabs/occasio 0.8.1

Files changed (92)
  1. package/LICENSE +202 -0
  2. package/NOTICE +10 -0
  3. package/README.md +216 -0
  4. package/bin/occasio-mcp.js +5 -0
  5. package/bin/occasio.js +2 -0
  6. package/bin/supervisor/README.md +90 -0
  7. package/bin/supervisor/com.occasio.proxy.plist.template +36 -0
  8. package/bin/supervisor/install-windows-task.ps1 +48 -0
  9. package/bin/supervisor/occasio.service +18 -0
  10. package/docs/AUDIT.md +120 -0
  11. package/docs/attest_verify.py +283 -0
  12. package/docs/audit_walker.py +65 -0
  13. package/docs/canonicalize.py +99 -0
  14. package/docs/compliance-mapping.md +93 -0
  15. package/docs/demos/mcp-block.md +148 -0
  16. package/docs/edr-calibration.md +73 -0
  17. package/docs/edr-demo.md +83 -0
  18. package/docs/python-verifier.md +74 -0
  19. package/docs/reference-pipeline.md +140 -0
  20. package/package.json +69 -0
  21. package/policy-templates/dev-default.yml +84 -0
  22. package/policy-templates/finance.yml +61 -0
  23. package/policy-templates/strict.yml +49 -0
  24. package/schemas/agent-attestation-v1.json +190 -0
  25. package/schemas/occasio-policy.schema.json +99 -0
  26. package/spec/agent-attestation/v1/README.md +137 -0
  27. package/src/adapters/claude-code.js +518 -0
  28. package/src/adapters/cline.js +161 -0
  29. package/src/adapters/computer-use-cli.js +198 -0
  30. package/src/adapters/computer-use.js +227 -0
  31. package/src/analyzer.js +170 -0
  32. package/src/anomaly/cli.js +143 -0
  33. package/src/anomaly/detectors/deny-rate.js +84 -0
  34. package/src/anomaly/detectors/file-read-volume.js +109 -0
  35. package/src/anomaly/detectors/secret-redact-rate.js +107 -0
  36. package/src/anomaly/detectors/unknown-tool-input.js +83 -0
  37. package/src/anomaly/index.js +169 -0
  38. package/src/attest/canonicalize.js +97 -0
  39. package/src/attest/index.js +355 -0
  40. package/src/attest/run-slice.js +57 -0
  41. package/src/attest/sign.js +186 -0
  42. package/src/attest/verify.js +192 -0
  43. package/src/audit/errors.js +21 -0
  44. package/src/audit/input-normalizer.js +121 -0
  45. package/src/audit/jsonl-auditor.js +178 -0
  46. package/src/audit/verifier.js +152 -0
  47. package/src/baseline.js +507 -0
  48. package/src/boundary.js +238 -0
  49. package/src/budget.js +42 -0
  50. package/src/classifier.js +115 -0
  51. package/src/context-budget.js +77 -0
  52. package/src/core/boundary-event.js +75 -0
  53. package/src/core/decision.js +61 -0
  54. package/src/core/pipeline.js +66 -0
  55. package/src/core/tool-names.js +105 -0
  56. package/src/dashboard.js +892 -0
  57. package/src/demo/README.md +31 -0
  58. package/src/demo/anomalies-demo.js +211 -0
  59. package/src/demo/attest-demo.js +198 -0
  60. package/src/distiller.js +155 -0
  61. package/src/embeddings.json +72 -0
  62. package/src/executor/dispatcher.js +230 -0
  63. package/src/harness.js +817 -0
  64. package/src/index.js +1711 -0
  65. package/src/inspect.js +329 -0
  66. package/src/interceptor.js +1198 -0
  67. package/src/lao.js +185 -0
  68. package/src/lao_prep.py +119 -0
  69. package/src/ledger.js +209 -0
  70. package/src/mcp-experiment.js +140 -0
  71. package/src/mcp-normalize.js +139 -0
  72. package/src/mcp-server.js +320 -0
  73. package/src/outbound-policy.js +433 -0
  74. package/src/policy/built-in-classifiers.js +78 -0
  75. package/src/policy/doctor.js +226 -0
  76. package/src/policy/engine.js +339 -0
  77. package/src/policy/init.js +153 -0
  78. package/src/policy/loader.js +448 -0
  79. package/src/policy/rules-default.js +36 -0
  80. package/src/policy/shell-path.js +135 -0
  81. package/src/policy/show.js +196 -0
  82. package/src/policy/validate.js +310 -0
  83. package/src/preflight/cli.js +164 -0
  84. package/src/preflight/miner.js +329 -0
  85. package/src/proxy/agent-router.js +93 -0
  86. package/src/redteam.js +428 -0
  87. package/src/replay.js +446 -0
  88. package/src/report/index.js +224 -0
  89. package/src/runtime.js +595 -0
  90. package/src/scanner/index.js +49 -0
  91. package/src/selftest.js +192 -0
  92. package/src/session.js +36 -0
package/docs/AUDIT.md ADDED
@@ -0,0 +1,120 @@
1
+ # Occasio — Audit Log Format and Independent Verification
2
+
3
+ **Audience.** Security / compliance reviewers and platform engineers who need to verify Occasio's audit trail without trusting Occasio's own verifier.
4
+
5
+ **Promise.** Every governed tool call writes one row to `~/.occasio/pipeline-events.jsonl`. Each row is hash-chained to the previous row using SHA-256, starting from a fixed genesis sentinel. Any post-hoc edit, reordering, or deletion within the chain is detectable by re-walking the file. This document specifies the row format and the canonical-serialization rules precisely enough that an independent walker (a small Python script, included) reproduces the verification end-to-end.
6
+
7
+ ---
8
+
9
+ ## 1. Row format
10
+
11
+ Each line of `pipeline-events.jsonl` is a UTF-8 JSON object. A row is built in **this exact field order**:
12
+
13
+ ```
14
+ ts, event_id, session_id, run_id,
15
+ agent, protocol, direction, kind,
16
+ tool_name, tool_inputs,
17
+ action, reason, policy_source, executor, transform,
18
+ result_kind, exit_code, secrets_redacted, distilled, tokens_saved,
19
+ prev_hash, hash
20
+ ```
21
+
22
+ Field semantics:
23
+
24
+ | Field | Type | Notes |
25
+ |---|---|---|
26
+ | `ts` | string (ISO-8601) | Event timestamp from the boundary event. |
27
+ | `event_id` | string (UUID) | Unique per event. |
28
+ | `session_id` | string | Stable per Occasio session. |
29
+ | `run_id` | string | Stable per agent run. |
30
+ | `agent` | string | Canonical agent id (e.g. `claude-code`). |
31
+ | `protocol` | string | Wire protocol (e.g. `anthropic-http`). |
32
+ | `direction` | string | `inbound` (agent → cloud) or `outbound`. |
33
+ | `kind` | string | `tool_call`, `request`, etc. |
34
+ | `tool_name` | string | Canonical tool name (e.g. `read_file`). |
35
+ | `tool_inputs` | object \| absent | Normalized inputs (see `src/audit/input-normalizer.js`). Absent means the tool's inputs are intentionally not logged. |
36
+ | `action` | string | `LOCAL`, `PASS`, `BLOCK`, or `TRANSFORM`. |
37
+ | `reason` | string | Reason code from the policy engine. |
38
+ | `policy_source` | string | `default` or `user`. |
39
+ | `executor` | string \| absent | Where the action ran (e.g. `native`). |
40
+ | `transform` | string \| absent | Transform applied, if any. |
41
+ | `result_kind` | string | `local`, `pass`, `block`, `transform`, or `unknown`. |
42
+ | `exit_code` | number \| absent | Non-zero on local execution failure. |
43
+ | `secrets_redacted` | number \| absent | Count of secrets redacted in the result. |
44
+ | `distilled` | bool \| absent | Whether output was distilled. |
45
+ | `tokens_saved` | number \| absent | Tokens saved by distillation. |
46
+ | `prev_hash` | string (64-hex) | Hash of the previous row, or genesis on the first row. |
47
+ | `hash` | string (64-hex) | SHA-256 of the row's canonical serialization with `hash` removed. |
48
+
49
+ Fields whose value would be `undefined` (in JS) or `None` (in Python) are **omitted** from the serialized row, not emitted with a null value. This matches V8's `JSON.stringify` default behavior.
50
+
51
+ ### Row kinds
52
+
53
+ `kind` distinguishes what an audit row records. As of v0.6.6 there are two:
54
+
55
+ | `kind` | When it fires | Semantics |
56
+ |---|---|---|
57
+ | `tool_call` | Every governed tool call (Claude Code or MCP) | `tool_inputs` is per-tool (file path, glob, count). `action` is one of `LOCAL`/`PASS`/`BLOCK`/`TRANSFORM`. `result_kind` is `local`/`pass`/`block`/`transform`. |
58
+ | `policy_loaded` | Process startup, and on every policy-file edit (hot-reload) | `tool_inputs` is `{ policy_hash, policy_path, version }`. `tool_name` is the placeholder string `"policy_loaded"`. `action` is `"INFO"`. `reason` is `"policy-loaded"`. **`result_kind` is omitted** — a policy-load event has no dispatcher Result. |
59
+
60
+ The `policy_loaded` row binds the audit chain to a specific policy file's bytes: a buyer can prove not just "what was blocked" but "under which exact `policy.yml` the block was decided." Because the hash is over the raw file bytes (not the normalized policy object), comments and whitespace count — the hash matches whatever a reviewer reads in source control.
61
+
62
+ ## 2. Genesis sentinel
63
+
64
+ The `prev_hash` of the first row in a chain is:
65
+
66
+ ```
67
+ 0000000000000000000000000000000000000000000000000000000000000000
68
+ ```
69
+
70
+ (64 zero hex digits.)
71
+
72
+ ## 3. Hash algorithm
73
+
74
+ For each row:
75
+
76
+ 1. Take the row object.
77
+ 2. Remove the `hash` field.
78
+ 3. Serialize **in insertion order** to a UTF-8 string with no whitespace between tokens, no key sorting, and non-ASCII characters emitted literally. (V8 `JSON.stringify` default; equivalent to Python `json.dumps(d, separators=(",", ":"), ensure_ascii=False)` over a Python 3.7+ dict.)
79
+ 4. Compute the lowercase hex SHA-256 of the resulting bytes.
80
+
81
+ That is the value of `hash`. The `prev_hash` of the next row equals this `hash`.
82
+
83
+ ## 4. Independent walker
84
+
85
+ A standalone Python script, [`audit_walker.py`](audit_walker.py), implements the verification with no Occasio dependencies — only `hashlib`, `json`, `sys` from the standard library. To run it:
86
+
87
+ ```sh
88
+ python3 docs/audit_walker.py ~/.occasio/pipeline-events.jsonl
89
+ ```
90
+
91
+ Expected output for an intact chain:
92
+
93
+ ```
94
+ OK: 31 rows verified
95
+ ```
96
+
97
+ If any row's `prev_hash` does not match the previous row's `hash`, or any row's recomputed hash does not match its stored `hash`, the script exits non-zero with a `MISMATCH at line N: …` message identifying the first inconsistency.
98
+
99
+ ## 5. Parity with Occasio's own verifier
100
+
101
+ Occasio ships its own verifier (`occasio audit verify`). For audit credibility, both must agree on the same file. Parity is checked at every release; v0.6.4 is verified to agree on the maintainer's 31-row reference log.
102
+
103
+ If you find a row where `audit_walker.py` and `occasio audit verify` disagree, that is a bug. Open an issue with the row line number and we will treat it as audit-credibility-critical (i.e. fix-before-next-release).
104
+
105
+ ## 6. What this proves and does not prove
106
+
107
+ **Proves.** No row in the chain has been edited after the fact. No row has been removed from the middle of the chain. No row has been reordered.
108
+
109
+ **Does not prove.**
110
+
111
+ - That no rows were *omitted* — i.e. that the proxy was running and recording during every session in which it should have been. Gaps in time are visible in the `ts` field, but proving "no governed action escaped the log" requires comparing the audit log against an external record of agent activity. For pilots that need this guarantee, ship the audit rows offsite (SIEM, S3, append-only file) on a tail cadence.
112
+ - ~~That the proxy was running with the policy file you expected.~~ **(Resolved in v0.6.6.)** Every process startup and every hot-reload appends a `policy_loaded` row carrying the SHA-256 of the active policy file's bytes; subsequent tool-call rows are bound to the most recent `policy_loaded` row by chain position. To verify "this BLOCK happened under this exact policy file": (1) find the BLOCK row, (2) walk backward to the most recent `policy_loaded` row, (3) compare its `tool_inputs.policy_hash` to a SHA-256 of the file you intend to compare against. The walker in `audit_walker.py` will accept both `kind` values without modification.
113
+ - That **multiple processes** writing to the same audit file did not interleave. The Claude Code proxy and the MCP server each emit their own `policy_loaded` rows, which is correct, but they share `pipeline-events.jsonl` under a single-writer assumption documented in v0.6.5's CHANGELOG. Concurrent writers on Windows can interleave; the chain detects the corruption but cannot repair it.
114
+ - That a row written *during* a write outage was not lost. v0.6.4 aborts the proxy with exit code 1 when an audit append fails, so a successful tool dispatch cannot coexist with a missing row in steady state. The combination of (a) fail-fatal audit writes and (b) a supervisor that restarts the proxy is the operational guarantee.
115
+
116
+ ## 7. Stability commitment
117
+
118
+ The audit row schema and field-order list in §1 are part of Occasio's stable surface. They will not change incompatibly across v0.6.x. Any future field will be added in a way that does not invalidate existing rows or re-walks of the chain.
119
+
120
+ `audit_walker.py` in this repository is the canonical reference. If your verifier produces different bytes on the same input, your verifier is wrong, not the spec.
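The policy-binding check described in §6 of AUDIT.md (find the BLOCK row, walk backward to the most recent `policy_loaded` row, compare `tool_inputs.policy_hash`) can be sketched as follows. This is a minimal illustration, not Occasio's code: the helper names `seal` and `policy_hash_for` are hypothetical, the policy bytes are invented, and the rows are abbreviated to a few fields, so their hashes will not match rows written by a real proxy.

```python
import hashlib
import json

GENESIS = "0" * 64


def seal(row: dict, prev_hash: str) -> dict:
    """Return a copy of `row` with prev_hash and hash appended (section 3 rules)."""
    body = dict(row)
    body["prev_hash"] = prev_hash
    # Insertion order, no whitespace between tokens, non-ASCII emitted literally.
    data = json.dumps(body, separators=(",", ":"), ensure_ascii=False)
    body["hash"] = hashlib.sha256(data.encode("utf-8")).hexdigest()
    return body


def policy_hash_for(rows: list, index: int):
    """Walk backward from rows[index] to the nearest policy_loaded row."""
    for row in reversed(rows[:index]):
        if row.get("kind") == "policy_loaded":
            return row["tool_inputs"]["policy_hash"]
    return None


# Hypothetical policy file contents; a reviewer would hash the real file bytes.
policy_bytes = b"deny_paths:\n  - ~/.ssh\n"
expected = hashlib.sha256(policy_bytes).hexdigest()

rows, prev = [], GENESIS
for partial in (
    {"kind": "policy_loaded", "tool_name": "policy_loaded",
     "tool_inputs": {"policy_hash": expected}, "action": "INFO"},
    {"kind": "tool_call", "tool_name": "read_file", "action": "BLOCK"},
):
    sealed = seal(partial, prev)
    rows.append(sealed)
    prev = sealed["hash"]

block_index = 1  # position of the BLOCK row under review
matches = policy_hash_for(rows, block_index) == expected
print("policy matches:", matches)
```

Because the comparison is over raw file bytes, a reviewer should hash the policy file exactly as stored in source control, without normalizing line endings or whitespace.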
package/docs/attest_verify.py ADDED
@@ -0,0 +1,283 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ attest_verify.py — independent Python verifier for Occasio's
4
+ AI-Agent Behavioral Attestation v1 predicate.
5
+
6
+ Mirrors src/attest/verify.js, but written for an auditor whose
7
+ environment is Python-only and who refuses to trust Occasio's
8
+ own verifier to certify Occasio's own output. Three independent
9
+ checks, in order, each must pass:
10
+
11
+ 1. Sigstore signature (Fulcio cert chain + Rekor inclusion)
12
+ 2. DSSE payload ↔ attestation predicate canonical-byte equivalence
13
+ 3. Audit-chain integrity end-to-end + first/last hash containment
14
+
15
+ Step 1 is delegated to ``sigstore-python`` when available. When the
16
+ library is not installed the step is marked "skipped (install
17
+ sigstore-python)" rather than silently passing; the auditor decides
18
+ whether to install it or to accept a partial verification.
19
+
20
+ Usage::
21
+
22
+ python3 attest_verify.py <attestation.json> [--bundle <path>]
23
+ [--chain <path>]
24
+
25
+ Exit code 0 only when every check passes; a skipped check counts as a failure. Exit code 1 otherwise.
26
+
27
+ Companion files in this directory:
28
+ canonicalize.py RFC 8785 subset (kept in lockstep with the
29
+ Node and browser implementations)
30
+ audit_walker.py the audit-chain walker, reused for step 3
31
+ """
32
+
33
+ from __future__ import annotations
34
+
35
+ import argparse
36
+ import base64
37
+ import json
38
+ import os
39
+ import sys
40
+ from typing import Any
41
+
42
+ # Allow `import canonicalize` / `import audit_walker` when invoked
43
+ # directly from this folder.
44
+ sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
45
+
46
+ from canonicalize import canonicalize # noqa: E402
47
+ import audit_walker # noqa: E402
48
+
49
+ PREDICATE_TYPE = (
50
+ "https://github.com/occasiolabs/occasio/spec/agent-attestation/v1"
51
+ )
52
+ DSSE_PAYLOAD_TYPE = "application/vnd.in-toto+json"
53
+
54
+
55
+ def _read_json(path: str) -> Any:
56
+ with open(path, "r", encoding="utf-8") as fh:
57
+ return json.load(fh)
58
+
59
+
60
+ def _check_sigstore(bundle: dict) -> tuple[bool, str | None, str]:
61
+ """Try to verify the Sigstore bundle. Returns (passed, detail, note).
62
+
63
+     ``note`` is one of: 'verified', 'skipped', or 'error'.
64
+ The auditor can decide whether to treat a 'skipped' as a failure.
65
+ """
66
+ try:
67
+ from sigstore.verify import Verifier, policy # type: ignore
68
+ from sigstore.models import Bundle # type: ignore
69
+ except ImportError:
70
+ return (
71
+ False,
72
+ "sigstore-python not installed (pip install sigstore)",
73
+ "skipped",
74
+ )
75
+
76
+ try:
77
+ bundle_obj = Bundle.from_json(json.dumps(bundle))
78
+ verifier = Verifier.production()
79
+ # We do not pin the identity here — the auditor's policy may
80
+ # require workflow-ref pinning. For the reference verifier we
81
+ # accept any Fulcio cert and let the audit-chain step bind
82
+ # the predicate to a concrete run_id.
83
+ verifier.verify_dsse(bundle_obj, policy.UnsafeNoOp())
84
+ return True, None, "verified"
85
+ except Exception as exc: # noqa: BLE001
86
+ return False, str(exc), "error"
87
+
88
+
89
+ def _check_payload_equivalence(
90
+ attestation: dict, bundle: dict
91
+ ) -> tuple[bool, str | None]:
92
+ env = bundle.get("dsseEnvelope") or bundle.get("dsse_envelope")
93
+ if not env or not env.get("payload"):
94
+ return False, "bundle missing dsseEnvelope.payload"
95
+ if env.get("payloadType") != DSSE_PAYLOAD_TYPE:
96
+ return False, f"unexpected payloadType: {env.get('payloadType')!r}"
97
+
98
+ try:
99
+ payload_bytes = base64.b64decode(env["payload"])
100
+ statement = json.loads(payload_bytes)
101
+ except Exception as exc: # noqa: BLE001
102
+ return False, f"cannot decode DSSE payload: {exc}"
103
+
104
+ if statement.get("predicateType") != PREDICATE_TYPE:
105
+ return False, f"unexpected predicateType: {statement.get('predicateType')!r}"
106
+
107
+ expected = {k: v for k, v in attestation.items() if k != "signature"}
108
+ if canonicalize(statement.get("predicate")) != canonicalize(expected):
109
+ return False, "predicate differs from DSSE payload predicate"
110
+ return True, None
111
+
112
+
113
+ def _check_audit_chain(
114
+ attestation: dict, chain_path: str | None
115
+ ) -> tuple[bool, str | None]:
116
+ chain_file = chain_path or attestation.get("audit_chain", {}).get("chain_file")
117
+ if not chain_file:
118
+ return False, "no chain_file in attestation and --chain not provided"
119
+ if not os.path.exists(chain_file):
120
+ return False, f"chain file not found: {chain_file}"
121
+
122
+ # audit_walker exits 0/1 based on integrity; reuse its internals
123
+ # to also capture the first/last hash positions.
124
+ first_target = attestation.get("audit_chain", {}).get("first_hash")
125
+ last_target = attestation.get("audit_chain", {}).get("last_hash")
126
+ if not first_target or not last_target:
127
+ return False, "attestation missing first_hash/last_hash"
128
+
129
+ prev = audit_walker.GENESIS
130
+ chained = 0
131
+ first_idx = -1
132
+ last_idx = -1
133
+ with open(chain_file, "r", encoding="utf-8") as fh:
134
+ for lineno, raw in enumerate(fh, 1):
135
+ line = raw.rstrip("\n")
136
+ if not line:
137
+ continue
138
+ row = json.loads(line)
139
+ stored = row.pop("hash", None)
140
+ if not isinstance(stored, str) or len(stored) != 64:
141
+ continue
142
+ if row.get("prev_hash") != prev:
143
+ return False, f"chain broken at line {lineno}"
144
+ recomputed = audit_walker.hashlib.sha256(
145
+ audit_walker.canonical_serialize(row)
146
+ ).hexdigest()
147
+ if recomputed != stored:
148
+ return False, f"hash mismatch at line {lineno}"
149
+ prev = stored
150
+ chained += 1
151
+ if stored == first_target and first_idx == -1:
152
+ first_idx = lineno
153
+ if stored == last_target:
154
+ last_idx = lineno
155
+
156
+ if first_idx == -1:
157
+ return False, "first_hash not found in chain"
158
+ if last_idx == -1:
159
+ return False, "last_hash not found in chain"
160
+ if last_idx < first_idx:
161
+ return False, "last_hash precedes first_hash in chain"
162
+ return True, f"chain_length={chained}, slice rows {first_idx}..{last_idx}"
163
+
164
+
165
+ def verify(
166
+ attestation_path: str,
167
+ bundle_path: str | None = None,
168
+ chain_path: str | None = None,
169
+ ) -> dict:
170
+ """Run all three checks and return a structured result."""
171
+ if not bundle_path:
172
+ if attestation_path.endswith(".json"):
173
+ bundle_path = attestation_path[:-5] + ".sigstore.json"
174
+ else:
175
+ bundle_path = attestation_path + ".sigstore.json"
176
+
177
+ attestation = _read_json(attestation_path)
178
+ if not os.path.exists(bundle_path):
179
+ # An unsigned attestation is still verifiable for the payload
180
+ # equivalence and chain steps — surface that as 'skipped' on
181
+ # step 1 rather than erroring out.
182
+ bundle = None
183
+ else:
184
+ bundle = _read_json(bundle_path)
185
+
186
+ checks = []
187
+
188
+ if bundle is None:
189
+ checks.append({
190
+ "name": "sigstore signature",
191
+ "ok": False,
192
+ "note": "skipped",
193
+ "detail": "no Sigstore bundle file alongside attestation",
194
+ })
195
+ else:
196
+ ok, detail, note = _check_sigstore(bundle)
197
+ checks.append({
198
+ "name": "sigstore signature",
199
+ "ok": ok,
200
+ "note": note,
201
+ "detail": detail,
202
+ })
203
+
204
+ if bundle is None:
205
+ checks.append({
206
+ "name": "bundle payload matches attestation",
207
+ "ok": False,
208
+ "note": "skipped",
209
+ "detail": "no bundle to compare against",
210
+ })
211
+ else:
212
+ ok, detail = _check_payload_equivalence(attestation, bundle)
213
+ checks.append({
214
+ "name": "bundle payload matches attestation",
215
+ "ok": ok,
216
+ "note": "verified" if ok else "failed",
217
+ "detail": detail,
218
+ })
219
+
220
+ ok, detail = _check_audit_chain(attestation, chain_path)
221
+ checks.append({
222
+ "name": "audit chain integrity",
223
+ "ok": ok,
224
+ "note": "verified" if ok else "failed",
225
+ "detail": detail,
226
+ })
227
+
228
+ # Overall pass: every check is ok=True. Skipped counts as not-ok
229
+ # so the caller cannot pretend a partial verification was a full
230
+ # one. The detail line tells the auditor what to install to lift
231
+ # the skip.
232
+ overall = all(c["ok"] for c in checks)
233
+ return {"ok": overall, "checks": checks}
234
+
235
+
236
+ def _render(result: dict) -> int:
237
+ for c in result["checks"]:
238
+ mark = "OK" if c["ok"] else ("SKIP" if c["note"] == "skipped" else "FAIL")
239
+ line = f" [{mark:>4}] {c['name']}"
240
+ if c.get("detail"):
241
+ line += f" ({c['detail']})"
242
+ print(line)
243
+ print()
244
+ print("PASS" if result["ok"] else "FAIL")
245
+ return 0 if result["ok"] else 1
246
+
247
+
248
+ def main() -> int:
249
+ parser = argparse.ArgumentParser(
250
+ description=(
251
+ "Independent Python verifier for Occasio Agent Attestation v1."
252
+ )
253
+ )
254
+ parser.add_argument("attestation", help="Path to attestation.json")
255
+ parser.add_argument(
256
+ "--bundle",
257
+ help=(
258
+ "Path to Sigstore bundle (default: <attestation>.sigstore.json)"
259
+ ),
260
+ )
261
+ parser.add_argument(
262
+ "--chain",
263
+ help=(
264
+ "Path to audit chain file. Default: read chain_file from the "
265
+ "attestation."
266
+ ),
267
+ )
268
+ parser.add_argument(
269
+ "--json",
270
+ action="store_true",
271
+ help="Emit JSON result instead of human-readable lines.",
272
+ )
273
+ args = parser.parse_args()
274
+
275
+ result = verify(args.attestation, args.bundle, args.chain)
276
+ if args.json:
277
+ print(json.dumps(result, indent=2))
278
+ return 0 if result["ok"] else 1
279
+ return _render(result)
280
+
281
+
282
+ if __name__ == "__main__":
283
+ sys.exit(main())
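The payload-equivalence step in attest_verify.py (check 2) can be exercised in isolation. The sketch below is an assumption-laden illustration: `stable` is a stand-in that uses `json.dumps` with sorted keys instead of the package's `canonicalize`, the helper name `predicate_matches` is hypothetical, and the attestation fields and values are invented for the example.

```python
import base64
import json

PREDICATE_TYPE = "https://github.com/occasiolabs/occasio/spec/agent-attestation/v1"


def stable(value) -> str:
    # Stand-in for canonicalize(): sorted keys, compact separators.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def predicate_matches(attestation: dict, envelope: dict) -> bool:
    """Decode the DSSE payload and compare its predicate to the attestation."""
    statement = json.loads(base64.b64decode(envelope["payload"]))
    if statement.get("predicateType") != PREDICATE_TYPE:
        return False
    # The signature block is not part of the signed predicate.
    expected = {k: v for k, v in attestation.items() if k != "signature"}
    return stable(statement.get("predicate")) == stable(expected)


# Invented example data; real attestations carry more fields.
attestation = {
    "run_id": "run-123",
    "audit_chain": {"first_hash": "aa", "last_hash": "bb"},
    "signature": {"bundle": "placeholder"},
}
statement = {
    "predicateType": PREDICATE_TYPE,
    # Same content as the attestation, different key order.
    "predicate": {"audit_chain": {"last_hash": "bb", "first_hash": "aa"}, "run_id": "run-123"},
}
envelope = {
    "payloadType": "application/vnd.in-toto+json",
    "payload": base64.b64encode(json.dumps(statement).encode("utf-8")).decode("ascii"),
}

same = predicate_matches(attestation, envelope)
tampered = predicate_matches(dict(attestation, run_id="run-999"), envelope)
print(same, tampered)  # True False
```

Comparing canonical strings rather than parsed objects is what makes the check independent of key order in either file.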
package/docs/audit_walker.py ADDED
@@ -0,0 +1,65 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Independent walker for Occasio's pipeline-events.jsonl audit log.
4
+
5
+ Re-walks the SHA-256 hash chain without using any Occasio code, so the
6
+ audit-trail integrity claim does not depend on trusting Occasio's own
7
+ verifier. See docs/AUDIT.md for the row schema and the canonical-
8
+ serialization rules this script implements.
9
+
10
+ Usage:
11
+ python3 audit_walker.py ~/.occasio/pipeline-events.jsonl
12
+
13
+ Exit code 0 on success, 1 on first inconsistency or I/O error.
14
+ """
15
+
16
+ import hashlib
17
+ import json
18
+ import sys
19
+
20
+ GENESIS = "0" * 64
21
+
22
+
23
+ def canonical_serialize(row_without_hash: dict) -> bytes:
24
+ # Mirrors V8's JSON.stringify with default options:
25
+ # - no whitespace between tokens
26
+ # - non-ASCII characters emitted literally (ensure_ascii=False)
27
+ # - keys in insertion order (Python 3.7+ dict guarantees this)
28
+ return json.dumps(
29
+ row_without_hash,
30
+ separators=(",", ":"),
31
+ ensure_ascii=False,
32
+ ).encode("utf-8")
33
+
34
+
35
+ def walk(path: str) -> int:
36
+ prev_hash = GENESIS
37
+ chained = 0
38
+ with open(path, "r", encoding="utf-8") as fh:
39
+ for lineno, raw in enumerate(fh, 1):
40
+ line = raw.rstrip("\n")
41
+ if not line:
42
+ continue
43
+ row = json.loads(line)
44
+ stored_hash = row.pop("hash", None)
45
+ # Legacy rows (pre-hash-chain) have no hash field — skip silently.
46
+ if not isinstance(stored_hash, str) or len(stored_hash) != 64:
47
+ continue
48
+ if row.get("prev_hash") != prev_hash:
49
+ print(f"MISMATCH at line {lineno}: prev_hash chain broken", file=sys.stderr)
50
+ return 1
51
+ recomputed = hashlib.sha256(canonical_serialize(row)).hexdigest()
52
+ if recomputed != stored_hash:
53
+ print(f"MISMATCH at line {lineno}: stored hash {stored_hash} != recomputed {recomputed}", file=sys.stderr)
54
+ return 1
55
+ prev_hash = stored_hash
56
+ chained += 1
57
+ print(f"OK: {chained} rows verified")
58
+ return 0
59
+
60
+
61
+ if __name__ == "__main__":
62
+ if len(sys.argv) != 2:
63
+ print("usage: audit_walker.py <pipeline-events.jsonl>", file=sys.stderr)
64
+ sys.exit(2)
65
+ sys.exit(walk(sys.argv[1]))
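To see which kinds of tampering the walk detects, the same hashing rules can be applied to a small in-memory chain. This is a sketch under stated assumptions: `row_hash`, `build_chain`, and `first_bad_line` are hypothetical helpers that re-implement the walker's logic, and the rows carry only a few illustrative fields rather than the full schema.

```python
import hashlib
import json

GENESIS = "0" * 64


def row_hash(row_without_hash: dict) -> str:
    data = json.dumps(row_without_hash, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def build_chain(events):
    """Serialize events as hash-chained JSONL lines."""
    lines, prev = [], GENESIS
    for event in events:
        row = dict(event, prev_hash=prev)
        row["hash"] = row_hash(row)  # computed before the hash key exists
        lines.append(json.dumps(row, separators=(",", ":"), ensure_ascii=False))
        prev = row["hash"]
    return lines


def first_bad_line(lines):
    """Return the 1-based number of the first inconsistent row, or None."""
    prev = GENESIS
    for lineno, line in enumerate(lines, 1):
        row = json.loads(line)
        stored = row.pop("hash", None)
        if row.get("prev_hash") != prev or row_hash(row) != stored:
            return lineno
        prev = stored
    return None


events = [
    {"kind": "tool_call", "tool_name": "read_file", "tool_inputs": {"path": "README.md"}, "action": "PASS"},
    {"kind": "tool_call", "tool_name": "read_file", "tool_inputs": {"path": "~/.ssh/id_rsa"}, "action": "BLOCK"},
    {"kind": "tool_call", "tool_name": "read_file", "tool_inputs": {"path": "src/index.js"}, "action": "PASS"},
]
chain = build_chain(events)

edited = list(chain)
edited[1] = edited[1].replace('"BLOCK"', '"PASS"')  # rewrite a decision
deleted = [chain[0], chain[2]]                       # drop the middle row
swapped = [chain[0], chain[2], chain[1]]             # reorder rows
truncated = chain[:2]                                # drop the tail

print(first_bad_line(chain), first_bad_line(edited),
      first_bad_line(deleted), first_bad_line(swapped),
      first_bad_line(truncated))  # None 2 2 2 None
```

The last case shows the limit stated in AUDIT.md: dropping rows from the end leaves a shorter but internally consistent chain, which is why the document recommends shipping rows offsite.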
package/docs/canonicalize.py ADDED
@@ -0,0 +1,99 @@
1
+ """
2
+ canonicalize.py — RFC 8785 subset for byte-stable JSON serialisation.
3
+
4
+ Companion to docs/audit_walker.py and docs/attest_verify.py: lets an
5
+ auditor in a Python-only environment re-verify an Occasio
6
+ attestation against the producer's canonical form without trusting
7
+ the producer's code.
8
+
9
+ Must stay byte-identical to src/attest/canonicalize.js and the
10
+ inline copy in integrations/attest-view/viewer.js. The three
11
+ implementations exist so the schema is provably language-independent;
12
+ diverging them defeats the point.
13
+
14
+ Cross-language invariant (load-bearing):
15
+ JavaScript has a single ``number`` type. ``JSON.parse('1.0')``
16
+ yields the integer 1; ``JSON.stringify(1)`` emits ``'1'``.
17
+ Python distinguishes int from float: ``json.loads('1.0')`` yields
18
+ ``float(1.0)``; ``json.dumps(1.0)`` emits ``'1.0'``. If we silently
19
+ accepted floats, the JS verifier and the Python verifier would
20
+ canonicalize the same JSON file to different bytes — silent
21
+ byte-equivalence breakage. This module:
22
+
23
+ - rejects non-integer floats (e.g. 1.5) with a clear error
24
+ - coerces integer-valued floats (e.g. 1.0) to the integer
25
+ representation so that a Python parse of ``"1.0"`` and a JS
26
+ parse of ``"1.0"`` canonicalize identically
27
+
28
+ If a future schema requires decimal precision, encode it as a
29
+ string. The canonicalize boundary stays integer-only.
30
+
31
+ Deviations from strict RFC 8785 (documented, intentional):
32
+ - Float rejection above (instead of RFC 8785's prescribed form).
33
+ Load-bearing for cross-language byte-equivalence.
34
+   - Lone surrogates are not rejected: json.dumps(ensure_ascii=False)
35
+     passes them through unescaped, whereas RFC 8785 requires an error.
36
+ """
37
+
38
+ from __future__ import annotations
39
+
40
+ import json
41
+ from typing import Any
42
+
43
+
44
+ def canonicalize(value: Any) -> str:
45
+ """Return the canonical-JSON string for ``value``.
46
+
47
+ Rules:
48
+ - object keys sorted lexicographically by UTF-16 code unit
49
+ (Python's default ``sorted`` on strs is UTF-16-equivalent for
50
+ the BMP, which covers every key in the v1 schema)
51
+ - ``None`` ``True`` ``False`` map to ``null``/``true``/``false``
52
+ - object members whose value is ``None`` are kept (they encode
53
+ explicit nullable fields like ``policy.version``); members
54
+ absent from the dict are not invented
55
+ - arrays preserve order
56
+ - rejects ``float('nan')``/``inf``, callables, types,
57
+ non-string keys
58
+
59
+ Raises ``ValueError`` on rejected inputs.
60
+ """
61
+ if value is None:
62
+ return "null"
63
+ if isinstance(value, bool):
64
+ return "true" if value else "false"
65
+ if isinstance(value, int):
66
+ return str(value)
67
+ if isinstance(value, float):
68
+ if value != value or value in (float("inf"), float("-inf")):
69
+ raise ValueError("canonicalize: non-finite number")
70
+ # Cross-language invariant: a JSON literal like "1.0" parses to
71
+ # int(1) in JavaScript but float(1.0) in Python. Coerce the
72
+ # integer-valued case so both implementations canonicalize to
73
+ # the same bytes. Reject genuine non-integer floats — see the
74
+ # module docstring for the schema-design rationale.
75
+ if not value.is_integer():
76
+ raise ValueError(
77
+ f"canonicalize: non-integer number {value} — "
78
+ "cross-language byte-equivalence requires schema fields "
79
+ "be integers or strings. Encode decimal values as strings."
80
+ )
81
+ return str(int(value))
82
+ if isinstance(value, str):
83
+ # json.dumps emits a fully-escaped RFC 8259 string. Matches
84
+ # what V8's JSON.stringify does for ASCII + most Unicode.
85
+ return json.dumps(value, ensure_ascii=False)
86
+ if isinstance(value, (list, tuple)):
87
+ return "[" + ",".join(canonicalize(v) for v in value) + "]"
88
+ if isinstance(value, dict):
89
+ for k in value.keys():
90
+ if not isinstance(k, str):
91
+ raise ValueError(
92
+ f"canonicalize: non-string key {k!r}"
93
+ )
94
+ items = sorted(value.items(), key=lambda kv: kv[0])
95
+ return "{" + ",".join(
96
+ json.dumps(k, ensure_ascii=False) + ":" + canonicalize(v)
97
+ for k, v in items
98
+ ) + "}"
99
+ raise ValueError(f"canonicalize: unsupported type {type(value).__name__}")
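Two points from the canonicalize.py docstring can be checked directly with the standard library: the int/float split that motivates the integer-only rule, and the note that code-point sorting matches UTF-16 code-unit sorting only within the BMP. The sketch below is illustrative; `normalize_number` and `utf16_key` are hypothetical helpers, not part of the package.

```python
import json

# Python keeps the float type that JavaScript would collapse to the number 1.
parsed = json.loads('{"count": 1.0}')
print(type(parsed["count"]).__name__, json.dumps(parsed["count"]))  # float 1.0


def normalize_number(value):
    """Coerce integer-valued floats to int; reject NaN, infinities, and fractions."""
    if isinstance(value, bool) or not isinstance(value, float):
        return value
    if value != value or value in (float("inf"), float("-inf")):
        raise ValueError("non-finite number")
    if not value.is_integer():
        raise ValueError(f"non-integer number {value!r}")
    return int(value)


def utf16_key(key: str) -> bytes:
    # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units.
    return key.encode("utf-16-be")


# U+FFFF is in the BMP; U+10000 is encoded as the surrogate pair D800 DC00.
keys = ["\uffff", "\U00010000"]
by_code_point = sorted(keys)
by_utf16 = sorted(keys, key=utf16_key)
print(by_code_point == by_utf16)  # False
```

If a future schema ever admits keys outside the BMP, a key function like `utf16_key` is one way to keep Python's ordering aligned with the UTF-16 code-unit ordering the docstring describes.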
package/docs/compliance-mapping.md ADDED
@@ -0,0 +1,93 @@
1
+ # Occasio — SOC 2 Control Mapping (DRAFT)
2
+
3
+ **Status.** Draft. Conservative scope. The mappings below are limited to stanzas where the link between the policy and the SOC 2 control is **direct and provable from the audit log** — i.e. every claimed control evidences itself as an actual row, not as a vendor assertion. Mappings that would require interpretive bridging are intentionally absent. Before relying on this document for an audit, have it reviewed by a compliance practitioner familiar with your environment; it is published as a starting point, not a substitute for that review.
4
+
5
+ **Scope.**
6
+ - Framework: SOC 2 Trust Services Criteria, 2017, Common Criteria series only.
7
+ - Template: `policy-templates/finance.yml`.
8
+ - Evidence: rows in `~/.occasio/pipeline-events.jsonl`, verifiable by `occasio audit verify` and the independent walker at [`audit_walker.py`](audit_walker.py).
9
+
10
+ What this document deliberately does not do:
11
+ - Map ISO 27001, HIPAA, PCI-DSS, FedRAMP, or NIST 800-53. (Single-framework discipline; per-framework mapping is a separate effort.)
12
+ - Claim coverage of any availability, processing integrity, confidentiality, or privacy criteria beyond what `finance.yml` directly produces evidence for.
13
+ - Imply that Occasio alone is sufficient for SOC 2 attestation — it is one signal in a control set that includes IAM, endpoint controls, network controls, and HR processes.
14
+
15
+ ---
16
+
17
+ ## CC6.1 — Logical and Physical Access Controls (Restrict)
18
+
19
+ > *"The entity implements logical access security software, infrastructure, and architectures over protected information assets to protect them from security events to meet the entity's objectives."*
20
+
21
+ **Mapped stanza.** `deny_paths` in `finance.yml`.
22
+
23
+ ```yaml
24
+ deny_paths:
25
+ - ~/.ssh
26
+ - ~/.aws
27
+ - ~/.config/gcloud
28
+ - ~/.gnupg
29
+ ```
30
+
31
+ **Why this maps.** A `deny_paths` entry blocks any read of a path under the listed prefix by an AI agent's tool call, regardless of which agent is calling and regardless of the routing that would otherwise apply. The control point is enforced at the Occasio boundary; the agent receives a `(blocked by policy)` synthetic refusal and the underlying file is never opened.
32
+
33
+ **Evidence in the audit log.** Every blocked attempt produces a row of this exact shape:
34
+
35
+ ```json
36
+ {
37
+ "kind": "tool_call",
38
+ "tool_name": "read_file",
39
+ "tool_inputs": { "path": "<resolved absolute path>" },
40
+ "action": "BLOCK",
41
+ "reason": "path-denied",
42
+ "result_kind": "block",
43
+ "prev_hash": "...",
44
+ "hash": "..."
45
+ }
46
+ ```
47
+
48
+ A reviewer asks: *"show me every time the agent attempted to read protected credentials in the period."* The answer is the `blocked_accesses[]` array in the output of `occasio report --days N`, filtered by `reason: "path-denied"`. Each entry is backed by an audit row that can be verified against the hash chain.
49
+
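For a reviewer who wants to cross-check the report against the raw log, the same filter can be applied directly to the JSONL rows. The sketch below relies only on the field names in the row shape shown above; `blocked_path_reads` is a hypothetical helper, and it does not verify the hash chain.

```python
import json
from pathlib import Path


def blocked_path_reads(lines):
    """Yield parsed rows that record a policy block on a denied path."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between rows
        row = json.loads(line)
        if (
            row.get("kind") == "tool_call"
            and row.get("action") == "BLOCK"
            and row.get("reason") == "path-denied"
        ):
            yield row


if __name__ == "__main__":
    log_path = Path.home() / ".occasio" / "pipeline-events.jsonl"
    if log_path.exists():
        with log_path.open(encoding="utf-8") as fh:
            for row in blocked_path_reads(fh):
                # Print the attempted path and the row hash for follow-up.
                print(row.get("tool_inputs", {}).get("path"), row.get("hash"))
```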
50
+ **Limitations.**
51
+ - Coverage is bounded by what is in the `deny_paths` list. A path not listed is not blocked.
52
+ - A developer with write access to `~/.occasio/policy.yml` can edit the list; that edit produces a `policy_loaded` row in the audit log (under v0.6.6+) carrying the SHA-256 of the new file, so the change is detectable, but it is not prevented.
53
+ - Concurrent multi-process audit writes are an unmitigated risk in v0.6.5 and v0.6.6 — if two processes are appending to the same `pipeline-events.jsonl`, an interleaved write on Windows can corrupt the chain. Document the single-writer discipline alongside this control.
54
+
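The policy-edit limitation above rests on comparing a recorded SHA-256 with the file on disk. A minimal sketch of that comparison follows, assuming the recorded digest is a hex string computed over the raw bytes of `policy.yml`; the document does not name the field that carries it, so the digest is passed in by hand. `file_sha256` and `policy_matches` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def file_sha256(path) -> str:
    """Return the hex SHA-256 digest of the file's raw bytes."""
    digest = hashlib.sha256()
    with Path(path).expanduser().open("rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def policy_matches(path, recorded_hex: str) -> bool:
    """Compare the on-disk digest with one copied from a policy_loaded row."""
    return file_sha256(path) == recorded_hex.strip().lower()
```

A call such as `policy_matches("~/.occasio/policy.yml", "<digest from the latest policy_loaded row>")` returning `False` would indicate that the file has changed since that row was written.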
55
+ ---
56
+
57
+ ## CC7.2 — System Monitoring (Detection of Security Events)
58
+
59
+ > *"The entity monitors system components and the operation of those components for anomalies that are indicative of malicious acts, natural disasters, and errors affecting the entity's ability to meet its objectives; anomalies are analyzed to determine whether they represent security events."*
60
+
61
+ **Mapped stanza.** The audit log itself, as produced by the Occasio proxy and MCP server.
62
+
63
+ **Why this maps.** Every governed tool call produces a tamper-evident audit row: the log file itself can still be edited, but the hash chain detects post-hoc edits, and the `policy_loaded` synthetic event (v0.6.6+) binds tool-call rows to the specific policy file under which they were decided. A `BLOCK` row with `reason: "secret in tool result: <label>"` or `reason: "path-denied"` is the kind of security event a CC7.2 program would treat as anomalous.
64
+
65
+ **Evidence in the audit log.** The full `pipeline-events.jsonl` file, plus the integrity statement from `occasio audit verify` (or the equivalent independent walker output). The `occasio report` command summarises these into `summary.paths_blocked`, `summary.secrets_detected`, and the corresponding `blocked_accesses[]` and `secret_events[]` arrays.
66
+
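To make the hash-chain idea concrete, here is a minimal sketch of a linkage check, assuming each row's `prev_hash` equals the `hash` of the row before it. It deliberately does not recompute any hash, because the canonicalization and hash inputs are not specified in this document; `occasio audit verify` and `audit_walker.py` remain the authoritative checks. `check_chain_links` is a hypothetical helper.

```python
import json


def check_chain_links(lines):
    """Return (line_index, message) for every row whose prev_hash does not
    match the hash of the preceding row. The first row is not checked,
    since its expected prev_hash value is not known here."""
    breaks = []
    prev_hash = None
    for index, line in enumerate(lines):
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        if prev_hash is not None and row.get("prev_hash") != prev_hash:
            breaks.append((index, "prev_hash does not match preceding row's hash"))
        prev_hash = row.get("hash")
    return breaks
```

An empty result only shows that the links are consistent; a row whose content was altered without updating its `hash` would pass this check and is caught only by a verifier that recomputes hashes.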
67
+ **Limitations.**
68
+ - The audit log is local. CC7.2 typically expects centralised log aggregation; the log must be shipped to a SIEM or equivalent for organisation-wide monitoring. v0.6.6 does not include a built-in log shipper; this is a separate operational integration.
69
+ - The control covers detection, not response. Action on detected events (notification, ticketing, remediation) is out of scope for the policy file alone.
70
+ - Absence of rows is not a control signal in v0.6.6: gaps can occur if the proxy was not running. Pair with a supervisor template (see `bin/supervisor/`) and external uptime monitoring to close this gap.
71
+
72
+ ---
73
+
74
+ ## Mappings deliberately not included in this draft
75
+
76
+ The following criteria are sometimes claimed by AI-tooling vendors but are **not mapped here** because the link to a `finance.yml` stanza is not directly evidenced by an audit row:
77
+
78
+ - **CC6.6 — Encryption.** Occasio does not encrypt data at rest or in transit on its own; it relies on the underlying filesystem and HTTPS to Anthropic. No stanza in `finance.yml` produces evidence relevant to a CC6.6 review.
79
+ - **CC6.7 — Information classification.** `deny_patterns` partially address this for credential-shaped strings, but classification systems (DLP labels, sensitivity tags) are an organisational concern, not a regex-pattern concern. Mapping `deny_patterns` to CC6.7 would overstate what the audit log proves.
80
+ - **CC8.1 — Change management.** The `policy_loaded` row records *that* a policy changed and *what hash* it changed to, but not who changed it or whether the change was approved. Layering an MDM/dotfiles-with-PR-review process on top of `policy.yml` is what actually addresses CC8.1.
81
+ - **A series (Availability), PI (Processing Integrity), C (Confidentiality), P (Privacy).** Out of scope for a Common-Criteria-only mapping. Add per-category mapping documents if a customer needs them.
82
+
83
+ ---
84
+
85
+ ## How to use this document
86
+
87
+ 1. **Pre-pilot review.** Hand this document to the customer's compliance contact alongside `GOVERNANCE.md` and `docs/AUDIT.md`. Ask them to flag any mapping that is too aggressive (we'd rather narrow the document than overclaim) and any criterion they expected to see addressed (which becomes a roadmap input, not a v0.6.6 deliverable).
88
+ 2. **At pilot end.** Run `occasio report --days <pilot-length>` and walk through the output with the compliance contact, pointing at the rows that evidence each mapped control.
89
+ 3. **Re-review.** This document is versioned with the policy schema. If `finance.yml` gains a new stanza, this mapping must be revisited at that point — a stanza without explicit mapping or explicit "not mapped" treatment is documentation drift.
90
+
91
+ ---
92
+
93
+ *Last reviewed: pending. To request a review, see the issues link in `package.json`.*