wicked-vault 0.2.0

package/README.md ADDED
@@ -0,0 +1,176 @@
1
+ ```
2
+ ██╗ ██╗██╗ ██████╗██╗ ██╗███████╗██████╗ ██╗ ██╗ █████╗ ██╗ ██╗██╗ ████████╗
3
+ ██║ ██║██║██╔════╝██║ ██╔╝██╔════╝██╔══██╗ ██║ ██║██╔══██╗██║ ██║██║ ╚══██╔══╝
4
+ ██║ █╗ ██║██║██║ █████╔╝ █████╗ ██║ ██║█████╗██║ ██║███████║██║ ██║██║ ██║
5
+ ██║███╗██║██║██║ ██╔═██╗ ██╔══╝ ██║ ██║╚════╝╚██╗ ██╔╝██╔══██║██║ ██║██║ ██║
6
+ ╚███╔███╔╝██║╚██████╗██║ ██╗███████╗██████╔╝ ╚████╔╝ ██║ ██║╚██████╔╝███████╗██║
7
+ ╚══╝╚══╝ ╚═╝ ╚═════╝╚═╝ ╚═╝╚══════╝╚═════╝ ╚═══╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝
8
+ ```
9
+
10
+ **Local-first evidence primitive. Records claim-backing evidence with the acceptance
11
+ criteria it must clear, checks integrity deterministically, and records independent
12
+ third-party judgments — never trusting a stored verdict, never letting work self-grade
13
+ its own "done".**
14
+
15
+ Sibling to wicked-bus / wicked-brain / wicked-testing. Works with **Claude Code**,
16
+ **Gemini**, **Copilot**, **Codex**, **Cursor**, **Kiro**, and **Antigravity** (skills
17
+ install across all of them).
18
+
19
+ It exists to answer one question honestly: **is this claim actually backed by
20
+ evidence that meets its bar?** — so "tests pass", "build clean", "ready to merge"
21
+ can't be *asserted* into truth, and can't be *self-graded* into truth either.
22
+
23
+ It checks on two tiers (ADR-0002):
24
+
25
+ - **Integrity tier** — deterministic, re-derivable, model-free. Recompute the
26
+ hashes, re-run the pure verifier. CI-gate-safe. *Never trust a cached status.*
27
+ - **Judgment tier** — an **independent** evaluator (≠ the agent that did the
28
+ work) judges the frozen evidence against the frozen criteria; the opinion is
29
+ recorded as a tamper-evident, append-only `opinion_attestation`. *Never trust
30
+ a self-graded "done".*
31
+
32
+ ## Boundary
33
+
34
+ | Owns (the primitive) | Refuses (lives in a consumer) |
35
+ |---|---|
36
+ | `record` · `verify` · `inspect` · `attest` · `cross-check` · `supersede` | "is the work *done*?" (gate logic) |
37
+ | criteria-binding + tamper-evidence (envelope hash; git as audit chain) | scenario/flake history; claim authoring; work-shape |
38
+ | the deterministic verifier family **and** the append-only attestation ledger | running the judge — the model lives in the `analyze-evidence` *skill*, never in the CLI |
39
+
40
+ The consumer authors the contract; the vault evaluates it mechanically (G9) and
41
+ records independent judgments without re-deriving them (G10). It cannot decide
42
+ "done" — it has no gate logic to leak.
43
+
44
+ ## Install
45
+
46
+ The CLI runs via `npx wicked-vault <command>` once the package is present. To
47
+ teach your AI CLIs/IDEs how to use it, install the skills across every detected
48
+ config root (Claude Code, Gemini, Copilot, Codex, Cursor, Kiro, Antigravity):
49
+
50
+ ```bash
51
+ npx wicked-vault-install # detect and install everywhere
52
+ npx wicked-vault-install --cli=claude # one CLI only (comma-separated for several)
53
+ npx wicked-vault-install --path ~/.claude # a specific config root
54
+ ```
55
+
56
+ This mirrors the shared wicked-bus / wicked-brain installer: `$CLAUDE_CONFIG_DIR`
57
+ is honored, alt-config layouts are probed, and skills land as
58
+ `wicked-vault-{init,record-evidence,verify-evidence,analyze-evidence,cross-check-evidence}/`
59
+ under each CLI's `skills/`. If wicked-bus is installed, the installer also
60
+ registers the vault as a bus provider (see below).
61
+
62
+ ## CLI
63
+
64
+ ```bash
65
+ wicked-vault init
66
+ # record: --criteria is MANDATORY (the bar this evidence claims to clear); --verifier is optional
67
+ wicked-vault record --scope S --phase build --claim tests-pass --kind test-run \
68
+ --source "npm test" --criteria "all unit tests pass (exit 0)" \
69
+ --verifier "exit_code_eq:0" --run
70
+ wicked-vault verify <artifact-id> # integrity tier; exit 0 iff hash_ok && pass; surfaces latest opinion
71
+ wicked-vault inspect <artifact-id> # frozen criteria + evidence + integrity (feeds the judge)
72
+ wicked-vault attest <artifact-id> --opinion pass --rationale "…" \
73
+ --evaluator gemini-reviewer --model gemini/2.5-pro # independent judgment; fail-closed
74
+ wicked-vault attestations <artifact-id> # append-only opinion log
75
+ wicked-vault cross-check --scope S --phase build # --integrity-only (default, CI-safe)
76
+ wicked-vault cross-check --scope S --phase build --with-attestations # opt-in judgment tier
77
+ wicked-vault supersede <old-id> --scope S --criteria "…" ... --run
78
+ wicked-vault declare-contract --scope S --phase build --spec contract.json
79
+ wicked-vault list --scope S
80
+ ```
81
+
82
+ Output is JSON; exit code is the gate signal (0 = PASS). The model judge runs in
83
+ the `wicked-vault:analyze-evidence` skill (`inspect → eval → attest`) — the CLI
84
+ itself never calls a model.
85
+
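The exit code is the primary gate signal, but a consumer that captures stdout can apply the same rule to the JSON. The sketch below assumes the `verify` output carries the `hash_ok` and `status` fields that the CLI itself checks before exiting. `gateFromVerify` and its return shape are hypothetical names for illustration, not part of the package.

```javascript
// Hypothetical consumer-side helper: derive a gate decision from the JSON that
// `wicked-vault verify` prints. Mirrors the CLI rule "exit 0 iff hash_ok && pass".
function gateFromVerify(stdoutText) {
  let res;
  try {
    res = JSON.parse(stdoutText);
  } catch {
    // Fail closed: output that cannot be parsed never passes the gate.
    return { pass: false, reason: 'unparseable output' };
  }
  if (res === null || typeof res !== 'object') {
    return { pass: false, reason: 'unexpected output' };
  }
  if (res.error) return { pass: false, reason: String(res.error) };
  // Only an explicit true counts; a missing hash_ok is treated as a failure.
  if (res.hash_ok !== true) return { pass: false, reason: 'integrity check failed' };
  if (res.status !== 'pass') return { pass: false, reason: `verifier status: ${res.status}` };
  return { pass: true, reason: 'hash_ok and status pass' };
}
```

Treating anything other than an explicit `hash_ok === true` as a failure keeps the helper fail-closed, in the spirit of G5.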
86
+ ## Two skills, two tiers (so the caller knows what they're invoking)
87
+
88
+ - **`wicked-vault:verify-evidence`** — integrity tier. Deterministic, model-free,
89
+ reproducible, CI-safe: is the artifact intact and does its pure verifier pass?
90
+ - **`wicked-vault:analyze-evidence`** — judgment tier. Runs an *independent*
91
+ model to judge evidence against criteria; non-reproducible; records an
92
+ attestation. The name tells you you're spending a model call and getting an
93
+ opinion, not a re-derivation.
94
+
95
+ ## Independent evaluation (the judgment tier — G10)
96
+
97
+ For criteria a deterministic verifier can't express ("the change adequately
98
+ addresses the documented failure modes"), the `wicked-vault:analyze-evidence`
99
+ skill orchestrates an **independent** judge:
100
+
101
+ 1. `inspect` returns the frozen criteria + evidence.
102
+ 2. a model **distinct from the worker** judges criteria-vs-evidence (criteria
103
+ and evidence are passed as escaped *data*, never as instructions).
104
+ 3. `attest` records the `{opinion, rationale, evaluator, model, …}` to an
105
+ append-only, tamper-evident log.
106
+
107
+ Guarantees that hold: criteria are frozen to the evidence (anti-downgrade);
108
+ `attest` is **fail-closed** on a tampered artifact and **rejects a self-grade**
109
+ (`evaluator == created_by`). What's traded: a judgment is **not reproducible** —
110
+ it's re-evaluated, not re-derived. The default CI gate stays on the
111
+ deterministic `--integrity-only` path; the judgment tier is opt-in. Threat model
112
+ (prompt injection, lax-bar self-grade) and the council 5–0 review:
113
+ [`docs/adr/0002`](docs/adr/0002-independent-evaluation-and-criteria-binding.md).
114
+
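The two `attest` guarantees above (fail-closed on tamper, no self-grade) and the "escaped data" rule in step 2 can be sketched as follows. This is a minimal illustration, not the package's implementation: the real checks live in `src/vault.mjs`, which this diff does not show. The object shapes and the names `checkAttestable` and `buildJudgeInput` are assumptions; only `created_by`, `evaluator`, and `hash_ok` come from the text and CLI above.

```javascript
// Sketch only. `artifact` and `attestation` are hypothetical plain objects.
function checkAttestable(artifact, attestation) {
  // Fail closed: anything other than an explicit hash_ok === true blocks the attestation.
  if (artifact.hash_ok !== true) {
    return { ok: false, error: 'artifact failed integrity check' };
  }
  if (!attestation.evaluator) {
    return { ok: false, error: 'evaluator is required' };
  }
  // Independence: the evaluator must differ from whoever created the evidence.
  if (attestation.evaluator === artifact.created_by) {
    return { ok: false, error: 'self-grade rejected' };
  }
  return { ok: true };
}

// Criteria and evidence travel as JSON string fields. JSON.stringify escapes
// quotes and newlines, so the text stays inside a data field of the judge input.
function buildJudgeInput(criteria, evidence) {
  return JSON.stringify({ criteria, evidence });
}
```

Escaping alone does not stop a model from following instructions embedded in the evidence; it only keeps the structure of the input unambiguous. The threat model in ADR-0002 covers prompt injection.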
115
+ ## wicked-bus integration (optional)
116
+
117
+ The vault is a zero-dependency primitive; wicked-bus is a **sibling**, never a
118
+ hard dependency. When wicked-bus is resolvable (installed globally or in the
119
+ project the vault runs from) the vault publishes events fire-and-forget. When it
120
+ isn't — or when `WICKED_VAULT_NO_BUS=1` is set — emission is a silent no-op and
121
+ the CLI behaves identically. A bus error never changes a verdict, the JSON on
122
+ stdout, or an exit code.
123
+
124
+ | Command | event_type | subdomain | key payload |
125
+ |---|---|---|---|
126
+ | `record` | `wicked.evidence.recorded` | `vault.record` | scope, phase, claim_id, kind, id, envelope_hash |
127
+ | `supersede` | `wicked.evidence.superseded` | `vault.supersede` | new_id, old_id, scope, phase, claim_id |
128
+ | `verify` / `cross-check` (tamper only) | `wicked.evidence.tampered` | `vault.tamper` / `vault.cross_check` | `verify`: id, payload_ok, envelope_ok; `cross-check`: scope, phase, artifact_ids |
129
+ | `declare-contract` | `wicked.contract.declared` | `vault.contract` | scope, phase, contract_version |
130
+ | `cross-check` | `wicked.contract.checked` | `vault.cross_check` | scope, phase, **overall**, mode, contract_version |
131
+ | `attest` | `wicked.evidence.attested` | `vault.attest` | artifact_id, attestation_id, **opinion**, evaluator, model |
132
+ | `cross-check --with-attestations` | `wicked.claim.evaluated` | `vault.cross_check` | scope, phase, claim_id, opinion, evaluator |
133
+
134
+ All events use `domain: wicked-vault`. `wicked.contract.checked` carries the
135
+ mechanical verdict (`PASS` / `REJECT` / `ERROR`) — the signal a gate consumer
136
+ (wicked-testing, wicked-garden) subscribes to. `wicked.evidence.attested` carries
137
+ an independent `opinion` + its `evaluator`/`model` provenance (a judgment-tier
138
+ signal, *not* a deterministic verdict). `wicked.evidence.tampered` is the
139
+ high-value alarm: a payload, criteria, or envelope diverged from what was
140
+ recorded (G2).
141
+
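The fire-and-forget contract described above (silent no-op when the bus is missing or disabled, and no bus error ever reaching the caller) can be sketched as a small factory. The `bus.publish({...})` call shape and the name `makePublisher` are hypothetical; the actual wicked-bus API and `src/bus.mjs` are not shown in this diff. Only the `WICKED_VAULT_NO_BUS=1` switch and the `domain: wicked-vault` field come from the text.

```javascript
// Hypothetical factory. `bus` is any object with a publish(event) method, or null.
// `env` is passed in (for example process.env) so the sketch stays easy to test.
function makePublisher(bus, env) {
  const disabled = env.WICKED_VAULT_NO_BUS === '1';
  if (!bus || disabled) return () => {}; // silent no-op
  return (eventType, subdomain, payload) => {
    try {
      const result = bus.publish({
        event_type: eventType,
        domain: 'wicked-vault',
        subdomain,
        payload,
      });
      // Swallow async failures too, so a rejected promise cannot surface later.
      if (result && typeof result.catch === 'function') result.catch(() => {});
    } catch {
      // Swallow sync failures: emission must not affect stdout or the exit code.
    }
  };
}
```

Because the returned function never throws and returns nothing, a caller can invoke it before writing its own result without any error handling around it.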
142
+ ## Guarantees
143
+
144
+ G1 server-minted ids · G2 envelope-hash tamper-evidence (**binds the criteria
145
+ too**) · **G3 re-derivation (never trust a cached status)** · G4 honest
146
+ recording (not sandboxed — harness owns isolation) · G5 fail-closed · G6
147
+ append-only · G7 verifier purity (CLI never calls a model) · G8 contract pinning
148
+ · G9 mechanical evaluation · **G10 attestation-chain trust** (independent
149
+ judgments are recorded, not re-derived; distinct from deterministic results).
150
+ Full text + threat model: [`docs/CONTRACTS.md`](docs/CONTRACTS.md). Founding
151
+ decisions + council reviews:
152
+ [`docs/adr/0001`](docs/adr/0001-standalone-and-council-revisions.md) ·
153
+ [`docs/adr/0002`](docs/adr/0002-independent-evaluation-and-criteria-binding.md).
154
+
155
+ ## Verifiers (deterministic sub-checks — optional)
156
+
157
+ `exit_code_eq` · `regex_match` · `not_contains` · `jq_pred` · `commit_exists`.
158
+ Since ADR-0002 the verifier is an *optional* composable sub-check an independent
159
+ evaluator may cite — not the whole story. Nondeterministic observation verifiers
160
+ (`pr_check_status`, `http_status_eq`) are a separate extension. `llm_eval` is
161
+ **not** a verifier kind (it would falsify G7) — independent judgment lives in the
162
+ `analyze-evidence` skill instead, recorded as an `opinion_attestation` under G10.
163
+
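A deterministic verifier is a pure function of the recorded observation. The sketch below covers three of the five kinds and assumes a `<kind>:<argument>` spec syntax, which the README only demonstrates for `exit_code_eq:0`. The argument formats for `regex_match` and `not_contains`, the `observed` shape, the `'fail'` status value, and the function names are all assumptions for illustration.

```javascript
// Assumed spec syntax: "<kind>:<argument>", split at the first colon so that
// arguments such as regular expressions may themselves contain colons.
function parseVerifier(spec) {
  const i = spec.indexOf(':');
  if (i === -1) return { kind: spec, arg: '' };
  return { kind: spec.slice(0, i), arg: spec.slice(i + 1) };
}

// `observed` is a hypothetical record of the run: { exitCode: number, output: string }.
// The function reads nothing else, so re-running it on the same input gives the same result.
function runVerifier(spec, observed) {
  const { kind, arg } = parseVerifier(spec);
  switch (kind) {
    case 'exit_code_eq':
      // Guard the empty argument, since Number('') would otherwise be 0.
      return arg !== '' && observed.exitCode === Number(arg) ? 'pass' : 'fail';
    case 'regex_match':
      return new RegExp(arg).test(observed.output) ? 'pass' : 'fail';
    case 'not_contains':
      return observed.output.includes(arg) ? 'fail' : 'pass';
    default:
      throw new Error(`unsupported verifier kind in this sketch: ${kind}`);
  }
}
```

Keeping the verifier free of I/O and model calls is what lets the integrity tier re-derive a status instead of trusting a stored one (G3, G7).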
164
+ ## Proof
165
+
166
+ ```bash
167
+ npm run prove # record -> tamper -> verify-rejects on a real repo
168
+ bash test/verifiers.sh # the 5 verifiers, pass + fail cases
169
+ bash test/attestation.sh # criteria-binding, attest fail-closed/independence, require_attestation
170
+ bash test/bus-integration.sh # graceful no-op + event validity + emission (incl. attested)
171
+ ```
172
+
173
+ Status: v0.2.0 — deterministic core proven on real repos; criteria-binding +
174
+ independent judgment tier (ADR-0002, council 5–0) implemented and proven;
175
+ wicked-bus emission + provider registration (optional, fire-and-forget). Not yet
176
+ implemented: `pr_check_status`/`http_status_eq` and the sqlite query cache.
@@ -0,0 +1,161 @@
1
+ #!/usr/bin/env node
2
+ import { readFileSync } from 'node:fs';
3
+ import {
4
+ findRoot, initVault, record, verify, crossCheck, declareContract, listEntries, supersede,
5
+ inspect, attest, listAttestations,
6
+ } from '../src/vault.mjs';
7
+ import { initBus } from '../src/bus.mjs';
8
+
9
+ // --criteria accepts inline text or @file (acceptance criteria are often
10
+ // multi-line). Resolved here so src/vault.mjs stays pure text-in.
11
+ function resolveCriteria(val) {
12
+ if (typeof val !== 'string') return val;
13
+ if (val.startsWith('@')) return readFileSync(val.slice(1), 'utf8');
14
+ return val;
15
+ }
16
+
17
+ function parseArgs(argv) {
18
+ const out = { _: [] };
19
+ for (let i = 0; i < argv.length; i++) {
20
+ const a = argv[i];
21
+ if (a.startsWith('--')) {
22
+ const k = a.slice(2);
23
+ const next = argv[i + 1];
24
+ if (next === undefined || next.startsWith('--')) out[k] = true;
25
+ else { out[k] = next; i++; }
26
+ } else out._.push(a);
27
+ }
28
+ return out;
29
+ }
30
+
31
+ function emit(obj, ok) {
32
+ process.stdout.write(JSON.stringify(obj, null, 2) + '\n');
33
+ process.exit(ok ? 0 : 1);
34
+ }
35
+
36
+ const [cmd, ...rest] = process.argv.slice(2);
37
+ const args = parseArgs(rest);
38
+ const cwd = (typeof args.cwd === 'string' && args.cwd) || process.cwd();
39
+
40
+ // Optional, fire-and-forget wicked-bus publisher. `publish` is a no-op when the
41
+ // bus is unavailable or disabled (WICKED_VAULT_NO_BUS=1) and never throws, so
42
+ // event emission cannot alter the JSON on stdout or the exit code below.
43
+ const publish = await initBus(cwd);
44
+
45
+ try {
46
+ if (cmd === 'init') {
47
+ emit({ initialized: initVault(cwd) }, true);
48
+ }
49
+
50
+ const root = findRoot(cwd, { create: cmd === 'record' || cmd === 'declare-contract' || cmd === 'supersede' });
51
+ if (!root) emit({ error: 'no .wicked-vault/ found; run `wicked-vault init`' }, false);
52
+
53
+ switch (cmd) {
54
+ case 'record': {
55
+ const res = record(root, {
56
+ scope: args.scope, phase: args.phase, claim: args.claim, kind: args.kind,
57
+ source: args.source, verifier: args.verifier, criteria: resolveCriteria(args.criteria),
58
+ run: !!args.run, artifact: typeof args.artifact === 'string' ? args.artifact : undefined,
59
+ cwd,
60
+ });
61
+ publish('wicked.evidence.recorded', 'vault.record', {
62
+ scope: args.scope, phase: args.phase, claim_id: args.claim, kind: args.kind,
63
+ source: args.source, id: res.id, envelope_hash: res.envelope_hash,
64
+ criteria_authored_by: res.criteria_authored_by, status_at_record: res.status_at_record,
65
+ });
66
+ emit(res, true);
67
+ break;
68
+ }
69
+ case 'inspect':
70
+ emit(inspect(root, args._[0] || args.id), true);
71
+ break;
72
+ case 'attest': {
73
+ const res = attest(root, args._[0] || args.id, {
74
+ opinion: args.opinion, rationale: args.rationale, evaluator: args.evaluator,
75
+ model: args.model, prompt_hash: args['prompt-hash'],
76
+ sampling: typeof args.sampling === 'string' ? JSON.parse(args.sampling) : undefined,
77
+ });
78
+ publish('wicked.evidence.attested', 'vault.attest', {
79
+ artifact_id: args._[0] || args.id, attestation_id: res.attestation_id,
80
+ opinion: res.opinion, evaluator: args.evaluator, model: args.model || null,
81
+ });
82
+ emit(res, true);
83
+ break;
84
+ }
85
+ case 'attestations':
86
+ emit(listAttestations(root, args._[0] || args.id), true);
87
+ break;
88
+ case 'verify': {
89
+ const res = verify(root, args._[0] || args.id);
90
+ // Only the rare, high-value tamper case is published — a verify is a read
91
+ // and would otherwise be noise. hash_ok=false means the payload or
92
+ // envelope diverged from what was recorded (G2).
93
+ if (res.rederived && res.hash_ok === false) {
94
+ publish('wicked.evidence.tampered', 'vault.tamper', {
95
+ id: res.id, payload_ok: res.payload_ok, envelope_ok: res.envelope_ok,
96
+ });
97
+ }
98
+ emit(res, res.hash_ok && res.status === 'pass');
99
+ break;
100
+ }
101
+ case 'cross-check': {
102
+ // --integrity-only is the default (deterministic, CI-safe); attestation
103
+ // consultation is opt-in via --with-attestations (ADR-0002 D3/D10).
104
+ const withAttestations = args['with-attestations'] === true;
105
+ const res = crossCheck(root, args.scope, args.phase, { withAttestations });
106
+ publish('wicked.contract.checked', 'vault.cross_check', {
107
+ scope: res.scope, phase: res.phase, overall: res.overall, mode: res.mode,
108
+ contract_version: res.contract_version, claims: (res.claims || []).length,
109
+ });
110
+ const tampered = (res.claims || []).filter((c) => c.hash_ok === false);
111
+ if (tampered.length > 0) {
112
+ publish('wicked.evidence.tampered', 'vault.cross_check', {
113
+ scope: res.scope, phase: res.phase,
114
+ artifact_ids: tampered.map((c) => c.artifact_id),
115
+ });
116
+ }
117
+ // Surface each consulted independent opinion (judgment tier).
118
+ if (withAttestations) {
119
+ for (const c of (res.claims || [])) {
120
+ if (c.attestation) {
121
+ publish('wicked.claim.evaluated', 'vault.cross_check', {
122
+ scope: res.scope, phase: res.phase, claim_id: c.claim_id,
123
+ opinion: c.attestation.opinion, evaluator: c.attestation.evaluator,
124
+ });
125
+ }
126
+ }
127
+ }
128
+ emit(res, res.overall === 'PASS');
129
+ break;
130
+ }
131
+ case 'declare-contract': {
132
+ const res = declareContract(root, args.scope, args.phase, JSON.parse(readFileSync(args.spec, 'utf8')));
133
+ publish('wicked.contract.declared', 'vault.contract', {
134
+ scope: args.scope, phase: args.phase, contract_version: res.contract_version,
135
+ });
136
+ emit(res, true);
137
+ break;
138
+ }
139
+ case 'list':
140
+ emit(listEntries(root, args.scope, args.phase), true);
141
+ break;
142
+ case 'supersede': {
143
+ const res = supersede(root, args._[0] || args.id, {
144
+ scope: args.scope, phase: args.phase, claim: args.claim, kind: args.kind,
145
+ source: args.source, verifier: args.verifier, criteria: resolveCriteria(args.criteria),
146
+ run: !!args.run, artifact: typeof args.artifact === 'string' ? args.artifact : undefined,
147
+ cwd,
148
+ });
149
+ publish('wicked.evidence.superseded', 'vault.supersede', {
150
+ scope: args.scope, phase: args.phase, claim_id: args.claim,
151
+ new_id: res.new_id, old_id: res.old_id,
152
+ });
153
+ emit(res, true);
154
+ break;
155
+ }
156
+ default:
157
+ emit({ error: `unknown command: ${cmd}`, commands: ['init', 'record', 'verify', 'inspect', 'attest', 'attestations', 'cross-check', 'declare-contract', 'list', 'supersede'] }, false);
158
+ }
159
+ } catch (e) {
160
+ emit({ error: e.message }, false);
161
+ }
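A short usage sketch of the `parseArgs` helper from the file above, reproduced verbatim so the block runs on its own. It illustrates two consequences of the parsing rule: a `--flag` takes the next token as its value unless that token also starts with `--`, so a boolean flag placed before a positional argument consumes it, and a value that itself begins with `--` cannot be passed.

```javascript
// Copy of parseArgs from the CLI script in this diff.
function parseArgs(argv) {
  const out = { _: [] };
  for (let i = 0; i < argv.length; i++) {
    const a = argv[i];
    if (a.startsWith('--')) {
      const k = a.slice(2);
      const next = argv[i + 1];
      if (next === undefined || next.startsWith('--')) out[k] = true;
      else { out[k] = next; i++; }
    } else out._.push(a);
  }
  return out;
}

// Positional first, flags after: _ holds 'abc123', scope is 'S', run is true.
console.log(parseArgs(['abc123', '--scope', 'S', '--run']));

// Flag before a positional: run becomes the string 'abc123' and _ stays empty.
console.log(parseArgs(['--run', 'abc123']));
```

This is why the README examples place `--run` last: the CLI coerces it with `!!args.run`, so the flag still reads as true, but a positional argument placed after it would be lost.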