wicked-vault 0.3.1 → 0.4.0
- package/README.md +62 -16
- package/bin/wicked-vault.mjs +10 -0
- package/docs/CONTRACTS.md +20 -5
- package/package.json +3 -2
- package/skills/wicked-vault/analyze-evidence/SKILL.md +21 -5
- package/skills/wicked-vault/init/SKILL.md +28 -10
- package/skills/wicked-vault/record-evidence/SKILL.md +6 -3
- package/src/vault.mjs +105 -6
package/README.md
CHANGED
|
@@ -26,15 +26,16 @@ It checks on two tiers (ADR-0002):
|
|
|
26
26
|
hashes, re-run the pure verifier. CI-gate-safe. *Never trust a cached status.*
|
|
27
27
|
- **Judgment tier** — an **independent** evaluator (≠ the agent that did the
|
|
28
28
|
work) judges the frozen evidence against the frozen criteria; the opinion is
|
|
29
|
-
recorded as a
|
|
30
|
-
|
|
29
|
+
recorded as a hash-bound, append-only `opinion_attestation` (mutation-
|
|
30
|
+
detecting in the same sense as the envelope — see "Tamper detection"). *Never
|
|
31
|
+
trust a self-graded "done".*
|
|
31
32
|
|
|
32
33
|
## Boundary
|
|
33
34
|
|
|
34
35
|
| Owns (the primitive) | Refuses (lives in a consumer) |
|
|
35
36
|
|---|---|
|
|
36
37
|
| `record` · `verify` · `inspect` · `attest` · `cross-check` · `supersede` | "is the work *done*?" (gate logic) |
|
|
37
|
-
| criteria-binding +
|
|
38
|
+
| criteria-binding + mutation detection (envelope hash detects naive tamper; committed git history is the audit chain) | scenario/flake history; claim authoring; work-shape |
|
|
38
39
|
| the deterministic verifier family **and** the append-only attestation ledger | running the judge — the model lives in the `analyze-evidence` *skill*, never in the CLI |
|
|
39
40
|
|
|
40
41
|
The consumer authors the contract; the vault evaluates it mechanically (G9) and
|
|
@@ -108,11 +109,21 @@ skill orchestrates an **independent** judge:
|
|
|
108
109
|
2. a model **distinct from the worker** judges criteria-vs-evidence (criteria
|
|
109
110
|
and evidence are passed as escaped *data*, never as instructions).
|
|
110
111
|
3. `attest` records the `{opinion, rationale, evaluator, model, …}` to an
|
|
111
|
-
append-only,
|
|
112
|
+
append-only, hash-bound log (mutation-detecting; the committed git history is
|
|
113
|
+
the durable tamper-evidence — see "Tamper detection").
|
|
112
114
|
|
|
113
115
|
Guarantees that hold: criteria are frozen to the evidence (anti-downgrade);
|
|
114
116
|
`attest` is **fail-closed** on a tampered artifact and **rejects a self-grade**
|
|
115
|
-
(`evaluator == created_by
|
|
117
|
+
(`evaluator == created_by`, compared trimmed + case-folded). The independence
|
|
118
|
+
check is hardened: the worker should record with an explicit `--actor` (or
|
|
119
|
+
`WICKED_VAULT_ACTOR`) — when the artifact carries only an *ambient* identity
|
|
120
|
+
(bare `$USER` / anonymous), `attest` **fails closed** and requires
|
|
121
|
+
`--allow-weak-worker-identity` (which stamps the weakness on the attestation for
|
|
122
|
+
audit), and the **evaluator** identity must itself be an explicit assertion.
|
|
123
|
+
This is a stronger mechanical baseline + audit trail, **not** cryptographic
|
|
124
|
+
independence — a determined human can still assert two distinct strings locally;
|
|
125
|
+
real independence comes from a separate evaluator process/credential and the
|
|
126
|
+
committed git trail. What's traded: a judgment is **not reproducible** —
|
|
116
127
|
it's re-evaluated, not re-derived. The default CI gate stays on the
|
|
117
128
|
deterministic `--integrity-only` path; the judgment tier is opt-in. Threat model
|
|
118
129
|
(prompt injection, lax-bar self-grade) and the council 5–0 review:
|
|
@@ -149,15 +160,48 @@ signal, *not* a deterministic verdict). `wicked.evidence.tampered` is the
|
|
|
149
160
|
high-value alarm: a payload, criteria, or envelope diverged from what was
|
|
150
161
|
recorded (G2).
|
|
151
162
|
|
|
163
|
+
## Tamper detection — what it does, and what it does NOT do
|
|
164
|
+
|
|
165
|
+
Be precise about the word "tamper-evident", because the mechanism is easy to
|
|
166
|
+
overstate:
|
|
167
|
+
|
|
168
|
+
- **What the envelope hash catches:** *naive or accidental* mutation. The
|
|
169
|
+
envelope is an **unkeyed SHA-256** over the artifact's public fields (scope,
|
|
170
|
+
phase, claim, kind, source, verifier, `criteria_sha256`, `payload_sha256`).
|
|
171
|
+
`verify` re-derives every hash from the bytes on disk and re-runs the pure
|
|
172
|
+
verifier — so a hand-edit to a payload, the criteria, or a cached status is
|
|
173
|
+
detected (`hash_ok: false`), and a stale "pass" is never trusted (G3). This
|
|
174
|
+
defeats the common failure modes: a fat-fingered edit, a tool that rewrites a
|
|
175
|
+
file, an agent that flips `status_at_record`.
|
|
176
|
+
- **What it does NOT do:** it is **not** cryptographically tamper-*resistant*
|
|
177
|
+
against a *determined local writer*. Because the hashes are unkeyed and over
|
|
178
|
+
public fields, anyone who can edit `entries/` can also recompute every hash to
|
|
179
|
+
match — `verify` would then return `hash_ok: true` on a forged entry. There is
|
|
180
|
+
no secret key, no signature, no HMAC. **Do not rely on the envelope hash alone
|
|
181
|
+
as a security boundary.**
|
|
182
|
+
- **Where real tamper-EVIDENCE comes from:** the **committed, branch-protected
|
|
183
|
+
git history** of `.wicked-vault/`. Evidence is committed by default; the PR
|
|
184
|
+
diff shows exactly what was recorded, and branch protection prevents silent
|
|
185
|
+
rewrites. This is **audit-trail-grade** tamper-evidence (you can see, in a
|
|
186
|
+
reviewable history, what changed and who changed it) — **not** cryptographic
|
|
187
|
+
immutability (a force-push by a privileged actor can still rewrite history; CI
|
|
188
|
+
branch protection is the backstop). This matches CONTRACTS.md §6 and ADR-0002.
|
|
189
|
+
|
|
190
|
+
In one line: **the envelope hash detects mutation; committed, branch-protected
|
|
191
|
+
git history is what makes that mutation *evident and accountable*.**
|
|
192
|
+
|
|
152
193
|
## Guarantees
|
|
153
194
|
|
|
154
|
-
G1 server-minted ids · G2 envelope
|
|
155
|
-
|
|
156
|
-
|
|
157
|
-
|
|
158
|
-
|
|
159
|
-
|
|
160
|
-
|
|
195
|
+
G1 server-minted ids · **G2 envelope hash — detects naive/accidental payload,
|
|
196
|
+
criteria, or envelope mutation (unkeyed SHA-256 over public fields; binds the
|
|
197
|
+
criteria too). NOT a defense against a determined local writer — see "Tamper
|
|
198
|
+
detection" above.** · **G3 re-derivation (never trust a cached status)** · G4
|
|
199
|
+
honest recording (not sandboxed — harness owns isolation) · G5 fail-closed · G6
|
|
200
|
+
append-only (git history is the audit chain) · G7 verifier purity (CLI never
|
|
201
|
+
calls a model) · G8 contract pinning · G9 mechanical evaluation · **G10
|
|
202
|
+
attestation-chain trust** (independent judgments are recorded, not re-derived;
|
|
203
|
+
distinct from deterministic results). Full text + threat model:
|
|
204
|
+
[`docs/CONTRACTS.md`](docs/CONTRACTS.md). Founding
|
|
161
205
|
decisions + council reviews:
|
|
162
206
|
[`docs/adr/0001`](docs/adr/0001-standalone-and-council-revisions.md) ·
|
|
163
207
|
[`docs/adr/0002`](docs/adr/0002-independent-evaluation-and-criteria-binding.md).
|
|
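The "Tamper detection" text in the hunk above can be illustrated with a minimal sketch. It assumes a canonical form (a fixed-order JSON array of the public fields) that is not taken from the package; `envelopeHash`, `hashOk`, `makeEntry`, and all sample values are hypothetical names chosen for illustration. It shows why an unkeyed SHA-256 catches a naive edit yet accepts a forgery whose hashes were recomputed.

```javascript
// Hypothetical sketch: field order and serialization are assumptions,
// not wicked-vault's actual envelope encoding.
const { createHash } = require('node:crypto');

const sha256 = (data) => createHash('sha256').update(data).digest('hex');

// Hash the public envelope fields in a fixed order (assumed canonical form).
function envelopeHash(entry) {
  const fields = ['scope', 'phase', 'claim', 'kind', 'source', 'verifier',
    'criteria_sha256', 'payload_sha256'];
  return sha256(JSON.stringify(fields.map((k) => [k, entry[k] ?? null])));
}

// Re-derive every hash from the bytes at hand; never trust stored values.
function hashOk(entry, payload, criteria) {
  return entry.payload_sha256 === sha256(payload)
    && entry.criteria_sha256 === sha256(criteria)
    && entry.envelope_hash === envelopeHash(entry);
}

// Build an entry whose hashes are consistent with the given bytes.
function makeEntry(base, payload, criteria) {
  const entry = { ...base, payload_sha256: sha256(payload), criteria_sha256: sha256(criteria) };
  entry.envelope_hash = envelopeHash(entry);
  return entry;
}

// Placeholder values, not real vault data.
const base = { scope: 'demo', phase: 'build', claim: 'tests-pass', kind: 'log',
  source: 'npm test', verifier: 'exit-code:0' };
const honest = makeEntry(base, 'exit 0', 'all tests pass');

// Untouched entry: every re-derived hash matches.
const intact = hashOk(honest, 'exit 0', 'all tests pass');
// Naive edit: payload bytes changed, stored hashes left stale.
const naiveEdit = hashOk(honest, 'exit 1', 'all tests pass');
// Determined local writer: edits the payload and recomputes every hash.
const forged = makeEntry(base, 'exit 1', 'all tests pass');
const forgedOk = hashOk(forged, 'exit 1', 'all tests pass');

console.log({ intact, naiveEdit, forgedOk });
```

The forged entry passes because nothing in the hash depends on a secret, which is the README's reason for treating committed, branch-protected git history as the durable tamper-evidence.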
@@ -174,16 +218,18 @@ evaluator may cite — not the whole story. Nondeterministic observation verifie
|
|
|
174
218
|
## Proof
|
|
175
219
|
|
|
176
220
|
```bash
|
|
177
|
-
npm
|
|
221
|
+
npm test # the full gating suite (cli-baseline + attestation + bus + verifiers)
|
|
222
|
+
npm run prove # record -> tamper -> verify-rejects on a real repo (needs a sibling repo)
|
|
178
223
|
bash test/verifiers.sh # the 5 verifiers, pass + fail cases
|
|
179
|
-
bash test/attestation.sh # criteria-binding, attest fail-closed/independence, require_attestation
|
|
224
|
+
bash test/attestation.sh # criteria-binding, attest fail-closed/independence (incl. weak-identity), payload limit, require_attestation
|
|
180
225
|
bash test/bus-integration.sh # graceful no-op + schema validity + real-bus emission (init/record/attest/cross-check)
|
|
181
226
|
```
|
|
182
227
|
|
|
183
|
-
`
|
|
228
|
+
`npm test` runs the gating proofs (`cli-baseline.sh`, `attestation.sh`,
|
|
229
|
+
`bus-integration.sh`, `verifiers.sh`) and is what CI invokes
|
|
184
230
|
(`.github/workflows/ci.yml`) on ubuntu + macos, with a Windows CLI smoke.
|
|
185
231
|
|
|
186
|
-
Status: v0.3.
|
|
232
|
+
Status: v0.3.1 — deterministic core proven on real repos; criteria-binding +
|
|
187
233
|
independent judgment tier (ADR-0002, council 5–0) implemented and proven;
|
|
188
234
|
wicked-bus integration **proven end-to-end against a real bus** (emit → store →
|
|
189
235
|
poll), optional and fire-and-forget; `--help` on both binaries + a
|
package/bin/wicked-vault.mjs
CHANGED
|
@@ -47,12 +47,17 @@ COMMANDS
|
|
|
47
47
|
record Capture evidence + the criteria it must clear
|
|
48
48
|
--scope S --phase P --claim C --kind K --source "<cmd|file>"
|
|
49
49
|
--criteria "<text|@file>" (--run | --artifact <file>) [--verifier "kind:arg"]
|
|
50
|
+
[--actor ID] (the asserted worker identity; strengthens the
|
|
51
|
+
independence check — falls back to WICKED_VAULT_ACTOR then $USER)
|
|
50
52
|
verify <artifact-id> Integrity tier: re-derive hashes + verifier (deterministic,
|
|
51
53
|
model-free). Exit 0 iff intact AND pass. Surfaces latest opinion.
|
|
52
54
|
inspect <artifact-id> Frozen criteria + evidence + integrity (what a judge evaluates)
|
|
53
55
|
attest <artifact-id> Record an INDEPENDENT judgment (fail-closed; evaluator != creator)
|
|
54
56
|
--opinion <pass|reject|unclear> --rationale "..." --evaluator ID
|
|
55
57
|
[--model prov/ver] [--prompt-hash H] [--sampling '<json>']
|
|
58
|
+
[--allow-weak-worker-identity] (attest anyway when the artifact
|
|
59
|
+
was recorded under an ambient $USER/anonymous identity; the
|
|
60
|
+
weakness is stamped on the attestation for audit)
|
|
56
61
|
attestations <artifact-id> Show the append-only opinion log
|
|
57
62
|
cross-check Mechanical contract verdict; exit 0 iff PASS
|
|
58
63
|
--scope S --phase P [--integrity-only (default) | --with-attestations]
|
|
@@ -67,6 +72,8 @@ GLOBAL
|
|
|
67
72
|
|
|
68
73
|
OUTPUT JSON on stdout; exit code is the gate signal (0 = PASS / success).
|
|
69
74
|
ENV WICKED_VAULT_NO_BUS=1 Disable optional wicked-bus event emission
|
|
75
|
+
WICKED_VAULT_ACTOR=ID Assert the worker identity for record/supersede
|
|
76
|
+
(used by the G10/D4 independence check)
|
|
70
77
|
|
|
71
78
|
Skills (AI CLIs): wicked-vault:{init,record-evidence,verify-evidence,analyze-evidence,cross-check-evidence,update}
|
|
72
79
|
Install skills: npx wicked-vault-install (run with --help for options)
|
|
@@ -127,6 +134,7 @@ try {
|
|
|
127
134
|
scope: args.scope, phase: args.phase, claim: args.claim, kind: args.kind,
|
|
128
135
|
source: args.source, verifier: args.verifier, criteria: resolveCriteria(args.criteria),
|
|
129
136
|
run: !!args.run, artifact: typeof args.artifact === 'string' ? args.artifact : undefined,
|
|
137
|
+
actor: typeof args.actor === 'string' ? args.actor : undefined,
|
|
130
138
|
cwd,
|
|
131
139
|
});
|
|
132
140
|
publish('wicked.evidence.recorded', 'vault.record', {
|
|
@@ -145,6 +153,7 @@ try {
|
|
|
145
153
|
opinion: args.opinion, rationale: args.rationale, evaluator: args.evaluator,
|
|
146
154
|
model: args.model, prompt_hash: args['prompt-hash'],
|
|
147
155
|
sampling: typeof args.sampling === 'string' ? JSON.parse(args.sampling) : undefined,
|
|
156
|
+
allowWeakWorkerIdentity: args['allow-weak-worker-identity'] === true,
|
|
148
157
|
});
|
|
149
158
|
publish('wicked.evidence.attested', 'vault.attest', {
|
|
150
159
|
artifact_id: args._[0] || args.id, attestation_id: res.attestation_id,
|
|
@@ -215,6 +224,7 @@ try {
|
|
|
215
224
|
scope: args.scope, phase: args.phase, claim: args.claim, kind: args.kind,
|
|
216
225
|
source: args.source, verifier: args.verifier, criteria: resolveCriteria(args.criteria),
|
|
217
226
|
run: !!args.run, artifact: typeof args.artifact === 'string' ? args.artifact : undefined,
|
|
227
|
+
actor: typeof args.actor === 'string' ? args.actor : undefined,
|
|
218
228
|
cwd,
|
|
219
229
|
});
|
|
220
230
|
publish('wicked.evidence.superseded', 'vault.supersede', {
|
package/docs/CONTRACTS.md
CHANGED
|
@@ -97,6 +97,7 @@ artifact still verify?* and *is this scope+phase's contract satisfied?*
|
|
|
97
97
|
| `supersedes` | string? | prior artifact id |
|
|
98
98
|
| `contract_version` | string? | the contract hash in force at record |
|
|
99
99
|
| `created_at` / `created_by` | ts / string | actor provenance |
|
|
100
|
+
| `created_by_source` | enum | how `created_by` was resolved: `explicit` (`--actor`) · `env-actor` (`WICKED_VAULT_ACTOR`) · `env-user` (ambient `$USER`, weak) · `anonymous` (none, weak). Governs the G10/D4 independence check — a weak source makes `evaluator != created_by` untrustworthy, so `attest` fails closed unless explicitly overridden. |
|
|
100
101
|
|
|
101
102
|
### 3.2 Contract (exit-criteria — what evidence a scope+phase requires)
|
|
102
103
|
|
|
@@ -136,7 +137,9 @@ Its trust is G10 (attestation-chain), not G3 (re-derivation).
|
|
|
136
137
|
| `artifact_id` | string | the evidence it judges |
|
|
137
138
|
| `opinion` | enum | `pass` · `reject` · `unclear` — deliberately NOT named `verdict`/`status` |
|
|
138
139
|
| `rationale` | string | the judge's reasoning (structured output, not free-form prose injection) |
|
|
139
|
-
| `evaluator` | string | the judging identity — **MUST differ from the artifact's `created_by`** (G10/D4) |
|
|
140
|
+
| `evaluator` | string | the judging identity — **MUST differ from the artifact's `created_by`** (G10/D4), compared trimmed + case-folded; **MUST be an explicit assertion** (an ambient `$USER` evaluator is refused) |
|
|
141
|
+
| `evaluator_source` | enum | provenance of the evaluator identity (`explicit` · `env-actor`); ambient sources are refused at `attest` |
|
|
142
|
+
| `worker_identity_weak` | bool | true if the judged artifact was recorded under a weak/ambient/legacy `created_by_source`; `attest` fails closed in that case unless `--allow-weak-worker-identity` is passed, which stamps this flag for audit |
|
|
140
143
|
| `model` | string | provider/version, e.g. `gemini/2.5-pro` |
|
|
141
144
|
| `prompt_hash` | string? | hash of the prompt template used |
|
|
142
145
|
| `sampling` | object? | `{temperature, …}` — provenance for disagreement analysis |
|
|
@@ -203,8 +206,16 @@ reproducible.**
|
|
|
203
206
|
(a) acceptance criteria are mandatory and bound into the envelope, frozen to
|
|
204
207
|
the evidence (anti-downgrade); (b) the model runs only in the orchestration
|
|
205
208
|
layer (`analyze-evidence` skill) — the CLI never calls a model, so G7 holds;
|
|
206
|
-
(c) `attest` is fail-closed if the frozen inputs no longer hash-match, and
|
|
207
|
-
rejects when `evaluator == created_by
|
|
209
|
+
(c) `attest` is fail-closed if the frozen inputs no longer hash-match, and the
|
|
210
|
+
independence check is hardened: it rejects when `evaluator == created_by`
|
|
211
|
+
(trimmed + case-folded), **requires an explicit (non-ambient) evaluator
|
|
212
|
+
identity**, and **fails closed when the worker identity is ambient/weak**
|
|
213
|
+
(`created_by_source` of `env-user`/`anonymous`/legacy) unless explicitly
|
|
214
|
+
overridden with `--allow-weak-worker-identity` (which records the weakness for
|
|
215
|
+
audit). This is a mechanical baseline + audit trail, **not** cryptographic
|
|
216
|
+
independence — a determined local actor can still assert two strings; real
|
|
217
|
+
independence is a separate evaluator process/credential + the committed git
|
|
218
|
+
trail; (d) judgments are non-reproducible by
|
|
208
219
|
design — "never trust the cached verdict" here means *re-evaluate
|
|
209
220
|
independently*, complementary to G3's *re-derive deterministically*. Threat
|
|
210
221
|
model in §5a.
|
|
@@ -298,8 +309,12 @@ In-repo, committed (Decision D1) — **one file per artifact** (council Q2):
|
|
|
298
309
|
audit-trail-grade tamper-evidence, not cryptographic immutability. G2's
|
|
299
310
|
envelope hash detects payload/verdict mutation; it does not prevent a force-push
|
|
300
311
|
that rewrites both. CI branch protection is the backstop.
|
|
301
|
-
- **Large payloads:** `payload_max_bytes`
|
|
302
|
-
|
|
312
|
+
- **Large payloads:** `payload_max_bytes` (default 1 MiB) is **enforced at
|
|
313
|
+
`record` time** — an over-size payload is rejected fail-closed (G5): no entry
|
|
314
|
+
and no blob are written, keeping the committed repo lean. Set it to `0` to
|
|
315
|
+
disable the guard. (Externalizing over-size blobs out-of-tree with the hash
|
|
316
|
+
recorded in the entry is future hardening, not yet implemented — today the
|
|
317
|
+
contract is "reject", not "externalize".)
|
|
303
318
|
|
|
304
319
|
---
|
|
305
320
|
|
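The independence rules that the CONTRACTS.md changes above describe (`created_by_source`, `evaluator_source`, `worker_identity_weak`) can be sketched as a single gate. This is an illustrative reading of the contract text, not the package's code: `checkIndependence`, its argument shapes, and the error messages are hypothetical, and treating a missing `created_by_source` as weak follows the "legacy" wording in the table.

```javascript
// Hypothetical sketch of the attest-time independence gate described in
// CONTRACTS.md. Names and messages are illustrative assumptions.
const WEAK_SOURCES = new Set(['env-user', 'anonymous']);
const norm = (s) => (typeof s === 'string' ? s.trim().toLowerCase() : '');

function checkIndependence(entry, evaluator, opts = {}) {
  // The evaluator must be a deliberate assertion, never an ambient identity.
  if (WEAK_SOURCES.has(evaluator.source)) {
    throw new Error('evaluator identity must be explicit');
  }
  // Self-grade rejection, compared trimmed and case-folded.
  if (norm(evaluator.id) === norm(entry.created_by)) {
    throw new Error('evaluator must differ from created_by');
  }
  // A missing created_by_source (legacy entry) is treated as weak.
  const workerWeak = !entry.created_by_source
    || WEAK_SOURCES.has(entry.created_by_source);
  if (workerWeak && !opts.allowWeakWorkerIdentity) {
    throw new Error('worker identity is weak; override required');
  }
  // The weakness is stamped on the result so an audit can see it.
  return {
    evaluator: evaluator.id,
    evaluator_source: evaluator.source,
    worker_identity_weak: workerWeak,
  };
}

// Usage with placeholder identities.
const strongEntry = { created_by: 'worker-1', created_by_source: 'explicit' };
const weakEntry = { created_by: 'alice', created_by_source: 'env-user' };
const judge = { id: 'judge-1', source: 'explicit' };

console.log(checkIndependence(strongEntry, judge));
console.log(checkIndependence(weakEntry, judge, { allowWeakWorkerIdentity: true }));
```

As the contract itself notes, a gate like this is a mechanical baseline plus an audit trail; two distinct strings asserted by the same person would still pass it.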
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "wicked-vault",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.4.0",
|
|
4
4
|
"description": "Local-first evidence primitive — record evidence with its acceptance criteria, re-derive integrity deterministically, and record independent third-party judgments. Never trusts a stored verdict, never lets work self-grade its own \"done\".",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"bin": {
|
|
@@ -8,7 +8,7 @@
|
|
|
8
8
|
"wicked-vault-install": "install.mjs"
|
|
9
9
|
},
|
|
10
10
|
"engines": {
|
|
11
|
-
"node": ">=
|
|
11
|
+
"node": ">=20.0.0"
|
|
12
12
|
},
|
|
13
13
|
"license": "MIT",
|
|
14
14
|
"author": "Mike Parcewski",
|
|
@@ -43,6 +43,7 @@
|
|
|
43
43
|
"install.mjs"
|
|
44
44
|
],
|
|
45
45
|
"scripts": {
|
|
46
|
+
"test": "bash test/cli-baseline.sh && bash test/attestation.sh && bash test/bus-integration.sh && bash test/verifiers.sh",
|
|
46
47
|
"prove": "bash test/prove-on-memos.sh",
|
|
47
48
|
"prove:verifiers": "bash test/verifiers.sh",
|
|
48
49
|
"prove:attestation": "bash test/attestation.sh",
|
package/skills/wicked-vault/analyze-evidence/SKILL.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: wicked-vault:analyze-evidence
|
|
3
|
-
description: Have an INDEPENDENT party analyze whether recorded evidence actually meets its frozen acceptance criteria, and record the judgment as a tamper-
|
|
3
|
+
description: Have an INDEPENDENT party analyze whether recorded evidence actually meets its frozen acceptance criteria, and record the judgment as a hash-bound, append-only attestation (mutation-detecting; durable tamper-evidence is the committed git history). Use when judging free-form criteria a deterministic check can't express ("does this adequately address the failure modes"), or producing a third-party sign-off that defeats self-graded "done". Runs a model (non-reproducible, costs a call). For the cheap deterministic integrity check, use wicked-vault:verify-evidence instead.
|
|
4
4
|
---
|
|
5
5
|
|
|
6
6
|
# wicked-vault:analyze-evidence
|
|
@@ -8,7 +8,9 @@ description: Have an INDEPENDENT party analyze whether recorded evidence actuall
|
|
|
8
8
|
This is the vault's **independent referee** — the judgment tier (G10). The agent
|
|
9
9
|
that produced the work cannot grade its own "done"; this flow has a *different*
|
|
10
10
|
evaluator analyze the frozen evidence against its frozen acceptance criteria,
|
|
11
|
-
then records that analysis as a
|
|
11
|
+
then records that analysis as a hash-bound, append-only `opinion_attestation`
|
|
12
|
+
(mutation-detecting; the durable tamper-evidence is the committed git history —
|
|
13
|
+
see the README "Tamper detection" section).
|
|
12
14
|
|
|
13
15
|
**Know what you're invoking.** This skill:
|
|
14
16
|
- **runs a model** (an independent evaluator), so it costs a call and is
|
|
@@ -26,9 +28,23 @@ satisfy the acceptance criteria?"* and the criteria need judgment.
|
|
|
26
28
|
|
|
27
29
|
The evaluator **MUST be distinct from the agent that produced the evidence.**
|
|
28
30
|
Use a separate model CLI (e.g. `gemini`, `codex`) or an isolated subagent — not
|
|
29
|
-
the same context that did the work. The CLI enforces
|
|
30
|
-
|
|
31
|
-
|
|
31
|
+
the same context that did the work. The CLI enforces a hardened floor:
|
|
32
|
+
|
|
33
|
+
- `attest` **rejects** when `--evaluator` equals the artifact's `created_by`
|
|
34
|
+
(compared trimmed + case-folded, so `Alice`/`alice ` can't sidestep it).
|
|
35
|
+
- For the independence claim to be meaningful, the **worker** should record with
|
|
36
|
+
an explicit `--actor "<id>"` (or set `WICKED_VAULT_ACTOR`). If the artifact was
|
|
37
|
+
recorded under only an *ambient* identity (bare `$USER` / anonymous), `attest`
|
|
38
|
+
**fails closed** — pass `--allow-weak-worker-identity` to proceed anyway, which
|
|
39
|
+
stamps `worker_identity_weak: true` on the attestation for audit.
|
|
40
|
+
- The **evaluator** identity must itself be an explicit assertion; a bare ambient
|
|
41
|
+
evaluator id is refused (that is the silent self-grade).
|
|
42
|
+
|
|
43
|
+
This is a stronger mechanical baseline + audit trail, **not** cryptographic
|
|
44
|
+
independence — a determined human can still assert two distinct strings for the
|
|
45
|
+
same person locally. Real independence comes from a genuinely separate evaluator
|
|
46
|
+
process/credential and the committed, branch-protected git trail. Treat the rule
|
|
47
|
+
as real, not as a checkbox.
|
|
32
48
|
|
|
33
49
|
## Orchestration
|
|
34
50
|
|
package/skills/wicked-vault/init/SKILL.md
CHANGED
|
@@ -6,8 +6,11 @@ description: Initialize a wicked-vault in a repository so claims can be backed b
|
|
|
6
6
|
# wicked-vault:init
|
|
7
7
|
|
|
8
8
|
Set up the local-first **evidence primitive** in the current repository. The
|
|
9
|
-
vault records claim-backing artifacts, hashes them
|
|
10
|
-
*re-derives* their verdict on demand — it
|
|
9
|
+
vault records claim-backing artifacts, hashes them so naive/accidental mutation
|
|
10
|
+
is detected on re-derivation, and *re-derives* their verdict on demand — it
|
|
11
|
+
never trusts a stored status. (The hash detects mutation; the committed,
|
|
12
|
+
branch-protected git history is the durable tamper-evidence — see below and the
|
|
13
|
+
README "Tamper detection" section.)
|
|
11
14
|
|
|
12
15
|
## When to use
|
|
13
16
|
|
|
@@ -29,12 +32,17 @@ This creates `.wicked-vault/` at the repo root with:
|
|
|
29
32
|
|
|
30
33
|
```
|
|
31
34
|
.wicked-vault/
|
|
32
|
-
vault.json # schema_version, store_mode: in-repo, payload_max_bytes
|
|
35
|
+
vault.json # schema_version, store_mode: in-repo, payload_max_bytes (enforced on record)
|
|
33
36
|
entries/ # one JSON envelope per recorded artifact (append-only)
|
|
34
37
|
payloads/ # content-addressed payload blobs (sha256-named, deduped)
|
|
35
38
|
contracts/ # consumer-authored contracts, per scope/phase
|
|
39
|
+
attestations/ # append-only independent opinion log, per artifact
|
|
36
40
|
```
|
|
37
41
|
|
|
42
|
+
`payload_max_bytes` (default 1 MiB) is enforced at `record` time: an over-size
|
|
43
|
+
payload is rejected fail-closed (no entry, no blob written) so the committed
|
|
44
|
+
audit chain stays lean. Set it to `0` to disable the guard.
|
|
45
|
+
|
|
38
46
|
`record`, `declare-contract`, and `supersede` auto-create the vault if one
|
|
39
47
|
isn't found, so explicit `init` is mostly for clarity. `verify`, `cross-check`,
|
|
40
48
|
and `list` do **not** auto-create — they fail-closed when no vault exists.
|
|
@@ -42,13 +50,23 @@ and `list` do **not** auto-create — they fail-closed when no vault exists.
|
|
|
42
50
|
The vault is discovered by walking up from the current directory, so any
|
|
43
51
|
subdirectory of the repo can run vault commands.
|
|
44
52
|
|
|
45
|
-
##
|
|
46
|
-
|
|
47
|
-
`store_mode` defaults to `in-repo
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
53
|
+
## Commit the vault — it is the real tamper-evidence backstop
|
|
54
|
+
|
|
55
|
+
`store_mode` defaults to `in-repo`, and **`.wicked-vault/` should be committed.**
|
|
56
|
+
This is not incidental: the envelope hash only catches *naive/accidental*
|
|
57
|
+
mutation — a determined local writer can recompute every hash after editing
|
|
58
|
+
(the hashes are unkeyed SHA-256 over public fields). The protection that
|
|
59
|
+
actually survives a determined editor is the **committed, branch-protected git
|
|
60
|
+
history**: it is what makes after-the-fact tampering visible in a diff and
|
|
61
|
+
preventable with branch protection. Audit-trail-grade, not cryptographic — see
|
|
62
|
+
the README "Tamper detection" section and CONTRACTS.md §6.
|
|
63
|
+
|
|
64
|
+
So **do not git-ignore the vault.** Commit `entries/`, `payloads/`,
|
|
65
|
+
`contracts/`, and `attestations/`; only the derived `index.sqlite` query cache
|
|
66
|
+
is ignored (it is rebuilt from the source of truth). If you have a deliberate
|
|
67
|
+
reason to keep evidence local-only (throwaway scratch, never reviewed), that is
|
|
68
|
+
an explicit opt-out — add `.wicked-vault/` to your `.gitignore` knowing you have
|
|
69
|
+
forfeited the only durable tamper-evidence the vault offers.
|
|
52
70
|
|
|
53
71
|
## Next steps
|
|
54
72
|
|
package/skills/wicked-vault/record-evidence/SKILL.md
CHANGED
|
@@ -5,9 +5,12 @@ description: Record a claim-backing artifact in the vault and attach a determini
|
|
|
5
5
|
|
|
6
6
|
# wicked-vault:record-evidence
|
|
7
7
|
|
|
8
|
-
Capture an artifact, hash it
|
|
9
|
-
**re-derive** its verdict later.
|
|
10
|
-
trusts a claimed status (G4).
|
|
8
|
+
Capture an artifact, hash it (so naive/accidental mutation is detected on
|
|
9
|
+
re-derivation), and attach a verifier that can **re-derive** its verdict later.
|
|
10
|
+
The vault does the capture itself — it never trusts a claimed status (G4). The
|
|
11
|
+
hash detects mutation; the committed git history is the durable tamper-evidence
|
|
12
|
+
(see the README "Tamper detection" section — the envelope hash is unkeyed, so it
|
|
13
|
+
is not a defense against a determined local writer).
|
|
11
14
|
|
|
12
15
|
## When to use
|
|
13
16
|
|
package/src/vault.mjs
CHANGED
|
@@ -74,6 +74,45 @@ function loadContract(root, scope, phase) {
|
|
|
74
74
|
return existsSync(p) ? JSON.parse(readFileSync(p, 'utf8')) : null;
|
|
75
75
|
}
|
|
76
76
|
|
|
77
|
+
// Resolve the acting identity for provenance + the G10/D4 independence check.
|
|
78
|
+
// Precedence (strongest first):
|
|
79
|
+
// 1. explicit value (CLI --actor / --evaluator) -> source 'explicit'
|
|
80
|
+
// 2. WICKED_VAULT_ACTOR env (harness-asserted identity) -> source 'env-actor'
|
|
81
|
+
// 3. $USER env (the OS login — easily spoofed) -> source 'env-user'
|
|
82
|
+
// 4. nothing -> source 'anonymous'
|
|
83
|
+
// The *source* matters: 'explicit' and 'env-actor' are deliberate assertions;
|
|
84
|
+
// 'env-user'/'anonymous' are weak and must not silently satisfy independence.
|
|
85
|
+
function resolveActor(explicit) {
|
|
86
|
+
if (typeof explicit === 'string' && explicit.trim() !== '') {
|
|
87
|
+
return { id: explicit.trim(), source: 'explicit' };
|
|
88
|
+
}
|
|
89
|
+
const envActor = process.env.WICKED_VAULT_ACTOR;
|
|
90
|
+
if (typeof envActor === 'string' && envActor.trim() !== '') {
|
|
91
|
+
return { id: envActor.trim(), source: 'env-actor' };
|
|
92
|
+
}
|
|
93
|
+
const user = process.env.USER || process.env.USERNAME; // USERNAME = Windows
|
|
94
|
+
if (typeof user === 'string' && user.trim() !== '') {
|
|
95
|
+
return { id: user.trim(), source: 'env-user' };
|
|
96
|
+
}
|
|
97
|
+
return { id: 'unknown', source: 'anonymous' };
|
|
98
|
+
}
|
|
99
|
+
// Weak identity provenance — derived from ambient env, not deliberately asserted.
|
|
100
|
+
const WEAK_IDENTITY_SOURCES = new Set(['env-user', 'anonymous']);
|
|
101
|
+
|
|
102
|
+
// Read the vault config (vault.json). Falls back to defaults if the file is
|
|
103
|
+
// absent or unreadable — record auto-creates the vault, so a config always
|
|
104
|
+
// exists by the time a payload is captured, but be defensive.
|
|
105
|
+
const DEFAULT_PAYLOAD_MAX_BYTES = 1048576;
|
|
106
|
+
function loadConfig(root) {
|
|
107
|
+
const cfg = join(root, DIR, 'vault.json');
|
|
108
|
+
if (!existsSync(cfg)) return { payload_max_bytes: DEFAULT_PAYLOAD_MAX_BYTES };
|
|
109
|
+
try {
|
|
110
|
+
return JSON.parse(readFileSync(cfg, 'utf8'));
|
|
111
|
+
} catch {
|
|
112
|
+
return { payload_max_bytes: DEFAULT_PAYLOAD_MAX_BYTES };
|
|
113
|
+
}
|
|
114
|
+
}
|
|
115
|
+
|
|
77
116
|
export function record(root, opts) {
|
|
78
117
|
const P = paths(root);
|
|
79
118
|
|
|
@@ -99,6 +138,18 @@ export function record(root, opts) {
|
|
|
99
138
|
throw new Error('record requires --run or --artifact');
|
|
100
139
|
}
|
|
101
140
|
|
|
141
|
+
// Enforce the configured payload ceiling (CONTRACTS.md §6). Oversize payloads
|
|
142
|
+
// are rejected here — before hashing or writing the blob — so a too-large
|
|
143
|
+
// capture can never bloat the committed audit chain. Fail-closed (G5): a
|
|
144
|
+
// rejected record produces NO entry and NO payload blob. `payload_max_bytes`
|
|
145
|
+
// <= 0 disables the guard (escape hatch for an explicitly unbounded vault).
|
|
146
|
+
const cfg = loadConfig(root);
|
|
147
|
+
const maxBytes = typeof cfg.payload_max_bytes === 'number'
|
|
148
|
+
? cfg.payload_max_bytes : DEFAULT_PAYLOAD_MAX_BYTES;
|
|
149
|
+
if (maxBytes > 0 && blob.length > maxBytes) {
|
|
150
|
+
throw new Error(`payload exceeds payload_max_bytes: ${blob.length} > ${maxBytes} (set payload_max_bytes in .wicked-vault/vault.json to raise the limit, or 0 to disable)`);
|
|
151
|
+
}
|
|
152
|
+
|
|
102
153
|
const payload_sha256 = sha256(blob);
|
|
103
154
|
|
|
104
155
|
// G10/D1 — acceptance criteria are mandatory and frozen to the evidence.
|
|
@@ -147,6 +198,11 @@ export function record(root, opts) {
|
|
|
147
198
|
? runVerifier(verifier, payloadView(blob), { repoRoot: opts.cwd || root })
|
|
148
199
|
: { status: 'n/a', detail: 'no deterministic verifier (judgment-tier claim)' };
|
|
149
200
|
|
|
201
|
+
// Actor provenance for the G10/D4 independence assertion. An explicit
|
|
202
|
+
// --actor (or WICKED_VAULT_ACTOR) is a deliberate identity claim; a bare
|
|
203
|
+
// $USER is ambient and weak. attest() uses created_by_source to refuse a
|
|
204
|
+
// silent self-grade where both worker and judge are unasserted (see attest).
|
|
205
|
+
const actor = resolveActor(opts.actor);
|
|
150
206
|
const entry = {
|
|
151
207
|
id, ...fields,
|
|
152
208
|
acceptance_criteria, criteria_authored_by,
|
|
@@ -157,7 +213,8 @@ export function record(root, opts) {
|
|
|
157
213
|
supersedes: null,
|
|
158
214
|
contract_version: contract ? contract.contract_version : null,
|
|
159
215
|
created_at: new Date().toISOString(),
|
|
160
|
-
created_by:
|
|
216
|
+
created_by: actor.id,
|
|
217
|
+
created_by_source: actor.source,
|
|
161
218
|
};
|
|
162
219
|
writeFileSync(join(P.entries, `${id}.json`), JSON.stringify(entry, null, 2));
|
|
163
220
|
return { id, envelope_hash, criteria_authored_by, status_at_record: sr.status, status_detail: sr.detail };
|
|
@@ -267,6 +324,7 @@ export function inspect(root, id) {
|
|
|
267
324
|
acceptance_criteria: entry.acceptance_criteria,
|
|
268
325
|
criteria_authored_by: entry.criteria_authored_by,
|
|
269
326
|
created_by: entry.created_by,
|
|
327
|
+
created_by_source: entry.created_by_source || null,
|
|
270
328
|
evidence: { text: view.text, json: view.json },
|
|
271
329
|
hash_ok: v.hash_ok,
|
|
272
330
|
integrity_status: v.status,
|
|
@@ -287,11 +345,47 @@ export function attest(root, id, opts) {
   const entry = JSON.parse(readFileSync(entryPath, 'utf8'));
 
   if (!OPINIONS.has(opts.opinion)) throw new Error(`attest: --opinion must be one of pass|reject|unclear (got '${opts.opinion}')`);
-  if (typeof opts.evaluator !== 'string' ||
+  if (typeof opts.evaluator !== 'string' || opts.evaluator.trim() === '') throw new Error('attest requires --evaluator');
+
+  // G10/D4 — mechanical independence, hardened. The judge must be a DELIBERATELY
+  // ASSERTED identity that differs from the worker. Three failure modes are
+  // closed here (all on top of the existing equality check):
+  //
+  //   (a) trivial-equality bypass — compare trimmed + case-folded so 'Alice',
+  //       'alice', and 'Alice ' can't sidestep the self-grade rejection.
+  //   (b) ambiguous worker identity — if the artifact was recorded under an
+  //       ambient identity ($USER / anonymous, created_by_source weak), the
+  //       independence claim cannot be trusted from a string compare alone.
+  //       We FAIL CLOSED unless the caller acknowledges it explicitly
+  //       (--allow-weak-worker-identity / opts.allowWeakWorkerIdentity), and we
+  //       stamp the weakness onto the attestation so audit can see it.
+  //   (c) ambiguous evaluator identity — the evaluator must be an explicit
+  //       assertion. A bare ambient identity for the JUDGE is refused: that is
+  //       exactly the silent self-grade the env var would otherwise enable.
+  //
+  // This is a stronger mechanical baseline + audit trail, NOT cryptographic
+  // independence. A determined human can still assert two distinct strings for
+  // the same person locally; real independence comes from a separate evaluator
+  // process/credential (see analyze-evidence skill) and the committed git trail.
+  const evaluator = resolveActor(opts.evaluator);
+  const norm = (s) => (typeof s === 'string' ? s.trim().toLowerCase() : '');
+
+  if (WEAK_IDENTITY_SOURCES.has(evaluator.source)) {
+    throw new Error(`attest refused (G10/D4): evaluator identity is ambient (${evaluator.source}='${evaluator.id}'), not a deliberate assertion. Pass an explicit --evaluator naming the independent judge (e.g. a model CLI or reviewer id) so a self-grade can't slip through silently.`);
+  }
+
+  if (entry.created_by && norm(evaluator.id) === norm(entry.created_by)) {
+    throw new Error(`attest refused (G10/D4): evaluator '${evaluator.id}' equals the artifact creator '${entry.created_by}' — a judgment must be independent of the worker`);
+  }
 
-  //
-
-
+  // The worker's identity provenance governs how much the independence claim is
+  // worth. A weak (ambient) worker identity means "different string" proves
+  // little. Fail closed unless the caller explicitly accepts that risk.
+  const workerSource = entry.created_by_source
+    || (entry.created_by && entry.created_by !== 'unknown' ? 'legacy' : 'anonymous');
+  const workerIdentityWeak = WEAK_IDENTITY_SOURCES.has(workerSource) || workerSource === 'legacy';
+  if (workerIdentityWeak && !opts.allowWeakWorkerIdentity) {
+    throw new Error(`attest refused (G10/D4): the artifact was recorded under a weak/ambient worker identity (created_by_source='${workerSource}'), so 'evaluator != created_by' is not a trustworthy independence signal. Re-record with an explicit --actor for the worker, or pass --allow-weak-worker-identity to attest anyway (the weakness is stamped on the attestation for audit).`);
   }
 
   // Fail-closed (G5/G10): never attest against a tampered artifact.
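The three checks in the `attest` hunk above can be exercised in isolation. The following sketch restates them as a standalone function so the order of refusals is easy to follow; it is an illustration, not the package's code. The helper name `checkIndependenceSketch`, the contents of `WEAK_SOURCES_SKETCH`, and the shortened error messages are assumptions. The normalisation, the `'legacy'`/`'anonymous'` fallback, and the `allowWeakWorkerIdentity` opt-in mirror the added lines in the diff.

```javascript
// Hypothetical stand-in for WEAK_IDENTITY_SOURCES; the real set's members
// are not shown in this diff, so these labels are assumed.
const WEAK_SOURCES_SKETCH = new Set(['user', 'anonymous']);

// entry:     { created_by, created_by_source } as stored at record time
// evaluator: { id, source } as a resolver might return for the judge
function checkIndependenceSketch(entry, evaluator, opts = {}) {
  const norm = (s) => (typeof s === 'string' ? s.trim().toLowerCase() : '');

  // (c) The judge must be a deliberately asserted identity.
  if (WEAK_SOURCES_SKETCH.has(evaluator.source)) {
    throw new Error('attest refused: evaluator identity is ambient');
  }

  // (a) Self-grade check on trimmed, case-folded ids.
  if (entry.created_by && norm(evaluator.id) === norm(entry.created_by)) {
    throw new Error('attest refused: evaluator equals the artifact creator');
  }

  // (b) Entries recorded before created_by_source existed count as 'legacy',
  // which is treated as weak alongside the ambient sources.
  const workerSource = entry.created_by_source
    || (entry.created_by && entry.created_by !== 'unknown' ? 'legacy' : 'anonymous');
  const workerIdentityWeak = WEAK_SOURCES_SKETCH.has(workerSource) || workerSource === 'legacy';
  if (workerIdentityWeak && !opts.allowWeakWorkerIdentity) {
    throw new Error('attest refused: weak worker identity');
  }

  // The flag is returned so the caller can stamp it on the attestation.
  return { worker_identity_weak: workerIdentityWeak };
}
```

Under these assumptions, an entry with no `created_by_source` is refused by default and accepted only with `{ allowWeakWorkerIdentity: true }`, in which case the returned flag is `true` so a downstream gate can discount the attestation.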
@@ -303,12 +397,17 @@ export function attest(root, id, opts) {
     artifact_id: id,
     opinion: opts.opinion,
     rationale: opts.rationale || '',
-    evaluator:
+    evaluator: evaluator.id,
+    evaluator_source: evaluator.source, // provenance of the judge identity (G10/D4)
     model: opts.model || null,
     prompt_hash: opts.prompt_hash || null,
     sampling: opts.sampling || null,
     evidence_sha256: entry.payload_sha256,
     criteria_sha256: entry.criteria_sha256,
+    // Audit flag: the worker identity this independence claim rests on was weak
+    // (ambient $USER / anonymous / legacy). The attestation was allowed via an
+    // explicit acknowledgement; a downstream gate may choose to discount it.
+    worker_identity_weak: workerIdentityWeak,
     created_at: new Date().toISOString(),
   };
   // tamper-evident binding over the attestation tuple (G2-style, G10)