wicked-vault 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +176 -0
- package/bin/wicked-vault.mjs +161 -0
- package/docs/CONTRACTS.md +421 -0
- package/docs/adr/0001-standalone-and-council-revisions.md +101 -0
- package/docs/adr/0002-independent-evaluation-and-criteria-binding.md +184 -0
- package/install.mjs +192 -0
- package/package.json +52 -0
- package/skills/wicked-vault/analyze-evidence/SKILL.md +119 -0
- package/skills/wicked-vault/cross-check-evidence/SKILL.md +141 -0
- package/skills/wicked-vault/init/SKILL.md +58 -0
- package/skills/wicked-vault/record-evidence/SKILL.md +129 -0
- package/skills/wicked-vault/verify-evidence/SKILL.md +76 -0
- package/src/bus.mjs +75 -0
- package/src/hash.mjs +40 -0
- package/src/id.mjs +9 -0
- package/src/vault.mjs +425 -0
- package/src/verifiers.mjs +84 -0
|
@@ -0,0 +1,421 @@
|
|
|
1
|
+
# wicked-vault — Interaction & Contract Specification
|
|
2
|
+
|
|
3
|
+
**Status:** v2 — council-reviewed twice (see `adr/0001-…` and
|
|
4
|
+
`adr/0002-independent-evaluation-and-criteria-binding.md`). Standalone product
|
|
5
|
+
confirmed. Defines the contracts every consumer integrates against. Sibling to
|
|
6
|
+
wicked-bus / wicked-brain / wicked-testing.
|
|
7
|
+
|
|
8
|
+
> v1 changed §4 (G3/G4 honest scoping + new G9), §5 (scope cut to 5
|
|
9
|
+
> deterministic verifiers; `llm_eval` removed), §6 (per-entry storage), and §12
|
|
10
|
+
> (decisions resolved) per the council. Rationale + dissent: ADR-0001.
|
|
11
|
+
>
|
|
12
|
+
> **v2 (ADR-0002, council 5–0 Accept-with-Revisions)** adds an *independent
|
|
13
|
+
> evaluation* tier on top of the deterministic core: acceptance criteria are
|
|
14
|
+
> mandatory and hashed into the envelope (§3.1); a new **`opinion_attestation`**
|
|
15
|
+
> data contract (§3.4) holds non-reproducible model judgments, kept strictly
|
|
16
|
+
> distinct from deterministic verifier results; new invariant **G10**
|
|
17
|
+
> (attestation-chain trust, §4); `verify` is two-tier with an integrity-only
|
|
18
|
+
> default (§8); new events (§7). The Node CLI still **never calls a model** —
|
|
19
|
+
> G7 holds; the judge runs in the `analyze-evidence` skill.
|
|
20
|
+
|
|
21
|
+
wicked-vault is the **evidence primitive**: it records claim-backing
|
|
22
|
+
artifacts, hashes them tamper-evidently, and *re-derives* their status on
|
|
23
|
+
demand — never trusting a stored verdict. It is consumed by wicked-garden's
|
|
24
|
+
compiled harness, by wicked-testing, by hand-run builds, and by CI directly.
|
|
25
|
+
|
|
26
|
+
Derived from the proven semantics of command_iq's `the-vault.EvidencePort`
|
|
27
|
+
(server-minted ids, four-column envelope hash, never-trust-cached
|
|
28
|
+
re-derivation, template registry, fail-closed), translated to a portable,
|
|
29
|
+
app-free, git-native standalone.
|
|
30
|
+
|
|
31
|
+
---
|
|
32
|
+
|
|
33
|
+
## 1. Identity & boundary
|
|
34
|
+
|
|
35
|
+
| Owns (the primitive) | Refuses (lives in a consumer) |
|
|
36
|
+
|---|---|
|
|
37
|
+
| `record` — independent capture: vault runs the source, hashes the payload, mints the id | "is the work *done*?" → wicked-garden gate logic / triggers |
|
|
38
|
+
| `verify` — re-derive status against the payload; never read a cached status | scenario / flake / verdict-*history* semantics → wicked-testing |
|
|
39
|
+
| `cross-check` — claims → artifacts → verdict vs. a pinned contract | claim *authoring* / work-shape / archetype → wicked-garden |
|
|
40
|
+
| `supersede` — atomic, append-only replacement | risk-surface → which-claims policy (consumer supplies the contract) |
|
|
41
|
+
| verifier family + tamper-evidence (envelope hash; git as audit chain) | notification / dashboards (subscribe to vault events instead) |
|
|
42
|
+
|
|
43
|
+
**The boundary is the package boundary.** The vault cannot decide "done" —
|
|
44
|
+
it has no gate logic to leak. It only answers two questions: *does this
|
|
45
|
+
artifact still verify?* and *is this scope+phase's contract satisfied?*
|
|
46
|
+
|
|
47
|
+
---
|
|
48
|
+
|
|
49
|
+
## 2. Place in the family
|
|
50
|
+
|
|
51
|
+
```
|
|
52
|
+
wicked-garden (compiler+harness) wicked-testing (ledger)
|
|
53
|
+
│ declares contracts, │ records runs,
|
|
54
|
+
│ fires cross-check triggers │ cites artifact ids
|
|
55
|
+
▼ ▼
|
|
56
|
+
┌──────────────────── wicked-vault ───────────────────┐
|
|
57
|
+
│ record / verify / cross-check / supersede │
|
|
58
|
+
│ verifier registry · envelope hash · contracts │
|
|
59
|
+
└──────────────────────────────────────────────────────┘
|
|
60
|
+
│ emits events │ stores in
|
|
61
|
+
▼ ▼
|
|
62
|
+
wicked-bus git (audit chain)
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
- **wicked-garden**: consumes via integration-discovery. Its compiler emits
|
|
66
|
+
per-repo contracts and triggers that call `cross-check`. Its
|
|
67
|
+
`scripts/qe/evidence_tracker.py` ("satisfied when *claimed*") is **replaced**
|
|
68
|
+
by reading vault verdicts ("satisfied when *verified*").
|
|
69
|
+
- **wicked-testing**: substrate-ready consumer. A scenario run `record`s its
|
|
70
|
+
evidence; the ledger stores its verdict citing the `artifact_id`; history
|
|
71
|
+
queries `verify` to re-derive. (Migration not forced — see Decision D2.)
|
|
72
|
+
- **wicked-bus**: vault emits lifecycle events; consumers subscribe.
|
|
73
|
+
|
|
74
|
+
---
|
|
75
|
+
|
|
76
|
+
## 3. Data contracts
|
|
77
|
+
|
|
78
|
+
### 3.1 Artifact (the recorded evidence unit — immutable)
|
|
79
|
+
|
|
80
|
+
| Field | Type | Notes |
|
|
81
|
+
|---|---|---|
|
|
82
|
+
| `id` | string (ULID) | **server-minted**; caller MUST NOT supply (G1) |
|
|
83
|
+
| `scope` | string | unit of work (branch / PR / epic id) |
|
|
84
|
+
| `phase` | string | e.g. `test`, `build`, `review` |
|
|
85
|
+
| `claim_id` | string | the claim this artifact backs |
|
|
86
|
+
| `kind` | enum | `test-run`·`typecheck`·`build`·`pr-check`·`http-probe`·`review-verdict`·`custom` |
|
|
87
|
+
| `source` | string | provenance: the command, file path, or URL that produced the payload — **pinned by the contract** (G8) |
|
|
88
|
+
| `verifier` | `{kind, params}`? | optional deterministic sub-check (see §5) — a composable signal an evaluator may cite, no longer the whole story |
|
|
89
|
+
| `acceptance_criteria` | string | **mandatory (G10/D1)** — the bar this evidence claims to clear; free-form text or `@file`. Frozen to the evidence. |
|
|
90
|
+
| `criteria_sha256` | string | hash of `acceptance_criteria`; bound into the envelope (anti-downgrade) |
|
|
91
|
+
| `criteria_authored_by` | enum | `contract` (trusted — pinned via `declare-contract`) · `record` (worker-supplied — weaker provenance, auditable) |
|
|
92
|
+
| `payload_sha256` | string | hash of the captured payload blob |
|
|
93
|
+
| `payload_ref` | string | `payloads/<sha256>` (content-addressed) |
|
|
94
|
+
| `envelope_hash` | string | sha256 over canonical(`scope,phase,claim_id,kind,source,verifier,criteria_sha256,payload_sha256`) (G2) — **now binds the criteria** |
|
|
95
|
+
| `status_at_record` | enum | verifier result computed **once** at record — informational; `verify` NEVER reads it (G3) |
|
|
96
|
+
| `state` | enum | `active` · `superseded` |
|
|
97
|
+
| `supersedes` | string? | prior artifact id |
|
|
98
|
+
| `contract_version` | string? | the contract hash in force at record |
|
|
99
|
+
| `created_at` / `created_by` | ts / string | actor provenance |
|
|
100
|
+
|
|
101
|
+
### 3.2 Contract (exit-criteria — what evidence a scope+phase requires)
|
|
102
|
+
|
|
103
|
+
| Field | Type | Notes |
|
|
104
|
+
|---|---|---|
|
|
105
|
+
| `scope` / `phase` | string | |
|
|
106
|
+
| `required_evidence` | `[{claim_id, kind, source_pin, verifier, required: bool}]` | the pinned shape — prevents criterion/verifier downgrade (G8) |
|
|
107
|
+
| `contract_version` | string | sha256 of the canonicalized `required_evidence` set — detects contract drift |
|
|
108
|
+
| `origin` | string | who declared it (the wicked-garden compiler, typically) |
|
|
109
|
+
|
|
110
|
+
### 3.3 Verdict (cross-check output)
|
|
111
|
+
|
|
112
|
+
```jsonc
|
|
113
|
+
{
|
|
114
|
+
"scope": "...", "phase": "test", "contract_version": "ab12…",
|
|
115
|
+
"overall": "PASS | REJECT | ERROR",
|
|
116
|
+
"claims": [
|
|
117
|
+
{ "claim_id": "tests-pass", "artifact_id": "01J…",
|
|
118
|
+
"in_contract": true, "hash_ok": true,
|
|
119
|
+
"verifier_status": "pass",
|
|
120
|
+
"result": "PASS | FAIL | MISSING | STALE | ERROR" }
|
|
121
|
+
],
|
|
122
|
+
"evaluated_at": "…", "evaluated_by": "…"
|
|
123
|
+
}
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
### 3.4 Opinion attestation (independent judgment — append-only, NON-reproducible)
|
|
127
|
+
|
|
128
|
+
A distinct type from §3.1/§3.3 — **never commingled** with deterministic
|
|
129
|
+
verifier results (council revision #1). It records that an *independent* judge
|
|
130
|
+
evaluated the frozen criteria against the frozen evidence at a point in time.
|
|
131
|
+
Its trust is G10 (attestation-chain), not G3 (re-derivation).
|
|
132
|
+
|
|
133
|
+
| Field | Type | Notes |
|
|
134
|
+
|---|---|---|
|
|
135
|
+
| `attestation_id` | string (ULID) | server-minted |
|
|
136
|
+
| `artifact_id` | string | the evidence it judges |
|
|
137
|
+
| `opinion` | enum | `pass` · `reject` · `unclear` — deliberately NOT named `verdict`/`status` |
|
|
138
|
+
| `rationale` | string | the judge's reasoning (structured output, not free-form prose injection) |
|
|
139
|
+
| `evaluator` | string | the judging identity — **MUST differ from the artifact's `created_by`** (G10/D4) |
|
|
140
|
+
| `model` | string | provider/version, e.g. `gemini/2.5-pro` |
|
|
141
|
+
| `prompt_hash` | string? | hash of the prompt template used |
|
|
142
|
+
| `sampling` | object? | `{temperature, …}` — provenance for disagreement analysis |
|
|
143
|
+
| `evidence_sha256` / `criteria_sha256` | string | the frozen inputs judged — used to flag `stale` if the artifact changed |
|
|
144
|
+
| `attestation_hash` | string | sha256 over the canonical attestation tuple — tamper-evident (G2-style) |
|
|
145
|
+
| `created_at` | ts | when judged |
|
|
146
|
+
|
|
147
|
+
Stored append-only at `attestations/<artifact_id>/<attestation_id>.json` (G6).
|
|
148
|
+
Multiple attestations per artifact are expected and retained — they surface
|
|
149
|
+
evaluator disagreement over identical inputs. `verify` returns the *latest* one
|
|
150
|
+
for reference, flagged `stale` if `evidence_sha256`/`criteria_sha256` no longer
|
|
151
|
+
match the artifact. **It is never re-derived; it is never trusted as
|
|
152
|
+
reproducible.**
|
|
153
|
+
|
|
154
|
+
---
|
|
155
|
+
|
|
156
|
+
## 4. Guarantee invariants (the load-bearing promises)
|
|
157
|
+
|
|
158
|
+
- **G1 server-minted ids** — the caller cannot supply or forge an id.
|
|
159
|
+
- **G2 envelope hash** — bound over the identifying tuple + payload hash;
|
|
160
|
+
recomputed and checked on *every* `verify`. Any mutation ⇒ `hash_ok:false`.
|
|
161
|
+
- **G3 re-derivation (integrity tier)** — `verify` re-runs the deterministic
|
|
162
|
+
verifier against the payload and re-checks the envelope, returning a fresh
|
|
163
|
+
integrity status. It **never** reads `status_at_record`. (BA-1 defense.)
|
|
164
|
+
*Bound:* G3 proves the recorded payload+criteria still verify and are
|
|
165
|
+
untampered; it does not re-prove the payload was captured honestly (G4), and
|
|
166
|
+
it does **not** cover the judgment tier — independent judgments are governed by
|
|
167
|
+
G10, not re-derived.
|
|
168
|
+
- **G4 honest recording, NOT sandboxed capture** — `record --run` executes the
|
|
169
|
+
source and captures its output; `record --artifact <file>` records
|
|
170
|
+
caller-supplied content. In both cases the vault hashes the payload and runs
|
|
171
|
+
the verifier — it trusts no *claimed* status. **Threat model, stated plainly:**
|
|
172
|
+
the vault defends against (a) post-hoc tampering — mutating a recorded payload
|
|
173
|
+
or verdict (caught by G2) — and (b) stale-cache trust (caught by G3). It does
|
|
174
|
+
**NOT** defend against a poisoned capture environment (PATH/env/cwd are
|
|
175
|
+
inherited from the caller); execution isolation is the harness's / CI's
|
|
176
|
+
responsibility, exactly as it was the runtime's in command_iq. `--artifact`
|
|
177
|
+
mode proves the artifact verifies against its pinned verifier; it does not
|
|
178
|
+
prove independent capture. Sandboxed capture is future hardening (later ADR).
|
|
179
|
+
- **G5 fail-closed** — missing artifact / unknown verifier / missing contract /
|
|
180
|
+
source-pin mismatch ⇒ `ERROR`/`REJECT`, never `PASS`.
|
|
181
|
+
- **G6 append-only** — artifacts are immutable; `supersede` writes a new row +
|
|
182
|
+
flips state atomically. git is the audit chain.
|
|
183
|
+
- **G7 verifier purity** — a verifier is a pure, deterministic function of
|
|
184
|
+
`(payload, params)`. Re-derivable; no hidden state. (Nondeterministic kinds
|
|
185
|
+
are quarantined — see §5.)
|
|
186
|
+
- **G8 contract pinning** — `kind`, `source`, and `verifier` are pinned per
|
|
187
|
+
claim in the contract; `record` rejects a downgrade (e.g. swapping a strict
|
|
188
|
+
verifier for a weaker one, or pointing `source` at a different command).
|
|
189
|
+
- **G9 mechanical evaluation (the boundary, enforced)** — the vault never
|
|
190
|
+
*authors* a contract; the consumer (wicked-garden compiler / wicked-testing)
|
|
191
|
+
does. `cross-check`'s verdict is a pure function of `(consumer-authored
|
|
192
|
+
contract, recorded artifacts)`: for each required claim, does an active
|
|
193
|
+
artifact exist whose `verify()` passes and whose `source`/`verifier` match the
|
|
194
|
+
pin? The vault decides *whether the contract is satisfied*, never *what the
|
|
195
|
+
contract should require*. This is what keeps `cross-check` a primitive and not
|
|
196
|
+
gate-decision policy — answering the council's Q4.
|
|
197
|
+
- **G10 attestation-chain trust (the judgment tier — ADR-0002)** — an
|
|
198
|
+
independent judgment's trust is the trust of its *attestation chain* (frozen
|
|
199
|
+
`{criteria, evidence}` + `evaluator` identity + `model`/`prompt`/`sampling`
|
|
200
|
+
provenance + tamper-evident `attestation_hash`), **not** re-derivation. The
|
|
201
|
+
integrity tier (G1–G9) and the judgment tier (G10) are **distinct guarantee
|
|
202
|
+
types and are never represented as the same kind of result.** Corollaries:
|
|
203
|
+
(a) acceptance criteria are mandatory and bound into the envelope, frozen to
|
|
204
|
+
the evidence (anti-downgrade); (b) the model runs only in the orchestration
|
|
205
|
+
layer (`analyze-evidence` skill) — the CLI never calls a model, so G7 holds;
|
|
206
|
+
(c) `attest` is fail-closed if the frozen inputs no longer hash-match, and
|
|
207
|
+
rejects when `evaluator == created_by`; (d) judgments are non-reproducible by
|
|
208
|
+
design — "never trust the cached verdict" here means *re-evaluate
|
|
209
|
+
independently*, complementary to G3's *re-derive deterministically*. Threat
|
|
210
|
+
model in §5a.
|
|
211
|
+
|
|
212
|
+
---
|
|
213
|
+
|
|
214
|
+
## 5. Verifier contract (the extension point)
|
|
215
|
+
|
|
216
|
+
**Interface:** `verify(payload: bytes, params: dict) -> {status: "pass"|"fail", detail: str}`.
|
|
217
|
+
Pure, deterministic, side-effect-free beyond reading the payload.
|
|
218
|
+
|
|
219
|
+
**v1 core — 5 deterministic verifiers only** (council Q5: scope cut):
|
|
220
|
+
|
|
221
|
+
| Kind | Params |
|
|
222
|
+
|---|---|
|
|
223
|
+
| `exit_code_eq` | `{code: 0}` |
|
|
224
|
+
| `regex_match` | `{pattern, flags}` |
|
|
225
|
+
| `not_contains` | `{pattern}` |
|
|
226
|
+
| `jq_pred` | `{expr}` |
|
|
227
|
+
| `commit_exists` | `{sha}` (shells `git cat-file -e`) |
|
|
228
|
+
|
|
229
|
+
All five are pure, deterministic functions of `(payload, params)` — re-derivable
|
|
230
|
+
indefinitely (G7 holds). `structural_eq` is a deferred deterministic add (niche
|
|
231
|
+
golden-file comparison).
|
|
232
|
+
|
|
233
|
+
**Deferred — observation-verifier extension (separate spec, NOT v1 core).**
|
|
234
|
+
`pr_check_status` (`gh pr checks`) and `http_status_eq` are point-in-time
|
|
235
|
+
*observations*, not re-derivable facts. They ship as a distinct tier with
|
|
236
|
+
**explicitly different semantics**: `verify` performs a *fresh capture* rather
|
|
237
|
+
than re-deriving the old payload, params are **pinned by the contract** (URL,
|
|
238
|
+
PR — never agent-controlled), and G7 is declared inapplicable for the tier.
|
|
239
|
+
Sequenced here because the ci-aware-merge discipline needs `pr_check_status` —
|
|
240
|
+
but it does not belong in the deterministic founding spec.
|
|
241
|
+
|
|
242
|
+
**`llm_eval` is still NOT a verifier kind.** A probabilistic judge is neither
|
|
243
|
+
pure, deterministic, nor re-derivable — registering it as a verifier would
|
|
244
|
+
falsify G7 at the type level (ADR-0001 council disqualifier, upheld). ADR-0002
|
|
245
|
+
adds independent judgment **at a different layer**: the model runs in the
|
|
246
|
+
`analyze-evidence` *skill*, never in the CLI, and its output is recorded as an
|
|
247
|
+
`opinion_attestation` (§3.4) under G10 — a distinct, non-reproducible type, not
|
|
248
|
+
a verifier result. G7's boundary is intact; the capability lives above it.
|
|
249
|
+
|
|
250
|
+
**Custom verifiers** register via `verifiers/<kind>.{js,py}` exporting the
|
|
251
|
+
interface + a `determinism` declaration. Unknown kind at `verify` ⇒ `ERROR`
|
|
252
|
+
(G5), never a silent pass.
|
|
253
|
+
|
|
254
|
+
### 5a. Judgment-tier threat model (ADR-0002 D7)
|
|
255
|
+
|
|
256
|
+
The evidence payload and (worker-supplied) acceptance criteria are
|
|
257
|
+
attacker-influenceable inputs to the judge. Stated plainly:
|
|
258
|
+
|
|
259
|
+
- **T1 — lax-bar self-grade:** a worker authors weak criteria → guaranteed
|
|
260
|
+
`pass`. *Mitigation:* contract-pinned criteria (`criteria_authored_by:
|
|
261
|
+
contract`) are the trusted path; worker-supplied criteria are recorded as
|
|
262
|
+
`criteria_authored_by: record` and treated as a weaker provenance class.
|
|
263
|
+
- **T2 — payload/criteria prompt injection:** content steers the judge.
|
|
264
|
+
*Mitigations the skill MUST apply:* feed evidence + criteria as **escaped,
|
|
265
|
+
quoted data**, never as instructions; require a **structured output schema**
|
|
266
|
+
(opinion + rationale + cited sub-checks); **`unclear`/refuse on
|
|
267
|
+
instruction-conflict**; **fail-closed** on unparseable evaluator output.
|
|
268
|
+
- **Residual risk (honest scoping, per ADR-0001 Q6):** a capable injection may
|
|
269
|
+
still flip a judgment. The attestation chain makes inputs + evaluator
|
|
270
|
+
auditable after the fact; it does not prevent T2 in v1.
|
|
271
|
+
|
|
272
|
+
|
|
273
|
+
---
|
|
274
|
+
|
|
275
|
+
## 6. Storage contract (portable, git-native)
|
|
276
|
+
|
|
277
|
+
In-repo, committed (Decision D1) — **one file per artifact** (council Q2):
|
|
278
|
+
|
|
279
|
+
```
|
|
280
|
+
.wicked-vault/
|
|
281
|
+
vault.json # {schema_version, store_mode, payload_max_bytes}
|
|
282
|
+
entries/<ulid>.json # SOURCE OF TRUTH — ONE artifact per file
|
|
283
|
+
payloads/<sha256> # content-addressed payload blobs
|
|
284
|
+
contracts/<scope>/<phase>.json
|
|
285
|
+
index.sqlite # DERIVED query cache (gitignored), rebuilt by `reindex`
|
|
286
|
+
```
|
|
287
|
+
|
|
288
|
+
- **One file per artifact eliminates the write-serialization bottleneck.**
|
|
289
|
+
Concurrent CI jobs / branches never touch the same path, so there are no
|
|
290
|
+
merge conflicts on the source of truth. (This replaced a single
|
|
291
|
+
`manifest.jsonl`, which the council flagged as a disqualifier under
|
|
292
|
+
concurrent writers — and the standalone decision (ADR-0001) makes concurrent
|
|
293
|
+
writers the default case.)
|
|
294
|
+
- **Source of truth = `entries/` + `payloads/`** (supersedes v0 Decision D5).
|
|
295
|
+
Both content-addressed and committed ⇒ git is the audit chain: append-only,
|
|
296
|
+
tamper-evident, and the PR diff shows exactly what evidence was added.
|
|
297
|
+
*Caveat (honest):* git history is rewritable by a determined actor — this is
|
|
298
|
+
audit-trail-grade tamper-evidence, not cryptographic immutability. G2's
|
|
299
|
+
envelope hash detects payload/verdict mutation; it does not prevent a force-push
|
|
300
|
+
that rewrites both. CI branch protection is the backstop.
|
|
301
|
+
- **Large payloads:** `payload_max_bytes` guard; over-size payloads externalize
|
|
302
|
+
(hash recorded in the entry, blob stored out-of-tree) to keep the repo lean.
|
|
303
|
+
|
|
304
|
+
---
|
|
305
|
+
|
|
306
|
+
## 7. Event contract (wicked-bus)
|
|
307
|
+
|
|
308
|
+
Emits (domain `vault`):
|
|
309
|
+
|
|
310
|
+
| Event | Payload | Subscribers |
|
|
311
|
+
|---|---|---|
|
|
312
|
+
| `vault:artifact:recorded` | `{id, scope, phase, claim_id, kind, criteria_authored_by}` | testing, dashboards |
|
|
313
|
+
| `vault:artifact:verified` | `{id, hash_ok, status}` | garden triggers |
|
|
314
|
+
| `vault:crosscheck:completed` | `{scope, phase, overall, contract_version}` | garden, testing |
|
|
315
|
+
| `vault:artifact:superseded` | `{old_id, new_id}` | testing |
|
|
316
|
+
| `vault:verify:failed` | `{id, reason}` | dashboards, alerting |
|
|
317
|
+
| `wicked.evidence.attested` | `{artifact_id, attestation_id, opinion, evaluator, model, stale}` | garden, testing, dashboards |
|
|
318
|
+
| `wicked.claim.evaluated` | `{scope, phase, claim_id, opinion, evaluator}` | garden gate (opt-in tier) |
|
|
319
|
+
|
|
320
|
+
Vault is a pure **producer** here (mirrors `the-vault`'s subscriber-clean shape).
|
|
321
|
+
Attestation events (G10 tier) carry `evaluator`/`model` so subscribers can weigh
|
|
322
|
+
provenance; they are explicitly *not* deterministic-verdict events.
|
|
323
|
+
|
|
324
|
+
---
|
|
325
|
+
|
|
326
|
+
## 8. CLI contract (authoritative, cross-language)
|
|
327
|
+
|
|
328
|
+
The CLI with stable `--json` output is the lingua franca (Python wicked-garden
|
|
329
|
+
scripts, Node consumers, CI shell all call it identically).
|
|
330
|
+
|
|
331
|
+
```
|
|
332
|
+
wicked-vault record --scope S --phase P --claim C --kind K \
|
|
333
|
+
--source "<cmd|file|url>" --criteria "<text|@file>" \
|
|
334
|
+
[--verifier "exit_code_eq:0"] (--run | --artifact <file>)
|
|
335
|
+
-> {id, envelope_hash, criteria_authored_by, status_at_record?}
|
|
336
|
+
wicked-vault verify <artifact-id> -> {id, hash_ok, status, rederived:true, latest_attestation?} (integrity tier; exit 0 iff pass+hash_ok)
|
|
337
|
+
wicked-vault inspect <artifact-id> -> {criteria, evidence, hash_ok, raw} (what the skill feeds the judge)
|
|
338
|
+
wicked-vault attest <artifact-id> --opinion <pass|reject|unclear> --rationale <t> \
|
|
339
|
+
--evaluator <id> --model <prov/ver> [--prompt-hash h] [--sampling <json>]
|
|
340
|
+
-> {attestation_id, attestation_hash} (fail-closed if tampered; reject if evaluator==created_by)
|
|
341
|
+
wicked-vault attestations <artifact-id> -> [OpinionAttestation…] (append-only log)
|
|
342
|
+
wicked-vault cross-check --scope S --phase P [--integrity-only | --with-attestations]
|
|
343
|
+
-> Verdict (default --integrity-only; exit 0 iff overall PASS)
|
|
344
|
+
wicked-vault supersede <artifact-id> (--run|--artifact …) -> {new_id, old_id}
|
|
345
|
+
wicked-vault declare-contract --scope S --phase P --spec <f> -> {contract_version}
|
|
346
|
+
wicked-vault list --scope S [--phase P] -> [Artifact…]
|
|
347
|
+
```
|
|
348
|
+
|
|
349
|
+
Vault root auto-detected by walking up to `.wicked-vault/`; `--cwd` overrides.
|
|
350
|
+
Every command emits JSON and exits non-zero on `FAIL`/`ERROR` (G5). The model
|
|
351
|
+
judge runs in the `wicked-vault:analyze-evidence` skill, which orchestrates
|
|
352
|
+
`inspect → independent eval → attest`; the CLI itself never calls a model.
|
|
353
|
+
|
|
354
|
+
---
|
|
355
|
+
|
|
356
|
+
## 9. Integration-discovery contract
|
|
357
|
+
|
|
358
|
+
Registers provider `wicked-vault` with capabilities `[record, verify,
|
|
359
|
+
cross-check, declare-contract]`. wicked-garden and wicked-testing discover it
|
|
360
|
+
the same way they discover bus/brain; if absent, consumers degrade to a
|
|
361
|
+
documented "no-vault" path (garden: emit a claims-doc-only lint; testing:
|
|
362
|
+
local evidence JSON as today).
|
|
363
|
+
|
|
364
|
+
---
|
|
365
|
+
|
|
366
|
+
## 10. Consumer interaction sequences
|
|
367
|
+
|
|
368
|
+
**A. wicked-garden — compile time**
|
|
369
|
+
1. detect repo bindings (test_command, surfaces).
|
|
370
|
+
2. `declare-contract` per (scope, phase): required claims, with `source` pinned
|
|
371
|
+
to the repo's real commands and `verifier` per claim.
|
|
372
|
+
3. emit triggers (pre-commit / CI) that call `cross-check --from-contract`.
|
|
373
|
+
|
|
374
|
+
**B. wicked-garden — runtime gate (the on-switch)**
|
|
375
|
+
1. agent claims "done".
|
|
376
|
+
2. compiled trigger runs `wicked-vault cross-check --from-contract`.
|
|
377
|
+
3. verdict gates: `overall != PASS` ⇒ block / CI red. The agent never supplies
|
|
378
|
+
the verdict; it can only `record` artifacts the vault re-derives.
|
|
379
|
+
|
|
380
|
+
**C. wicked-testing — substrate**
|
|
381
|
+
1. scenario run ⇒ `record --run` (capture).
|
|
382
|
+
2. ledger stores its verdict + `artifact_id`.
|
|
383
|
+
3. history/flake queries ⇒ `verify <id>` to re-derive past evidence.
|
|
384
|
+
|
|
385
|
+
**D. hand-run / CI — standalone**
|
|
386
|
+
- `wicked-vault record --run` in a build step; `cross-check` before merge.
|
|
387
|
+
No wicked-garden required.
|
|
388
|
+
|
|
389
|
+
**E. validator-pair utility**
|
|
390
|
+
- each validator's verdict is `record`ed as `kind: review-verdict`; the PR body
|
|
391
|
+
cites the `artifact_id`s; a high-risk surface's contract lists them as
|
|
392
|
+
`required: true`, so `cross-check` REJECTs until both verdicts exist.
|
|
393
|
+
|
|
394
|
+
---
|
|
395
|
+
|
|
396
|
+
## 11. Versioning
|
|
397
|
+
|
|
398
|
+
- `schema_version` in `vault.json` gates entry-format migrations.
|
|
399
|
+
- `contract_version` (hash of `required_evidence`) is stamped on every artifact
|
|
400
|
+
and verdict — a verdict declares which contract it judged against, so contract
|
|
401
|
+
drift is detectable (Copilot's `criteria_version` insight from command_iq).
|
|
402
|
+
|
|
403
|
+
---
|
|
404
|
+
|
|
405
|
+
## 12. Open decisions (recommendations — override as needed)
|
|
406
|
+
|
|
407
|
+
All resolved by the council + ADR-0001.
|
|
408
|
+
|
|
409
|
+
| # | Decision | Resolution | Source |
|
|
410
|
+
|---|---|---|---|
|
|
411
|
+
| D0 | standalone vs incubate | **standalone product** — council's incubate verdict overridden; command_iq's proven `EvidencePort` + Phase-0 proof answer the "premature" critique | user adjudication (ADR-0001) |
|
|
412
|
+
| D1 | evidence location | **in-repo `.wicked-vault/`, committed** — evidence travels with the PR | accepted |
|
|
413
|
+
| D2 | wicked-testing relationship | **substrate-ready, no forced migration** | accepted |
|
|
414
|
+
| D3 | runtime/packaging | **node/npm + CLI** (CLI authoritative) — runtime-fragmentation cost accepted as a tradeoff | accepted (council Q3 dissent noted) |
|
|
415
|
+
| D4 | verifier scope | **5 deterministic verifiers in v1 core**; nondeterministic = separate observation extension; `llm_eval` removed | council Q5 |
|
|
416
|
+
| D5 | source of truth | **`entries/<ulid>.json` (one file per artifact)** + content-addressed payloads; sqlite derived. *Supersedes v0 single-manifest.* | council Q2 (mandatory under D0) |
|
|
417
|
+
| D6 | gate boundary | **consumer authors contract, vault evaluates mechanically** (G9) — `cross-check` retained | council Q4 |
|
|
418
|
+
| D7 | capture model | **G4 = honest recording, not sandboxed capture**; harness owns execution isolation (as command_iq's runtime did) | council Q6 |
|
|
419
|
+
| D8 | independent evaluation | **two-tier: deterministic CLI integrity + skill-orchestrated independent judgment** recorded as `opinion_attestation` under G10; CLI never calls a model (G7 upheld) | ADR-0002, council 5–0 |
|
|
420
|
+
| D9 | acceptance criteria | **mandatory, hashed into the envelope, frozen to the evidence**; contract-pinned criteria are the trusted path, worker-supplied are attributed + weaker | ADR-0002 D1 (+Gemini escalation) |
|
|
421
|
+
| D10 | eval timing / gating | **eval runs once/on-demand, not every verify**; `verify` integrity-only; `cross-check --with-attestations` is opt-in; `--integrity-only` default & CI-safe | ADR-0002 D3 (council disqualified live-every-verify) |
|
|
@@ -0,0 +1,101 @@
|
|
|
1
|
+
# ADR-0001 — wicked-vault is a standalone product; council-driven v1 revisions
|
|
2
|
+
|
|
3
|
+
**Status:** Accepted
|
|
4
|
+
**Date:** 2026-05-24
|
|
5
|
+
**Context:** Founding contract (CONTRACTS.md v0) was reviewed by a multi-model
|
|
6
|
+
council (wicked-garden:jam:council) per the command_iq BLUEPRINT discipline:
|
|
7
|
+
council on contentious decisions, surface dissent, user adjudicates.
|
|
8
|
+
|
|
9
|
+
## Council composition
|
|
10
|
+
|
|
11
|
+
4 independent perspectives ran across 4 adversarial axes: Claude
|
|
12
|
+
(security/integration), Codex/gpt-5.5 (skeptic), Gemini (ops/maintainer),
|
|
13
|
+
Pi (pragmatist). 6 models were unavailable (no API key/config) and were
|
|
14
|
+
reported as such. 3 of the 4 are foreign model families — genuine
|
|
15
|
+
model-diversity, not an all-Claude pass.
|
|
16
|
+
|
|
17
|
+
## Findings (council verdicts on the v0 spec)
|
|
18
|
+
|
|
19
|
+
| Q | Topic | Council verdict |
|
|
20
|
+
|---|---|---|
|
|
21
|
+
| Q1 | separate product vs incubate | 4–0 NEEDS-REWORK (premature; one consumer) |
|
|
22
|
+
| Q2 | in-repo `manifest.jsonl` | effectively 4–0 against as-specified (concurrent-write conflicts) |
|
|
23
|
+
| Q3 | node/npm + CLI | 3–1 NEEDS-REWORK (stack fragmentation / scope) |
|
|
24
|
+
| Q4 | `cross-check` boundary | 3–1 (smuggles gate-decision logic into the primitive) |
|
|
25
|
+
| Q5 | deterministic core | principle APPROVED; scope rejected (cut nondeterministic tier) |
|
|
26
|
+
| Q6 | G3+G4 enforceability | 4–0 NEEDS-REWORK (theater without an exec-env contract) |
|
|
27
|
+
|
|
28
|
+
Two minority constraints were load-bearing and are preserved:
|
|
29
|
+
- **Pi (Q2 lone APPROVE)** — in-repo JSONL is fine *only* single-writer. Collapses
|
|
30
|
+
to REJECT under concurrent writers. → drives the per-entry storage decision.
|
|
31
|
+
- **Gemini (Q4 lone APPROVE)** — removing `cross-check` fragments consumers into
|
|
32
|
+
bespoke verdict aggregation. → cross-check is retained, the *boundary* is pinned.
|
|
33
|
+
|
|
34
|
+
## Decision
|
|
35
|
+
|
|
36
|
+
### Q1 — Standalone product: council OVERRIDDEN (user adjudication)
|
|
37
|
+
|
|
38
|
+
The council judged the spec in isolation. Its prematurity verdict rests on
|
|
39
|
+
"unproven/speculative design." That critique is materially answered by evidence
|
|
40
|
+
outside the spec:
|
|
41
|
+
1. command_iq's `the-vault.EvidencePort` is a production-proven implementation of
|
|
42
|
+
these exact semantics (envelope hash, never-trust-cached re-derivation,
|
|
43
|
+
template registry, fail-closed), validated across the evidence-domain epic and
|
|
44
|
+
3 caught regressions.
|
|
45
|
+
2. The Phase-0 probe proved detect+emit generalize across 4 ecosystems.
|
|
46
|
+
3. Prior design sessions worked the architecture end to end.
|
|
47
|
+
|
|
48
|
+
We are extracting a known-good design, not decomposing speculatively. Codex's
|
|
49
|
+
own "what would change my mind — proven design" is met by the prior art the
|
|
50
|
+
council could not see. Consumer #1 = wicked-garden harness; declared #2 =
|
|
51
|
+
wicked-testing (substrate). **wicked-vault is a standalone product.**
|
|
52
|
+
|
|
53
|
+
### Q2 — Per-entry storage: ACCEPTED, and UPGRADED to mandatory
|
|
54
|
+
|
|
55
|
+
The Q1 override removes the single-writer world that made a single manifest
|
|
56
|
+
survivable. Source of truth becomes `entries/<ulid>.json` (one file per
|
|
57
|
+
artifact); concurrent writers never touch the same path. `index.sqlite` and any
|
|
58
|
+
rolled-up manifest are derived caches. (Supersedes v0 Decision D5.)
|
|
59
|
+
|
|
60
|
+
### Q3 — node/npm + CLI: ACCEPTED as a tradeoff
|
|
61
|
+
|
|
62
|
+
Family-consistent with bus/brain/testing (all npm). Consumers (Python
|
|
63
|
+
wicked-garden, shell CI) touch only the CLI; no Node library dependency is
|
|
64
|
+
imposed on them. Gemini's runtime-fragmentation concern is acknowledged and
|
|
65
|
+
accepted.
|
|
66
|
+
|
|
67
|
+
### Q4 — Boundary: ACCEPTED, pinned via new invariant G9
|
|
68
|
+
|
|
69
|
+
The consumer *authors and owns* the contract (what evidence is required, which
|
|
70
|
+
verifier). The vault *evaluates mechanically*: for each required claim, does an
|
|
71
|
+
active artifact exist whose `verify()` passes and whose `source`/`verifier`
|
|
72
|
+
match the pin? The vault never decides *what* is required. `cross-check` is
|
|
73
|
+
retained (preserving Gemini's anti-fragmentation constraint); the gate-logic
|
|
74
|
+
accusation is answered structurally by G9.
|
|
75
|
+
|
|
76
|
+
### Q5 — Scope: ACCEPTED
|
|
77
|
+
|
|
78
|
+
v1 core = 5 deterministic verifiers (`exit_code_eq`, `regex_match`,
|
|
79
|
+
`not_contains`, `jq_pred`, `commit_exists`). The nondeterministic tier
|
|
80
|
+
(`pr_check_status`, `http_status_eq`) is deferred to a separate
|
|
81
|
+
**observation-verifier extension** with explicit fresh-capture (not
|
|
82
|
+
re-derivation) semantics. `llm_eval` is removed entirely — it falsifies G7
|
|
83
|
+
(verifier purity) at the type level.
|
|
84
|
+
|
|
85
|
+
### Q6 — G3/G4: RESOLVED by command_iq's actual approach (honest scoping)
|
|
86
|
+
|
|
87
|
+
command_iq's vault did not sandbox capture; it recorded + re-derived +
|
|
88
|
+
hash-bound, and execution isolation was the runtime's responsibility. v1 adopts
|
|
89
|
+
the same: **G4 = honest recording, not sandboxed capture.** Threat model stated
|
|
90
|
+
plainly — the vault defends against post-hoc tampering (mutating a recorded
|
|
91
|
+
payload/verdict) and stale-cache trust; it does NOT defend against a poisoned
|
|
92
|
+
capture environment, which is the harness/CI's responsibility. Sandboxed capture
|
|
93
|
+
is future hardening (a later ADR).
|
|
94
|
+
|
|
95
|
+
## Consequences
|
|
96
|
+
|
|
97
|
+
- CONTRACTS.md revised to v1 (§1, §4, §5, §6, §12 changed).
|
|
98
|
+
- Concurrency-safe storage is non-negotiable given the standalone decision.
|
|
99
|
+
- `pr_check_status` capability (needed by the ci-aware-merge discipline) is
|
|
100
|
+
sequenced to the observation-verifier extension, not v1 core.
|
|
101
|
+
- The council's dissent is preserved here; the override is traceable.
|