@intentsolutions/audit-harness 1.2.1 → 1.2.3
- package/CHANGELOG.md +246 -430
- package/README.md +10 -1
- package/package.json +1 -1
- package/scripts/emit-evidence.sh +77 -12
package/CHANGELOG.md
CHANGED
|
@@ -1,553 +1,369 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
-
All notable changes are
|
|
3
|
+
All notable changes to `@intentsolutions/audit-harness` are documented here. The
|
|
4
|
+
format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and this
|
|
5
|
+
project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
4
6
|
|
|
5
7
|
## [Unreleased]
|
|
6
8
|
|
|
7
|
-
|
|
9
|
+
> **Riding a future v2.1 routine release (descoped from 1.2.0):** OTel event-name
|
|
10
|
+
> polish (iah-E07b/c). The `agent.rollout.gate.evaluated` and `gate.decision.emitted`
|
|
11
|
+
> event names are already locked + tested on main (PRs #78, #81 per NORMATIVE
|
|
12
|
+
> `intent-eval-lab/000-docs/067-AT-SPEC`). Any further attribute-schema polish on
|
|
13
|
+
> those events is deferred to a routine v2.1 release rather than headlined here — it
|
|
14
|
+
> is additive telemetry refinement, not a 1.2.0 capability boundary.
|
|
8
15
|
|
|
9
|
-
|
|
16
|
+
## [1.2.3] - 2026-06-20
|
|
10
17
|
|
|
11
|
-
|
|
18
|
+
A patch release shipping a correctness fix to the CLI `emit-evidence` command. No
|
|
19
|
+
CLI surface, no new commands — the evidence emitter now produces kernel-valid
|
|
20
|
+
output where it previously did not.
|
|
12
21
|
|
|
13
|
-
|
|
14
|
-
|
|
15
|
-
A patch release: release-pipeline supply-chain hardening (polyglot signing) plus
|
|
16
|
-
dev-dependency bumps. No CLI surface, runtime behavior, or API boundary changes —
|
|
17
|
-
the published artifacts are byte-identical in behavior to 1.2.0; only the release
|
|
18
|
-
machinery and dev tooling moved.
|
|
19
|
-
|
|
20
|
-
### Changed — polyglot release signing wired into the publish pipeline (#90)
|
|
21
|
-
|
|
22
|
-
- **crates.io build-provenance attestation.** The `publish-crates` leg now emits a
|
|
23
|
-
GitHub build-provenance attestation for the published crate artifact, extending the
|
|
24
|
-
signed-supply-chain guarantee to the Rust distribution.
|
|
25
|
-
- **sigstore-python wheel + sdist signing.** The `publish-pypi` leg now signs the built
|
|
26
|
-
wheel and sdist with `sigstore-python` (keyless Fulcio OIDC + Rekor), so the PyPI
|
|
27
|
-
distribution carries verifiable provenance alongside the existing npm sigstore path.
|
|
28
|
-
- **crates.io publish is now active.** With `CARGO_REGISTRY_TOKEN` provisioned as a
|
|
29
|
-
repository secret, the `publish-crates` leg goes live on this tag — closing the
|
|
30
|
-
polyglot publish loop (npm + PyPI + crates.io all publish + sign from one tag).
|
|
31
|
-
|
|
32
|
-
### Changed — dev-dependency bumps
|
|
33
|
-
|
|
34
|
-
- Bump `eslint` from 9.39.4 to 10.5.0 (#71).
|
|
35
|
-
- Bump `jeremylongshore/intent-rollout-gate` GitHub Action pin (#86).
|
|
36
|
-
- Bump `crate-ci/typos` from 1.29.4 to 1.47.2 (#87).
|
|
37
|
-
|
|
38
|
-
## [1.2.0] - 2026-06-15
|
|
39
|
-
|
|
40
|
-
A minor release: the read-only "comprehensive audit, on any repo" brain (`classify` → `conform` → `audit` → `scan` → `currency`), the kernel-emitting evidence path (`emit-evidence` Evidence Bundle, E04), the provider credential gate (`cred-gate`, E08), shared vendorable lint configs (#85), and a golden-master fitness function — all additive, with the zero-runtime-dependency guarantee preserved.
|
|
41
|
-
|
|
42
|
-
### Release narrative (what shipped since 1.1.8)
|
|
43
|
-
|
|
44
|
-
- **`emit-evidence` Evidence Bundle emitter (E04).** The CI-only signed-evidence path emits the harness's own deterministic self-gate as a kernel `gate-result/v1` row inside an `EvidenceBundle`, cosign-signs the canonical bytes (Fulcio OIDC + Rekor), and publishes a `report-manifest.json` the dashboard re-verifies at ingest. Detail under "CI-only signed evidence emit" below.
|
|
45
|
-
- **Provider credential gate (`cred-gate`, E08).** A new gate that asserts provider credentials PASS/FAIL with full redaction + spillover coverage (`scripts/cred-gate.sh`, fixtures via PR #80).
|
|
46
|
-
- **Shared, vendorable lint configs (#85).** `.audit-harness-configs/` (markdownlint / yamllint / ruff / shellcheck) is the canonical config set the IEP repos vendor + extend; `install.sh` now vendors both `scripts/` and `configs/`.
|
|
47
|
-
- **Dogfood AAR (iah-E10d).** First-downstream-adopter run captured at `000-docs/013-AA-AACR-rollout-gate-dogfood-iah-E10-2026-06-15.md`.
|
|
48
|
-
|
|
49
|
-
### Apache-2.0 §4(d) NOTICE obligation — satisfied
|
|
50
|
-
|
|
51
|
-
`NOTICE` is present at the repo root, listed in `package.json#files` (ships in the npm tarball), included in the Python sdist + Rust crate distributions, AND vendored into `.audit-harness/` by `install.sh` (see "`install.sh` vendors NOTICE" below). The §4(d) attribution-travels-with-distribution obligation holds across npm, PyPI, crates.io, and the vendored-install path.
|
|
52
|
-
|
|
53
|
-
### Why minor, not patch
|
|
54
|
-
|
|
55
|
-
Multiple new CLI verbs (`classify`, `conform`, `audit`, `scan`, `currency`, `cred-gate`) and new authored feature surfaces (shared lint configs, golden-master suite, the CI-only evidence emit). Per SemVer this is a minor bump. No CLI command was renamed or removed; the change is purely additive and the published tarball stays zero-runtime-dependency.
|
|
56
|
-
|
|
57
|
-
### Added — golden-master suite for gherkin-lint + crap-score stdout shapes (iah-golden-master)
|
|
58
|
-
|
|
59
|
-
A fitness function that pins the raw stdout of the two scorers whose output is a downstream contract.
|
|
60
|
-
|
|
61
|
-
- **`tests/golden/run-golden.sh`** captures `gherkin-lint.sh` (text rubric) and `crap-score.py --json` (gate-result envelope) stdout against a `tests/fixtures/deliberate-failure/` corpus and diffs each against a checked-in golden, failing on any drift. Environment-volatile bytes are normalized out (gherkin-lint's installed-vs-awk-fallback first line; crap-score's absolute `summary_path`) so the golden is byte-stable across machines. CI installs no complexity provider, so the crap golden captures the deterministic no-provider envelope shape.
|
|
62
|
-
- **Why this and not the per-row schema gate:** the schema gate validates the *augmented* predicate that `emit-evidence` produces, not the raw scorer stdout. A silent reshape of the scorer stdout — a renamed field, a dropped WARN line, changed summary wording — is a backward-compat break the schema gate cannot see. This suite is that missing guard.
|
|
63
|
-
- Regenerate intentional changes with `bash tests/golden/run-golden.sh --update` and review the golden diff in the PR. Wired into `.github/workflows/ci.yml` as the `golden` job.
|
|
64
|
-
|
|
65
|
-
### Changed — `install.sh` vendors NOTICE + the Node dispatcher (iah-install-sh-completeness)
|
|
66
|
-
|
|
67
|
-
The vendored-install path (non-Node repos) now ships a complete, traceable copy.
|
|
68
|
-
|
|
69
|
-
- **`NOTICE`** is copied into `.audit-harness/` — Apache-2.0 §4(d) requires the NOTICE file to travel with any distribution, and vendoring is a distribution.
|
|
70
|
-
- **`bin/audit-harness.js`** (the Node CLI dispatcher) and **`package.json`** are copied into `.audit-harness/bin/` + `.audit-harness/` so the canonical dispatcher surface is present and its `--version` (which reads `../package.json`) resolves in the vendored tree.
|
|
71
|
-
- A **`PROVENANCE`** file records the source repo, version, tarball URL, and install timestamp so a vendored tree is traceable back to the exact release it came from.
|
|
72
|
-
|
|
73
|
-
### Added — CI-only signed evidence emit for the intent-eval-dashboard (nr75.12)
|
|
74
|
-
|
|
75
|
-
The dashboard reports hub (labs.intentsolutions.io) ingests a signed `report-manifest.json` of kernel `gate-result/v1` rows per repo. This adds audit-harness's own emit, lighting up its row.
|
|
76
|
-
|
|
77
|
-
- **`ci/emit-evidence.ts` + `ci/assemble-manifest.ts`** — run the real deterministic self-gate (`harness-hash --verify`), shape it into a kernel `gate-result/v1` + `EvidenceBundle` (fail-closed against `@intentsolutions/core`), cosign-sign the canonical bytes (Fulcio OIDC + Rekor), and assemble the manifest the dashboard re-verifies at ingest.
|
|
78
|
-
- **Zero-dep guarantee preserved.** The emitter lives in `ci/` (excluded from `package.json#files`) and the kernel is installed CI-only via `npm i --no-save` — `dependencies` + `devDependencies` stay empty and the published tarball is unchanged (verified via `npm pack --dry-run`).
|
|
79
|
-
- **`.github/workflows/release.yml`** — adds a GitHub Release on tag push + an `emit-evidence` job (tag-only) that publishes the manifest as a Release asset.
|
|
80
|
-
|
|
81
|
-
### Added — `currency` advisory upstream-currency report (PP-PLAN-040 Phase 5 / E7)
|
|
82
|
-
|
|
83
|
-
The fifth verb, and deliberately the weakest: an advisory report with no exit-code authority.
|
|
84
|
-
|
|
85
|
-
- **`audit-harness currency`** (`scripts/currency.py`, stdlib): reads the per-upstream-identity pin relation (`schemas/currency/pins.v1.json`) and reports which pins are themselves **stale** — `checked_at` older than the pin's staleness window. Each upstream (mcp-spec, skill-md-schema, claude-code, gate-result-predicate, anthropic-sdk, agentskills-spec) carries its own `pinned_version` + `checked_at` + window, so the *pin's own staleness* is detectable (not one opaque scalar).
|
|
86
|
-
- **No exit-code authority (always exit 0), no live-fetch, no auto-fix.** Currency depends on upstream state — non-deterministic and network-bound — so it only reports. `/sync-testing-harness` consumes the report to open advisory bump PRs; it never reddens a build. `--today YYYY-MM-DD` makes reports reproducible.
|
|
87
|
-
- **`tests/currency/`**: golden suite (3 checks) — stale/current/unknown classification, the no-exit-authority guarantee (exit 0 even when all pins are stale), and the shipped relation reporting.
|
|
88
|
-
|
|
89
|
-
### Added — `scan` security/hygiene/skill-quality gate-runner (PP-PLAN-040 Phase 4 / E6)
|
|
90
|
-
|
|
91
|
-
The fourth read-only verb: security + hygiene + skill-quality, by orchestrating standard tools (never reimplementing them).
|
|
92
|
-
|
|
93
|
-
- **`audit-harness scan [repo]`** (`scripts/scan.py`, stdlib): for every `dimension: security | hygiene | skill-quality` gate in the profile, emits a `gate-result/v1` row. Three strategies: **local** (`hygiene-readme` README presence — deterministic), **shell-out** (every gate carrying a `tool` — gitleaks / osv-scanner / semgrep / syft / markdownlint / lychee — clean exit → PASS, findings → ADVISORY(error), tool absent → ADVISORY indeterminate), **consume** (`skill-behavioral` ingests a j-rig Evidence Bundle verdict via `--jrig-verdict`; the harness never runs behavioral judgment itself — no verdict → indeterminate).
|
|
94
|
-
- Advisory-first; `--strict` (or a blocking gate) turns a finding/gap into `FAIL`. Kill-switch → `[]`. Each row records `metadata.method` (`local-presence` / `shell-out` / `consume-j-rig`).
|
|
95
|
-
- **`tests/scan/`**: golden suite (10 checks) with pinned-profile isolation so shell-out tool availability never makes the suite flaky.
|
|
96
|
-
|
|
97
|
-
**Security note:** on first run this gate caught — and this release redacts from HEAD — a PyPI publish token that had been pasted as a literal value in `python/PUBLISH.md`. The value remains in git history; it must be rotated at the registry (tracked separately). The doc now carries a placeholder.
|
|
98
|
-
|
|
99
|
-
### Added — `audit` testing-depth gate-runner (PP-PLAN-040 Phase 3 / E5)
|
|
100
|
-
|
|
101
|
-
The third read-only verb: the "finish the pyramid" testing-depth diagnostic.
|
|
102
|
-
|
|
103
|
-
- **`audit-harness audit [repo]`** (`scripts/audit.py`, stdlib): for every `dimension: testing-depth` gate in the profile, assesses the gate and emits a `gate-result/v1` row. Two read-only strategies: `crap-score` runs the bundled `crap` scorer (static complexity×coverage); every pyramid layer (unit/integration/e2e/smoke/perf/a11y/contract/migration/property-based/fuzz/sanitizers) gets a per-layer **presence heuristic** (test dirs, framework configs, dependency markers). Layer present → `PASS`; absent → `ADVISORY(warn)` testing-depth gap; not statically assessable → `ADVISORY` indeterminate.
|
|
104
|
-
- **`--fast` (default)** presence heuristics only (<10s); **`--deep`** adds `crap-score`; **`--strict`** turns a gap on a blocking gate into `FAIL`. Kill-switch → `[]`. Each row records `metadata.method` (`crap-static` / `presence-heuristic` / `delegated`) for provenance.
|
|
105
|
-
- **Deliberately does NOT execute the repo's test suite.** Running arbitrary untrusted suites is the repo's own CI's job; the harness reports coverage *presence* and the repo's CI test step produces the execution verdict. `audit` is the diagnostic, not the test runner.
|
|
106
|
-
- **`tests/audit/`**: golden suite (7 checks) + `has-tests`/`no-tests` fixtures — asserts unit→PASS / gap→ADVISORY(default) / gap→FAIL(`--strict`), crap deep-only-in-fast, kill-switch, and gate-result/v1 validity. CI `audit` job.
|
|
107
|
-
|
|
108
|
-
### Added — registry projection + FP-rate harness (PP-PLAN-040 Phase 0 completion: c2b + c2e)
|
|
109
|
-
|
|
110
|
-
Closes the data/safety-spine epic (E2): the registry becomes the single canonical datum and gate promotion gets a measured bar.
|
|
111
|
-
|
|
112
|
-
- **`audit-harness gen-layer-applicability`** (`scripts/gen-layer-applicability.py`): projects `schemas/audit-profile/registry.v1.json` into `schemas/audit-profile/layer-applicability.md`. `--write` regenerates; `--check` fails on drift. The doc is now a **projection** of the registry datum, not a hand-maintained parallel source — CI gate `layer-applicability-drift` enforces it (c2b).
|
|
113
|
-
- **`audit-harness fp-rate`** (`scripts/fp-rate.py`): measures each gate's false-positive / false-negative rate over a labeled corpus (`tests/fixtures/conform/{valid,malformed}/`). This is the metric that gates advisory→blocking promotion. `--max-fp-rate X` exits 1 if any gate exceeds the bar; CI runs it advisory at the 5% default bar (c2e).
|
|
114
|
-
- **`docs/gate-promotion.md`**: the dedicated advisory→blocking promotion rule — FP-rate ≤ 5% bar, engineer-pinned in `tests/TESTING.md`, re-pinned manifest. Documents *why* FP-rate (not FN-rate) is the gate and how demotion/kill-switch works. `docs/` now ships in the npm package (`files`).
|
|
115
|
-
|
|
116
|
-
### Added — `conform` verb + bundled content-addressed schemas (PP-PLAN-040 Phase 2)
|
|
117
|
-
|
|
118
|
-
The second piece of the read-only brain: deterministic conformance, emitting Evidence Bundle rows.
|
|
119
|
-
|
|
120
|
-
- **`audit-harness conform [repo]`** (`scripts/conform.py`, stdlib + PyYAML): read-only conformance gate-runner. For every `dimension: conformance` gate in the repo's `audit-profile/v1`, it locates the artifact(s) and emits a `gate-result/v1` row (JSON array, stdout). **Never writes, never live-fetches.**
|
|
121
|
-
- **Bundled content-addressed schemas** (`schemas/conform/v1/`): `skillmd-frontmatter`, `mcp-config`, `plugin-manifest`, `agent-frontmatter` — the deterministic *structural floor* (parses + required keys + types), distinct from the IS 100-point rubric / SAK authoring kernel (judgment, stays in `/validate-*`). conform records each schema's sha256 in the row's `policy_hash`, so a row re-verifies against the exact schema version that produced it.
|
|
122
|
-
- **Reproducible-by-design engine.** Bundled JSON-Schemas are checked by an embedded subset validator (complete for the closed bundled schemas) rather than ajv — deliberately, because ajv's availability/version varies per machine and would make signed evidence non-reproducible. Same commit + same harness version produce an identical verdict.
|
|
123
|
-
- **Genuinely-external formats shell out**: OpenAPI to `spectral`, GitHub Action to `yamllint`. Missing tool produces an `ADVISORY` indeterminate (never a false `FAIL`).
|
|
124
|
-
- **Advisory-first.** A conformance violation on an `enforcement: advisory` gate is `ADVISORY` (severity `error`), exit 0 — logged, not blocking. `--strict` (or an engineer-promoted `enforcement: blocking` gate) turns a violation into `FAIL` (exit 1). Missing artifact produces `NOT_APPLICABLE`. Kill-switch (`AUDIT_HARNESS_DISABLE=1` / `.audit-harness.yml`) produces an empty `[]`, exit 0.
|
|
125
|
-
- **`tests/conform/`**: golden suite (31 checks) + pass/fail fixtures (valid + malformed SKILL.md, .mcp.json, plugin manifest, agent) — asserts valid to PASS, malformed to ADVISORY (default) / FAIL (`--strict`), every row validates against `gate-result/v1`, the NOT_APPLICABLE + indeterminate paths, and `policy_hash` == bundled-schema sha256 + reproducible. Wired into CI (`conform` job).
|
|
126
|
-
|
|
127
|
-
Scope boundary: conformance kinds without a bundled schema (`marketplace`, `hook`) resolve to `ADVISORY` indeterminate — drop a schema into `schemas/conform/v1/` to light them up, no code change. No gate *execution* for testing-depth/security yet (Phase 3+).
|
|
128
|
-
|
|
129
|
-
### Added — `classify` verb + `audit-profile/v1` (PP-PLAN-040 Phase 0 + Phase 1)
|
|
130
|
-
|
|
131
|
-
The first piece of the "comprehensive audit, on any repo" build: the read-only brain.
|
|
132
|
-
|
|
133
|
-
- **`audit-profile/v1` schema** (`schemas/audit-profile/v1.schema.json`): closed, versioned, hash-bearing value mirroring `gate-result/v1`. Four invariants: classifications are a UNION (not a winner), `unresolved[]` is the only Claude-refinable surface, `waived ⇒ disabled` (allOf-enforced), `registry_hash` makes a profile reproducible.
|
|
134
|
-
- **Canonical dimension→gate registry** (`schemas/audit-profile/registry.v1.json`): the single datum that answers "which gates apply to repo-type X, in which dimension, at what applicability" — `layer-applicability.md` and `TESTING.md` become projections of it.
|
|
135
|
-
- **`audit-harness classify [repo]`** (`scripts/classify.py`, stdlib-only): read-only repository classifier. Detects the UNION of repo-type + Claude-artifact classifications, resolves the gate set against the registry, records `registry_hash`, and emits an `audit-profile/v1` value to stdout. **Never writes to the repo.**
|
|
136
|
-
- **Safety levers**: `INDETERMINATE` result class (infra failure ≠ policy failure); dispatcher per-command supervision via `AUDIT_HARNESS_TIMEOUT` (kill a hung gate, exit 124); `AUDIT_HARNESS_DISABLE=1` kill-switch (gate commands no-op; classify emits an all-disabled profile); engineer-owned `.audit-harness.yml` override (`classify_pins`, `advisory`, `disable_gates`, `disable`) — see `.audit-harness.example.yml`.
|
|
137
|
-
- **`tests/classify/`**: golden fixture corpus (6 fixtures, authored before the classifier) + suite — golden-matches classifications, schema-validates every profile, exercises the kill-switch, the unknown/unresolved path, and override honoring. Wired into CI (`classify` job).
|
|
138
|
-
- **`schemas/` now ships in the npm package** (`files`) so the registry + schema are available to consumers on any repo.
|
|
139
|
-
|
|
140
|
-
Scope boundary: no `conform` verb, no gate execution yet (Phase 2+). `classify` is read-only and emits a profile only.
|
|
141
|
-
|
|
142
|
-
## [1.1.8] - 2026-06-18
|
|
143
|
-
|
|
144
|
-
Ships the iah-E06 production-signing pre-flight gate to downstream consumers.
|
|
145
|
-
|
|
146
|
-
### Added — DNSSEC + CAA production-signing pre-flight (iah-E06)
|
|
147
|
-
|
|
148
|
-
Before a production-mode `emit-evidence` run signs canonical bytes, two deterministic pre-flight scripts assert the signing domain is cryptographically sound. Both fail closed: any error, missing record, or unreachable resolver blocks the signing path rather than emitting an unverifiable attestation.
|
|
149
|
-
|
|
150
|
-
- **`scripts/dnssec-check.sh`** — verifies the signing domain's DNSSEC chain is present and validates.
|
|
151
|
-
- **`scripts/caa-check.sh`** — verifies the domain's CAA records authorize the signing certificate authority.
|
|
152
|
-
- The `emit-evidence` production path gates on both before signing; staging/draft emit is unaffected.
|
|
153
|
-
|
|
154
|
-
### Fixed — query a trusted validating resolver in the DNSSEC + CAA pre-flight (PR #75)
|
|
155
|
-
|
|
156
|
-
The pre-flight previously trusted the ambient resolver, which may not validate DNSSEC. Both scripts now query known validating resolvers (`1.1.1.1`, `8.8.8.8`) and require the authenticated-data (AD) flag plus an `RRSIG` on the answer. A resolver that does not set AD, or an answer with no RRSIG, is treated as a validation failure (fail-closed) rather than a pass.
|
|
157
|
-
|
|
158
|
-
### Changed — Version bumped to 1.1.8 across all manifests
|
|
159
|
-
|
|
160
|
-
Per the `version-canonical-check` CI gate. `package.json` (canonical), `version.txt`, `python/pyproject.toml`, `python/src/intent_audit_harness/__init__.py`, and `rust/Cargo.toml` all report `1.1.8`.
|
|
161
|
-
|
|
162
|
-
### Why patch, not minor
|
|
163
|
-
|
|
164
|
-
The pre-flight scripts shipped to the repo in earlier PRs (#70, #75); this patch propagates them to npm consumers via a version bump. No new public CLI commands or flag changes in this release boundary.
|
|
165
|
-
|
|
166
|
-
## [v1.1.5] - 2026-06-03
|
|
22
|
+
### Fixed
|
|
167
23
|
|
|
168
|
-
|
|
24
|
+
- **`emit-evidence` now emits kernel-valid `gate-result/v1` predicate bodies (#103).**
|
|
25
|
+
The CLI `emit-evidence` wrapped gate rows in an in-toto Statement declaring
|
|
26
|
+
`predicateType: https://evals.intentsolutions.io/gate-result/v1`, but the predicate
|
|
27
|
+
body carried the legacy draft envelope (`result`/`timestamp`), which fails
|
|
28
|
+
`@intentsolutions/core`'s `GateResultV1Schema` (it forbids additional properties) —
|
|
29
|
+
so a downstream `intent-rollout-gate` rejected the bundle. The emitter now builds the
|
|
30
|
+
canonical body (`gate_decision`, `gate_name`, `gate_version`, `gate_reasons`,
|
|
31
|
+
`coverage`, `policy_ref`, `evaluated_at`), bringing the general-purpose CLI path to
|
|
32
|
+
parity with the internal `ci/emit-evidence.ts` self-gate (which already emitted
|
|
33
|
+
kernel-valid rows). The post-emit predicate is now validated against a full-kernel
|
|
34
|
+
fixture (`tests/fixtures/gate-result-v1.schema.json`); the partial input-envelope
|
|
35
|
+
fixture stays for the gate emitters' raw rows. Surfaced by the first external-adopter
|
|
36
|
+
convergence run; verified `conform | emit-evidence` → 9/9 kernel-valid →
|
|
37
|
+
`intent-rollout-gate` decision `block → allow`.
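
Reviewer note (not part of the diffed file): the failure mode described in this entry can be sketched as a closed-schema check. The seven property names come from the entry itself. Treating all seven as required, the helper name `check_predicate_body`, and every sample value are assumptions made for illustration; the authoritative contract is `GateResultV1Schema` in `@intentsolutions/core`, which this sketch does not reproduce.

```python
# Property names are taken from the changelog entry. Treating all seven as
# required, and every sample value below, are assumptions for illustration.
CANONICAL_FIELDS = frozenset({
    "gate_decision", "gate_name", "gate_version", "gate_reasons",
    "coverage", "policy_ref", "evaluated_at",
})


def check_predicate_body(body):
    """Return a list of problems; an empty list means the body passes this sketch."""
    present = set(body)
    problems = [f"missing property: {k}" for k in sorted(CANONICAL_FIELDS - present)]
    # A closed schema rejects any key it does not declare.
    problems += [
        f"additional property not allowed: {k}"
        for k in sorted(present - CANONICAL_FIELDS)
    ]
    return problems


# Legacy draft envelope (hypothetical values): carries result/timestamp only.
legacy_body = {"result": "PASS", "timestamp": "2026-06-20T00:00:00Z"}

# Canonical body (hypothetical values): exactly the declared properties.
canonical_body = {
    "gate_decision": "PASS",
    "gate_name": "conform",
    "gate_version": "1.2.3",
    "gate_reasons": [],
    "coverage": {},
    "policy_ref": "sha256:0000",
    "evaluated_at": "2026-06-20T00:00:00Z",
}

print(check_predicate_body(legacy_body))
print(check_predicate_body(canonical_body))
```

Under these assumptions the legacy envelope is rejected twice over: it lacks the canonical properties and it carries two undeclared ones, which is consistent with a downstream consumer refusing the bundle.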
|
|
169
38
|
|
|
170
|
-
|
|
39
|
+
## [1.2.2] - 2026-06-16
|
|
171
40
|
|
|
172
|
-
|
|
41
|
+
A patch release closing the polyglot publish loop. No CLI surface, runtime behavior,
|
|
42
|
+
or API boundary changes — only the release machinery moved. v1.2.1 published to npm
|
|
43
|
+
but failed PyPI (a twine bug) and crates.io (an account email-verification gate);
|
|
44
|
+
this release publishes all three registries cleanly.
|
|
173
45
|
|
|
174
|
-
### Fixed
|
|
46
|
+
### Fixed
|
|
175
47
|
|
|
176
|
-
|
|
48
|
+
- **twine now uploads only built distributions, not the `.sigstore.json` bundles (#92).** The `publish-pypi` leg's `twine upload` call is scoped to `dist/*.whl dist/*.tar.gz`, so the sigstore signature bundles emitted alongside the wheel + sdist are no longer passed to twine (which rejected them and failed the v1.2.1 PyPI publish).
|
|
49
|
+
- **crates.io publish goes live.** The account email-verification gate that blocked the v1.2.1 crates.io publish is now resolved, so the `publish-crates` leg publishes on this tag — closing the npm + PyPI + crates polyglot publish loop.
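
Reviewer note (not part of the diffed file): the twine scoping in the first bullet amounts to selecting files by whole-name glob. A minimal Python sketch of that selection follows; the file names and the helper `select_uploads` are hypothetical, and the real workflow passes the globs to `twine upload` in the shell rather than filtering in Python.

```python
from fnmatch import fnmatch

# The two globs named in the changelog entry, without the dist/ prefix.
UPLOAD_PATTERNS = ("*.whl", "*.tar.gz")


def select_uploads(filenames):
    """Keep only names that match one of the upload globs in full."""
    return [
        name for name in filenames
        if any(fnmatch(name, pattern) for pattern in UPLOAD_PATTERNS)
    ]


# Hypothetical contents of dist/ after building and signing.
dist = [
    "example_pkg-1.2.2-py3-none-any.whl",
    "example_pkg-1.2.2-py3-none-any.whl.sigstore.json",
    "example_pkg-1.2.2.tar.gz",
    "example_pkg-1.2.2.tar.gz.sigstore.json",
]

print(select_uploads(dist))
```

Because a glob must match the entire name, `*.whl` does not match a name ending in `.whl.sigstore.json`, so the signature bundles fall out of the upload set.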
|
|
177
50
|
|
|
178
|
-
|
|
179
|
-
- **`python/pyproject.toml` + `rust/Cargo.toml`**: project-URL fields (Homepage / Repository / Issues / Changelog / documentation) repointed to the renamed repo — these render on PyPI and crates.io.
|
|
180
|
-
- **`python/src/intent_audit_harness/__init__.py`**: docstring source-link repointed.
|
|
181
|
-
- **`README.md`**: the `curl … install.sh` line + the two "Related" skill links repointed to the renamed repo.
|
|
182
|
-
- **`install.sh`**: the `REPO=` variable, the usage-comment URLs at the top, and the re-run hint repointed; the default `VERSION` bumped from the stale `v0.1.0` → `v1.1.5`.
|
|
183
|
-
|
|
184
|
-
### Fixed — install.sh tarball-path glob broke after the rename
|
|
185
|
-
|
|
186
|
-
The GitHub archive tarball unpacks as `<repo>-<version>/`, which became `intent-audit-harness-1.1.5/` after the rename. The unpack-dir detection used `find … -name 'audit-harness-*'`, and `-name` matches the basename with no implicit leading wildcard, so it matched **nothing** under the new prefix — every vendored install would have failed at "could not find unpacked dir". Changed the glob to `-name '*audit-harness-*'` (leading wildcard), which matches both the current `intent-audit-harness-*` name and legacy `audit-harness-*` tags. Verified against both directory names.
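
Reviewer note (not part of the diffed file): the glob behavior described in this removed entry can be approximated with Python's `fnmatch`, which, like `find -name`, matches a shell-style pattern against the whole basename. This is an approximation of `find`, not its implementation, and the legacy directory name below is a hypothetical example.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "audit-harness-*"    # no leading wildcard
NEW_PATTERN = "*audit-harness-*"   # leading wildcard added by the fix

renamed_dir = "intent-audit-harness-1.1.5"  # name given in the changelog entry
legacy_dir = "audit-harness-0.1.0"          # hypothetical legacy tag directory

# The old pattern is anchored at the start of the basename, so the
# "intent-" prefix prevents a match.
print(fnmatchcase(renamed_dir, OLD_PATTERN))

# The leading "*" can match "intent-" or the empty string, so both names match.
print(fnmatchcase(renamed_dir, NEW_PATTERN))
print(fnmatchcase(legacy_dir, NEW_PATTERN))
```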
|
|
187
|
-
|
|
188
|
-
### Added — README badge row
|
|
189
|
-
|
|
190
|
-
npm-version, License Apache-2.0, and Sigstore-provenance shields under the H1 (mirrors the `intent-eval-core` badge row). The "Part of the Intent Eval Platform" cross-link line is preserved.
|
|
191
|
-
|
|
192
|
-
### Changed — Version bumped to v1.1.5 across all manifests
|
|
193
|
-
|
|
194
|
-
Per the `version-canonical-check` CI gate (v1.0.2 PR #35). `package.json` (canonical), `version.txt`, `python/pyproject.toml`, `python/src/intent_audit_harness/__init__.py`, and `rust/Cargo.toml` all report `1.1.5`. (`rust/Cargo.lock` is gitignored; its working-tree entry is aligned for local cargo builds.)
|
|
195
|
-
|
|
196
|
-
### Why patch, not minor
|
|
197
|
-
|
|
198
|
-
No new CLI commands, no new flags, no API change, no script behavior change. This is release-engineering + metadata: the publish pipeline that ships the existing `1.1.x` code, plus URL corrections for the repo rename, plus the install.sh glob fix. The pinned policy scripts (`.harness-hash`) are untouched.
|
|
199
|
-
|
|
200
|
-
### Verification
|
|
201
|
-
|
|
202
|
-
- `npm pack --dry-run` → tarball contains `bin/`, `scripts/`, `README.md`, `LICENSE`, `NOTICE`, `CHANGELOG.md` per `package.json#files`
|
|
203
|
-
- `node bin/audit-harness.js --version` → `1.1.5`
|
|
204
|
-
- `bash -n install.sh` → exit 0; unpack-dir glob matches `intent-audit-harness-1.1.5` (and legacy `audit-harness-*`)
|
|
205
|
-
- `bash scripts/harness-hash.sh --verify` → OK (no pinned files changed)
|
|
206
|
-
|
|
207
|
-
## [v1.1.4] - 2026-05-25
|
|
208
|
-
|
|
209
|
-
### Fixed — gherkin-lint.sh prev_blank print-every-line noise (IEP P3, Gemini #71 review chain)
|
|
210
|
-
|
|
211
|
-
Closes `iah-gherkin-prev-blank-noise` (`bd_000-projects-o9q1`, P2). The third awk block in `scripts/gherkin-lint.sh` (the And-at-scenario-start checker) opened with a bare `prev_blank = 1` expression that awk interpreted as an always-true pattern with implicit `{ print }` default action — flooding stdout with every line of every feature file alongside the intentional ERROR printf. `prev_blank` was never USED anywhere in the awk script (verified via grep). Removed both touches: the top-level expression AND the assignment in the blank-line pattern (which was also unreachable for anything that mattered, since no downstream pattern read `prev_blank`). The third awk block now produces ONLY the targeted ERROR line when triggered. Verified via the same deliberate-failure test from v1.1.2 AAR — output before: full feature file printed interleaved with ERROR. Output after: just the ERROR line.
|
|
212
|
-
|
|
213
|
-
### Changed — gherkin-lint.sh process_awk_output() collapsed to single awk pass (Gemini #38 follow-up)
|
|
214
|
-
|
|
215
|
-
Closes `iah-gherkin-single-awk-opt` (`bd_000-projects-vawm`, P3). v1.1.2 introduced `process_awk_output()` with two awk subprocesses per call (one counting WARN, one counting ERROR). v1.1.4 collapses to a single awk pass via `read -r w e < <(awk '/^WARN /{w++} /^ERROR /{e++} END {print w+0, e+0}' <<< "$out")` per Gemini PR #39 verbatim suggestion. Halves the awk fork count (4 callsites × 2 subprocesses = 8 awk processes/feature → 4). Verified with mixed WARN+ERROR test: 2 WARNs + 1 ERROR in one feature file produces summary `2 warning(s), 1 error(s)` and exit 1.
|
|
216
|
-
|
|
217
|
-
### Fixed — crap-score.py exclusion sets deduplicated via EXCLUDED_DIRS constant (Gemini #71 review)
|
|
218
|
-
|
|
219
|
-
Closes `iah-crap-score-exclusion-dedup` (`bd_000-projects-niv8`, P2). Pre-v1.1.4, `scripts/crap-score.py` had TWO separate sets with overlapping intent but divergent contents:
|
|
220
|
-
|
|
221
|
-
- `ignore` set in `score_python()` (line 85): had `"reports"` but lacked `.next`, `.nuxt`, `.cache`
|
|
222
|
-
- `prune` set in `main()` (line 394, added v1.1.1 for `--json` input-hash walk): had `.next`, `.nuxt`, `.cache` but lacked `"reports"`
|
|
223
|
-
|
|
224
|
-
Asymmetry was a real bug: a repo with `reports/` would skip score_python's candidate scan but its `.py` files DID get hashed by the input-hash walk; opposite for `.next/.nuxt/.cache`. Fixed by extracting a single module-level constant `EXCLUDED_DIRS` (union of both prior sets) referenced by both call sites. Set contents: `.git`, `.venv`, `venv`, `node_modules`, `__pycache__`, `dist`, `build`, `target`, `.tox`, `.mypy_cache`, `.pytest_cache`, `.next`, `.nuxt`, `.cache`, `reports`.
|
|
225
|
-
|
|
226
|
-
### Changed — Shellcheck CI job version-pinned (parity with ruff v1.1.3)
|
|
227
|
-
|
|
228
|
-
Closes `iah-shellcheck-version-pin` (`bd_000-projects-v1ds`, P3). v1.1.2 (Phase A1) installed shellcheck via `apt-get install -y shellcheck` which pulls whatever Ubuntu's runner-image version happens to ship (currently 0.9.0). When the runner image upgrades shellcheck to 0.10.x or later, new rules activate silently and could surface findings in already-merged code. v1.1.4 pins to `v0.10.0` via download from the koalaman/shellcheck GitHub releases. CI step prints `shellcheck --version` for audit trail. To bump: edit `SHELLCHECK_VERSION` env in the workflow + run `shellcheck scripts/*.sh` locally + commit as explicit PR. Matches the ruff version-pin pattern from v1.1.3.
|
|
229
|
-
|
|
230
|
-
### Changed — Version bumped to v1.1.4 across all 5 manifests
|
|
231
|
-
|
|
232
|
-
Per the version-canonical-check CI gate (v1.0.2 PR #35). All 5 manifest locations now report `1.1.4`.
|
|
233
|
-
|
|
234
|
-
### Changed — `.harness-hash` regenerated
|
|
235
|
-
|
|
236
|
-
`scripts/gherkin-lint.sh` + `scripts/crap-score.py` modified; both are pinned. 2 of 9 pinned-file hashes change.
|
|
237
|
-
|
|
238
|
-
### Why patch, not minor
|
|
239
|
-
|
|
240
|
-
Pure cleanup release: dead-code removal, perf microoptimization, bug fixes for cross-call inconsistencies, CI version pin. No new CLI commands, no new flags, no API change. Consumers re-vendor / `pnpm up` and get the cleaner scripts + tighter CI transparently.
|
|
51
|
+
### Changed
|
|
241
52
|
|
|
242
|
-
|
|
53
|
+
- Release-preparation chore for v1.2.2 (#93).
|
|
243
54
|
|
|
244
|
-
|
|
245
|
-
- `ruff check` → `All checks passed!`
|
|
246
|
-
- `bash -n scripts/*.sh` → all pass
|
|
247
|
-
- `python3 -m py_compile scripts/crap-score.py + cli.py` → exit 0
|
|
248
|
-
- `bash scripts/harness-hash.sh --verify` → OK after `--init`
|
|
249
|
-
- gherkin-lint deliberate-failure test (And-at-start): exit 1, summary correct
|
|
250
|
-
- gherkin-lint mixed test (2 WARN + 1 ERROR): summary `2 warning(s), 1 error(s)`, exit 1
|
|
251
|
-
- Output noise gone: feature-file lines no longer printed alongside ERRORs
|
|
55
|
+
## [1.2.1] - 2026-06-16
|
|
252
56
|
|
|
253
|
-
|
|
57
|
+
A patch release: release-pipeline supply-chain hardening (polyglot signing) plus
|
|
58
|
+
dev-dependency bumps. No CLI surface, runtime behavior, or API boundary changes —
|
|
59
|
+
the published artifacts are byte-identical in behavior to 1.2.0; only the release
|
|
60
|
+
machinery and dev tooling moved.
|
|
254
61
|
|
|
255
|
-
###
|
|
62
|
+
### Added
|
|
256
63
|
|
|
257
|
-
|
|
64
|
+
- **sigstore-python wheel + sdist signing (#90).** The `publish-pypi` leg now signs the built wheel and sdist with `sigstore-python` (keyless Fulcio OIDC + Rekor), so the PyPI distribution carries verifiable provenance alongside the existing npm sigstore path.
|
|
65
|
+
- **crates.io build-provenance attestation (#90).** The `publish-crates` leg now emits a GitHub build-provenance attestation for the published crate artifact, extending the signed-supply-chain guarantee to the Rust distribution.
|
|
258
66
|
|
|
259
|
-
|
|
67
|
+
### Changed
|
|
260
68
|
|
|
261
|
-
|
|
69
|
+
- **crates.io publish is now active (#90).** With `CARGO_REGISTRY_TOKEN` provisioned as a repository secret, the `publish-crates` leg goes live on this tag — closing the polyglot publish loop (npm + PyPI + crates.io all publish + sign from one tag).
|
|
70
|
+
- Bump `eslint` from 9.39.4 to 10.5.0 (#71).
|
|
71
|
+
- Bump `jeremylongshore/intent-rollout-gate` GitHub Action pin from 0.1.0 to 0.2.0 (#86).
|
|
72
|
+
- Bump `crate-ci/typos` from 1.29.4 to 1.47.2 (#87).
|
|
73
|
+
- Release-preparation chore for v1.2.1 (#91).
|
|
262
74
|
|
|
263
|
-
|
|
75
|
+
## [1.2.0] - 2026-06-15
|
|
264
76
|
|
|
265
|
-
|
|
266
|
-
-
|
|
77
|
+
A minor release: the provider credential gate (`cred-gate`, iah-E08), the locked
|
|
78
|
+
OTel runtime-event surface (`agent.rollout.gate.evaluated` + `gate.decision.emitted`,
|
|
79
|
+
iah-E07), shared vendorable lint configs, wrapper-mirror drift-guard CI, and tailnet
|
|
80
|
+
CI-failure alerting — all additive, with the zero-runtime-dependency guarantee
|
|
81
|
+
preserved.
|
|
267
82
|
|
|
268
|
-
|
|
83
|
+
> **Why minor, not patch:** A new CLI-adjacent gate surface (`cred-gate`) and new authored feature surfaces (shared lint configs, the locked OTel event taxonomy, the wrapper drift-guard lane). Per SemVer this is a minor bump. No CLI command was renamed or removed; the change is purely additive and the published tarball stays zero-runtime-dependency.
|
|
269
84
|
|
|
270
|
-
|
|
271
|
-
- **`scripts/crap-score.py`**: dead local variable `metrics = rec.get("metrics", {}).get("cyclomatic", {})` in `score_rust()` (line 266; F841). Assigned but never read. The actual cyclomatic value is fetched freshly inside the loop on line 268.
|
|
272
|
-
- **`python/src/intent_audit_harness/cli.py`**: dead `import os` at line 12 (F401). Zero `os.*` usages in the file.
|
|
85
|
+
### Added
|
|
273
86
|
|
|
274
|
-
|
|
87
|
+
- **Provider credential gate (`cred-gate`, iah-E08) (#77).** A new gate that asserts provider credentials PASS/FAIL with full redaction + spillover coverage (`scripts/cred-gate.sh`).
|
|
88
|
+
- **Credential-leak fixtures + failure-mode docs (#80).** Full-catalog fixture coverage for the cred-gate's redaction + spillover behavior (iah-E08a/E08b).
|
|
89
|
+
- **OTel runtime events on `emit-evidence` (iah-E07) (#81).** Emits `agent.rollout.gate.evaluated` (the per-gate evaluation event, name + attributes locked + tested, iah-E07a) and `gate.decision.emitted` (the gate-decision event, iah-E07b) per the NORMATIVE `intent-eval-lab/000-docs/067-AT-SPEC` runtime-event taxonomy.
|
|
90
|
+
- **Shared, vendorable lint configs (#85).** `.audit-harness-configs/` (markdownlint / yamllint / ruff / shellcheck) is the canonical config set the IEP repos vendor + extend; `install.sh` now vendors both `scripts/` and `configs/`. CLAUDE.md cross-references the lab specs.
|
|
91
|
+
- **Advisory `typos` spell-check CI lane (#83)** and **advisory `actionlint` CI lane (#84).**
|
|
92
|
+
- **ntfy CI-failure alert over the tailnet (#79).** CI failures fan out a notification to the private tailnet ntfy topic.
|
|
275
93
|
|
|
276
|
-
|
|
94
|
+
### Changed
|
|
277
95
|
|
|
278
|
-
|
|
96
|
+
- **Provider credential gate + OTel head landed first (#77).** The `cred-gate` head and the OTel gate-decision event landed together; PR #78 then renamed that event to `gate.decision.emitted` to align with the 067-AT-SPEC runtime-event taxonomy.
|
|
97
|
+
- **Dogfood AAR (iah-E10d) (#88).** First-downstream-adopter run captured at `000-docs/013-AA-AACR-rollout-gate-dogfood-iah-E10-2026-06-15.md`.
|
|
98
|
+
- Release-preparation chore for v1.2.0 (#89).
|
|
279
99
|
|
|
280
|
-
|
|
100
|
+
### Fixed
|
|
281
101
|
|
|
282
|
-
|
|
102
|
+
- **Bundled wrapper mirrors resynced to canonical + drift-guard CI lane (iah-65k4) (#82).** The Python (`python/src/intent_audit_harness/scripts/`) and Rust (`rust/scripts/`) bundled copies of `crap-score.py` were stale mirrors of canonical `scripts/`; this resyncs them and adds a CI lane that fails on any future drift between canonical and the bundled mirrors.
|
|
283
103
|
|
|
284
|
-
|
|
104
|
+
## [1.1.8] - 2026-06-13
|
|
285
105
|
|
|
286
|
-
|
|
106
|
+
Ships the iah-E06 production-signing pre-flight gate to downstream consumers, plus
|
|
107
|
+
the comprehensive PP-PLAN-040 supply-chain + hygiene wave, crap-score backend
|
|
108
|
+
repairs, and a SemVer contract-pin test suite.
|
|
287
109
|
|
|
288
|
-
|
|
110
|
+
> **Why patch, not minor:** The pre-flight scripts shipped to the repo in earlier PRs (#70, #75); this patch propagates them to npm consumers via a version bump. No new public CLI commands or flag changes in this release boundary.
|
|
289
111
|
|
|
290
|
-
###
|
|
112
|
+
### Added
|
|
291
113
|
|
|
292
|
-
- `
|
|
293
|
-
-
|
|
294
|
-
- `
|
|
295
|
-
-
|
|
296
|
-
- `bash scripts/harness-hash.sh --verify` → OK after `--init`
|
|
297
|
-
- CI ruff job will block any future PR that introduces a Python lint finding (F401, F841, E*, etc.)
|
|
114
|
+
- **DNSSEC + CAA production-signing pre-flight (iah-E06) (#70).** Before a production-mode `emit-evidence` run signs canonical bytes, two deterministic pre-flight scripts assert the signing domain (`evals.intentsolutions.io`) is cryptographically sound — `scripts/dnssec-check.sh` verifies the DNSSEC chain is present and validates; `scripts/caa-check.sh` verifies the CAA records authorize the signing certificate authority. Both fail closed: any error, missing record, or unreachable resolver blocks the signing path rather than emitting an unverifiable attestation. Staging/draft emit is unaffected.
|
|
115
|
+
- **Supply-chain + hygiene + kernel-shadow detector (#69).** PyPI/crates publish wiring, dependabot polyglot coverage, lefthook, eslint, a bash-version floor, a kernel-shadow detector, and a crap-score dot-dir fix landed as one supply-chain wave.
|
|
116
|
+
- **`install.sh` completeness + per-repo blueprint + golden-master stdout suite (#63).** The vendored-install path now ships a complete traceable copy, plus a golden-master fitness function pinning the raw stdout of the scorers whose output is a downstream contract.
|
|
117
|
+
- **SemVer CLI/output-contract pin test (#65).** A test that pins the CLI + output contract so a MAJOR-worthy change fails CI rather than slipping out as a patch.
|
|
298
118
|
|
|
299
|
-
###
|
|
119
|
+
### Changed
|
|
300
120
|
|
|
301
|
-
|
|
121
|
+
- **`currency`: one pin per upstream surface + advisory poll-freshness SLA rename (#68).** Each tracked upstream (mcp-spec, skill-md-schema, claude-code, gate-result-predicate, anthropic-sdk, agentskills-spec) carries its own pin relation so the pin's own staleness is detectable per-upstream rather than as one opaque scalar.
|
|
122
|
+
- **Version bumped to 1.1.8 across all manifests (#76).** Per the `version-canonical-check` CI gate: `package.json` (canonical), `version.txt`, `python/pyproject.toml`, `python/src/intent_audit_harness/__init__.py`, and `rust/Cargo.toml` all report `1.1.8`.
|
|
123
|
+
- **audit-harness self-adopts the intent-rollout-gate Action (#74).** CI dogfoods the downstream rollout-gate Action — graduation criterion 5 / M6 first downstream adopter.
|
|
124
|
+
- Bump `DavidAnson/markdownlint-cli2-action` from 17 to 23 (#49); bump `actions/setup-node` from 4 to 6 (#61); record the public gist id for sweep/release tooling (#67).
|
|
302
125
|
|
|
303
|
-
|
|
126
|
+
### Fixed
|
|
304
127
|
|
|
305
|
-
|
|
128
|
+
- **Query a trusted validating resolver in the DNSSEC + CAA pre-flight (#75).** The pre-flight previously trusted the ambient resolver, which may not validate DNSSEC. Both scripts now query known validating resolvers (`1.1.1.1`, `8.8.8.8`) and require the authenticated-data (AD) flag plus an `RRSIG` on the answer. A resolver that does not set AD, or an answer with no RRSIG, is treated as a validation failure (fail-closed) rather than a pass.
|
|
129
|
+
- **crap-score Go/JS scoring backends repaired + 3 bash defects from the umbrella review (#66).**
|
|
130
|
+
- **Evidence-integrity bugs + SHA256 portability + kernel schema URL (#64).**
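The fail-closed rule from the DNSSEC + CAA pre-flight fix (#75) above can be sketched as a small decision function. This is a minimal illustration, assuming the resolver response has already been parsed into a set of header flags and a list of answer record types. The name `answer_is_validated` and its parameters are hypothetical and are not taken from `scripts/dnssec-check.sh`.

```python
def answer_is_validated(header_flags, answer_record_types):
    """Return True only when the response has both the AD flag and an RRSIG.

    Hypothetical helper: a missing flag or a missing signature counts as a
    validation failure (fail-closed), never as a pass.
    """
    flags = {flag.lower() for flag in header_flags}
    record_types = {rtype.upper() for rtype in answer_record_types}
    return "ad" in flags and "RRSIG" in record_types


if __name__ == "__main__":
    # Validating resolver, signed answer.
    print(answer_is_validated({"qr", "rd", "ra", "ad"}, ["A", "RRSIG"]))
    # Resolver did not set AD: treated as a failure.
    print(answer_is_validated({"qr", "rd", "ra"}, ["A", "RRSIG"]))
    # AD set but no RRSIG on the answer: treated as a failure.
    print(answer_is_validated({"qr", "rd", "ra", "ad"}, ["A"]))
```

The point of the design is that there is no "unknown" branch that passes: every input that is not positively validated returns `False`.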
|
|
306
131
|
|
|
307
|
-
|
|
132
|
+
## [1.1.7] - 2026-06-08
|
|
308
133
|
|
|
309
|
-
-
|
|
310
|
-
- B1: `iep-shared-lint-configs` — `.audit-harness-configs/` for vendoring lint configs to consumer repos
|
|
311
|
-
- Plus 2 bundleable Gemini-found fixes from v1.1.2 review: `iah-gherkin-prev-blank-noise` + `iah-gherkin-single-awk-opt`
|
|
134
|
+
A CI-only patch keeping the dashboard evidence-emit job runnable.
|
|
312
135
|
|
|
313
|
-
|
|
136
|
+
### Fixed
|
|
314
137
|
|
|
315
|
-
|
|
138
|
+
- **`emit-evidence` job needs Node 22 for `--experimental-strip-types` (nr75.12) (#60).** The CI-only `emit-evidence` TypeScript runner uses Node's experimental type-stripping, which requires Node 22; the job's Node version is bumped accordingly. No published-artifact change — the `ci/` emitter is excluded from the npm tarball.
|
|
316
139
|
|
|
317
|
-
|
|
140
|
+
## [1.1.6] - 2026-06-08
|
|
318
141
|
|
|
319
|
-
|
|
142
|
+
A minor release: the read-only "comprehensive audit, on any repo" brain
|
|
143
|
+
(`classify` → `conform` → `audit` → `scan` → `currency`), the registry-projection +
|
|
144
|
+
FP-rate safety spine, and the CI-only kernel-emitting evidence path for the
|
|
145
|
+
dashboard (nr75.12) — all additive, with the zero-runtime-dependency guarantee
|
|
146
|
+
preserved. (Note: an earlier CHANGELOG draft attributed this PP-PLAN-040 verb set
|
|
147
|
+
to 1.2.0; it actually shipped here in 1.1.6 via PRs #52–#59.)
|
|
320
148
|
|
|
321
|
-
|
|
322
|
-
- **`scripts/emit-evidence.sh`**: `INPUT_HASH_HEX="$(echo "$STATEMENT" | python3 -c ...)"` (formerly line 238). SC2034: computed but never read. Vestige from an earlier cosign integration; the surrounding `BLOB_FILE` construction relies on `ARTIFACT_NAME` only.
|
|
323
|
-
- **`scripts/gherkin-lint.sh`**: `err()` helper function. SC2317: zero call sites in the file (verified via `grep -n "\berr\b"` — only the definition matches). The helper was defined symmetrically with `warn()` but never wired up to the awk rubric or the subprocess-fallback path. Replaced with `process_awk_output()` helper (see Fixed section below).
|
|
149
|
+
> **Why minor, not patch:** Multiple new read-only CLI verbs (`classify`, `conform`, `audit`, `scan`, `currency`) and new authored feature surfaces (the audit-profile data spec, the registry datum, the CI-only evidence emit). Per SemVer this is a minor bump. No CLI command was renamed or removed; the change is purely additive and the published tarball stays zero-runtime-dependency.
|
|
324
150
|
|
|
325
|
-
###
|
|
151
|
+
### Added
|
|
326
152
|
|
|
327
|
-
|
|
153
|
+
- **`classify` verb + `audit-profile/v1` data-spec (PP-PLAN-040 Phase 0+1) (#53).** `audit-harness classify [repo]` (`scripts/classify.py`, stdlib-only) is a read-only repository classifier: it detects the UNION of repo-type + Claude-artifact classifications, resolves the gate set against the canonical `schemas/audit-profile/registry.v1.json` datum, records `registry_hash`, and emits an `audit-profile/v1` value to stdout — **never writes to the repo**. The `audit-profile/v1` schema is closed, versioned, and hash-bearing, mirroring `gate-result/v1`; its four invariants: classifications are a UNION (not a winner), `unresolved[]` is the only Claude-refinable surface, `waived ⇒ disabled` (allOf-enforced), `registry_hash` makes a profile reproducible. Safety levers: an `INDETERMINATE` result class (infra failure ≠ policy failure), per-command timeout supervision via `AUDIT_HARNESS_TIMEOUT`, the `AUDIT_HARNESS_DISABLE=1` kill-switch, and an engineer-owned `.audit-harness.yml` override. `schemas/` now ships in the npm package (`files`).
|
|
154
|
+
- **`conform` verb + bundled content-addressed schemas (PP-PLAN-040 Phase 2) (#54).** `audit-harness conform [repo]` (`scripts/conform.py`, stdlib + PyYAML): for every `dimension: conformance` gate in the repo's `audit-profile/v1`, locates the artifact(s) and emits a `gate-result/v1` row — never writes, never live-fetches. Bundled content-addressed schemas (`schemas/conform/v1/`: `skillmd-frontmatter`, `mcp-config`, `plugin-manifest`, `agent-frontmatter`) form the deterministic structural floor, checked by an embedded subset validator (not ajv) for reproducible signed evidence; each schema's sha256 is recorded in the row's `policy_hash`. Genuinely-external formats shell out (OpenAPI → `spectral`, GitHub Action → `yamllint`); a missing tool produces ADVISORY indeterminate, never a false FAIL. Advisory-first; `--strict` (or an engineer-promoted blocking gate) turns a violation into FAIL.
|
|
155
|
+
- **`audit` testing-depth gate-runner (PP-PLAN-040 Phase 3 / E5) (#56).** `audit-harness audit [repo]` (`scripts/audit.py`, stdlib): for every `dimension: testing-depth` gate, runs the bundled `crap` scorer and per-pyramid-layer presence heuristics (unit/integration/e2e/smoke/perf/a11y/contract/migration/property-based/fuzz/sanitizers). Layer present → PASS; absent → ADVISORY(warn); not statically assessable → ADVISORY indeterminate. `--fast` (default, presence heuristics only) / `--deep` (adds crap-score) / `--strict` (gap on a blocking gate → FAIL). Deliberately does NOT execute the repo's test suite — running untrusted suites is the repo's own CI's job.
|
|
156
|
+
- **`scan` security/hygiene/skill-quality gate-runner (PP-PLAN-040 Phase 4 / E6) (#57).** `audit-harness scan [repo]` (`scripts/scan.py`, stdlib): for every `dimension: security | hygiene | skill-quality` gate, emits a `gate-result/v1` row via three strategies — local (deterministic README presence), shell-out (gitleaks / osv-scanner / semgrep / syft / markdownlint / lychee; clean → PASS, findings → ADVISORY(error), absent → ADVISORY indeterminate), and consume (`skill-behavioral` ingests a j-rig Evidence Bundle verdict via `--jrig-verdict`). Advisory-first; `--strict` turns a finding/gap into FAIL. **Security note:** on first run this gate caught — and this release redacts from HEAD — a PyPI publish token pasted as a literal value in `python/PUBLISH.md`. The value remains in git history and must be rotated at the registry (tracked separately); the doc now carries a placeholder.
|
|
157
|
+
- **`currency` advisory upstream-currency report (PP-PLAN-040 Phase 5 / E7) (#58).** `audit-harness currency` (`scripts/currency.py`, stdlib): reads the per-upstream-identity pin relation (`schemas/currency/pins.v1.json`) and reports which pins are themselves stale (`checked_at` older than the pin's staleness window). No exit-code authority (always exit 0), no live-fetch, no auto-fix — `/sync-testing-harness` consumes the report to open advisory bump PRs; it never reddens a build. `--today YYYY-MM-DD` makes reports reproducible.
|
|
158
|
+
- **Registry projection + FP-rate harness (PP-PLAN-040 E2: c2b + c2e) (#55).** `audit-harness gen-layer-applicability` projects `schemas/audit-profile/registry.v1.json` into `schemas/audit-profile/layer-applicability.md` (the doc is now a projection of the registry datum, not a hand-maintained parallel source — CI gate `layer-applicability-drift` enforces it). `audit-harness fp-rate` measures each gate's false-positive / false-negative rate over a labeled corpus — the metric that gates advisory→blocking promotion. `docs/gate-promotion.md` documents the FP-rate ≤ 5% promotion bar.
|
|
159
|
+
- **CI-only signed evidence emit for the intent-eval-dashboard (nr75.12) (#59).** `ci/emit-evidence.ts` + `ci/assemble-manifest.ts` run the real deterministic self-gate (`harness-hash --verify`), shape it into a kernel `gate-result/v1` + `EvidenceBundle` (fail-closed against `@intentsolutions/core`), cosign-sign the canonical bytes (Fulcio OIDC + Rekor), and assemble the `report-manifest.json` the dashboard reports hub (labs.intentsolutions.io) re-verifies at ingest. Zero-dep guarantee preserved: the emitter lives in `ci/` (excluded from `package.json#files`) and the kernel is installed CI-only via `npm i --no-save`.
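The staleness rule described for the `currency` report above (a pin is stale when `checked_at` is older than its staleness window) might look like the following. This is a sketch under assumptions: the function name `pin_is_stale` and the `staleness_days` parameter are hypothetical, and the real layout of `schemas/currency/pins.v1.json` is not shown in this changelog. Passing `today` explicitly mirrors the reproducibility purpose of `--today YYYY-MM-DD`.

```python
from datetime import date


def pin_is_stale(checked_at, staleness_days, today):
    """Return True when the last check is older than the staleness window.

    All dates are ISO strings (YYYY-MM-DD). `today` is an argument rather
    than read from the clock, so the same inputs always give the same report.
    """
    age_days = (date.fromisoformat(today) - date.fromisoformat(checked_at)).days
    return age_days > staleness_days


if __name__ == "__main__":
    print(pin_is_stale("2026-05-01", 30, "2026-06-08"))
    print(pin_is_stale("2026-06-01", 30, "2026-06-08"))
```

Because the function never consults the system clock, a report generated in CI can be regenerated later with the same `today` value and compared byte for byte.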
|
|
328
160
|
|
|
329
|
-
|
|
330
|
-
- **Verification**: deliberate-failure test against a feature with `Scenario: ... \n And ...` produces exit code 1 + summary `0 warning(s), 1 error(s)` (was: exit 0 + `0 warning(s), 0 error(s)` while still printing the ERROR line). Clean feature still exits 0.
|
|
331
|
-
- **Separate-scope finding**: the third awk script contains a stray top-level `prev_blank = 1` that awk treats as an always-true pattern, triggering its default print-every-line action. That's a pre-existing cosmetic issue (extra noise in script output) but not a counter bug — filed as deferred scope.
|
|
161
|
+
### Changed
|
|
332
162
|
|
|
333
|
-
|
|
163
|
+
- **Finished the `intent-audit-harness` rename in public contributor docs (#52).**
|
|
334
164
|
|
|
335
|
-
|
|
165
|
+
## [1.1.5] - 2026-06-03
|
|
336
166
|
|
|
337
|
-
- `
|
|
338
|
-
- `version.txt`
|
|
339
|
-
- `python/pyproject.toml`
|
|
340
|
-
- `python/src/intent_audit_harness/__init__.py`
|
|
341
|
-
- `rust/Cargo.toml`
|
|
167
|
+
> **Why patch, not minor:** No new CLI commands, no new flags, no API change, no script behavior change. This is release-engineering + metadata: the publish pipeline that ships the existing `1.1.x` code, plus URL corrections for the repo rename, plus the install.sh glob fix. The pinned policy scripts (`.harness-hash`) are untouched.
|
|
342
168
|
|
|
343
|
-
###
|
|
169
|
+
### Added
|
|
344
170
|
|
|
345
|
-
|
|
171
|
+
- **npm release pipeline (closes the publish-pipeline gap).** This is the first release published to npm via CI with Sigstore provenance. Until now the repo had **no release workflow** — npm was stuck at `0.1.0` while the code (and every other manifest) had advanced through `1.0.0` → `1.1.4`, four minors of CHANGELOG-documented work that never reached consumers. `npm install @intentsolutions/audit-harness` resolved to the stale `0.1.0` tarball. New `.github/workflows/release.yml` mirrors the provenance approach of `intent-eval-core`'s release workflow, adapted for this zero-dependency polyglot CLI (no pnpm, no lockfile, no TS build). Triggers on `push` of a `v*.*.*` tag and on `workflow_dispatch`, sets `id-token: write` for npm/Sigstore OIDC, verifies the pushed tag matches `package.json#version`, runs the `--version` self-check + `escape-scan.sh --staged`, then `npm publish --provenance --access public`.
|
|
172
|
+
- **README badge row.** npm-version, License Apache-2.0, and Sigstore-provenance shields under the H1 (mirrors the `intent-eval-core` badge row). The "Part of the Intent Eval Platform" cross-link line is preserved.
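The release workflow entry above says the pushed tag is verified against `package.json#version` before publishing. A minimal sketch of that comparison, assuming tags take the form `v<version>`; the helper name `tag_matches_version` is hypothetical, and the actual workflow step may well be written in shell rather than Python.

```python
import json


def tag_matches_version(tag, package_json_text):
    """Compare a pushed tag such as 'v1.1.5' with the manifest's version field."""
    version = json.loads(package_json_text)["version"]
    return tag == f"v{version}"


if __name__ == "__main__":
    manifest = '{"name": "@intentsolutions/audit-harness", "version": "1.1.5"}'
    print(tag_matches_version("v1.1.5", manifest))
    print(tag_matches_version("v1.1.4", manifest))
```

A publish job would stop when this check returns `False`, so a tag can never ship a tarball whose manifest reports a different version.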
|
|
346
173
|
|
|
347
|
-
###
|
|
174
|
+
### Changed
|
|
348
175
|
|
|
349
|
-
|
|
176
|
+
- **Version bumped to v1.1.5 across all 5 manifests.** Per the `version-canonical-check` CI gate (v1.0.2 PR #35). `package.json` (canonical), `version.txt`, `python/pyproject.toml`, `python/src/intent_audit_harness/__init__.py`, and `rust/Cargo.toml` all report `1.1.5`.
|
|
350
177
|
|
|
351
|
-
###
|
|
178
|
+
### Fixed
|
|
352
179
|
|
|
353
|
-
- `
|
|
354
|
-
- `
|
|
355
|
-
- `python3 -m py_compile scripts/crap-score.py` → exit 0
|
|
356
|
-
- `bash scripts/harness-hash.sh --verify` → harness-hash: OK after `--init`
|
|
357
|
-
- CI shellcheck job will now block on any future warning — try staging `cmd $var` (unquoted expansion) to verify the gate fires
|
|
180
|
+
- **Package metadata + `install.sh` URLs for the `intent-audit-harness` repo rename.** The GitHub repo was renamed `audit-harness` → `intent-audit-harness`, but the metadata still pointed at the old path. `package.json` (`homepage`, `repository.url`, `bugs.url`), `python/pyproject.toml` + `rust/Cargo.toml` project-URL fields, `python/src/intent_audit_harness/__init__.py` docstring source-link, `README.md` (the `curl … install.sh` line + two "Related" skill links), and `install.sh` (the `REPO=` variable, usage-comment URLs, re-run hint, and the default `VERSION` bumped `v0.1.0` → `v1.1.5`) were all repointed to the renamed repo.
|
|
181
|
+
- **`install.sh` tarball-path glob broke after the rename.** The GitHub archive tarball unpacks as `<repo>-<version>/`, which became `intent-audit-harness-1.1.5/` after the rename. The unpack-dir detection used `find … -name 'audit-harness-*'`, and `-name` matches the basename with no implicit leading wildcard, so it matched **nothing** under the new prefix — every vendored install would have failed. Changed the glob to `-name '*audit-harness-*'` (leading wildcard), matching both the current `intent-audit-harness-*` name and legacy `audit-harness-*` tags.
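The glob behaviour behind the `install.sh` fix above can be reproduced with Python's `fnmatch`, which applies the same shell-style rule that a pattern must match the whole name, with no implicit leading wildcard. This is an illustration only, not the code in `install.sh`; the legacy directory name `audit-harness-1.1.4` is an assumed example of a pre-rename unpack directory.

```python
from fnmatch import fnmatchcase

unpacked_dir = "intent-audit-harness-1.1.5"  # unpack dir after the rename
legacy_dir = "audit-harness-1.1.4"           # assumed example of a legacy dir

old_pattern = "audit-harness-*"    # pattern before the fix
new_pattern = "*audit-harness-*"   # pattern after the fix (leading wildcard)

if __name__ == "__main__":
    print(fnmatchcase(unpacked_dir, old_pattern))  # False: prefix is not matched
    print(fnmatchcase(unpacked_dir, new_pattern))  # True
    print(fnmatchcase(legacy_dir, new_pattern))    # True: legacy names still match
```

The leading `*` also matches the empty string, which is why the new pattern keeps working for the legacy prefix.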
|
|
358
182
|
|
|
359
|
-
|
|
183
|
+
## [1.1.4] - 2026-05-25
|
|
360
184
|
|
|
361
|
-
|
|
185
|
+
> **Why patch, not minor:** Pure cleanup release: dead-code removal, perf microoptimization, bug fixes for cross-call inconsistencies, CI version pin. No new CLI commands, no new flags, no API change. AAR: `000-docs/009-AA-AACR-v1.1.4-cleanup-bundle-2026-05-25.md`.
|
|
362
186
|
|
|
363
|
-
|
|
187
|
+
### Changed
|
|
364
188
|
|
|
365
|
-
-
|
|
366
|
-
-
|
|
367
|
-
-
|
|
189
|
+
- **`gherkin-lint.sh process_awk_output()` collapsed to a single awk pass (Gemini #38 follow-up).** Closes `iah-gherkin-single-awk-opt` (P3). v1.1.2 introduced `process_awk_output()` with two awk subprocesses per call; v1.1.4 collapses to a single awk pass, halving the awk fork count (4 callsites × 2 subprocesses → 4). Verified with a mixed WARN+ERROR test.
|
|
190
|
+
- **Shellcheck CI job version-pinned (parity with ruff v1.1.3).** Closes `iah-shellcheck-version-pin` (P3). v1.1.2 installed shellcheck via `apt-get` which pulls whatever Ubuntu's runner image ships; v1.1.4 pins to `v0.10.0` downloaded from the koalaman/shellcheck GitHub releases so runner-image upgrades can't silently activate new rules. CI prints `shellcheck --version` for the audit trail.
|
|
191
|
+
- **Version bumped to v1.1.4 across all 5 manifests** and **`.harness-hash` regenerated** (2 of 9 pinned-file hashes change: `gherkin-lint.sh` + `crap-score.py`).
|
|
368
192
|
|
|
369
|
-
|
|
193
|
+
### Fixed
|
|
370
194
|
|
|
371
|
-
|
|
195
|
+
- **`gherkin-lint.sh` `prev_blank` print-every-line noise (Gemini #71 review chain).** Closes `iah-gherkin-prev-blank-noise` (P2). The third awk block (the And-at-scenario-start checker) opened with a bare `prev_blank = 1` expression that awk interpreted as an always-true pattern with implicit `{ print }` — flooding stdout with every line of every feature file alongside the intentional ERROR printf. `prev_blank` was never read anywhere; both touches were removed so the block produces ONLY the targeted ERROR line.
|
|
196
|
+
- **`crap-score.py` exclusion sets deduplicated via an `EXCLUDED_DIRS` constant (Gemini #71 review).** Closes `iah-crap-score-exclusion-dedup` (P2). Two separate sets with overlapping intent but divergent contents — `ignore` in `score_python()` (had `reports`, lacked `.next`/`.nuxt`/`.cache`) and `prune` in `main()` (had `.next`/`.nuxt`/`.cache`, lacked `reports`) — caused real asymmetric skips. Extracted to a single module-level `EXCLUDED_DIRS` union referenced by both call sites.
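The deduplication described in the `EXCLUDED_DIRS` entry above follows a common pattern: one module-level constant that every call site reads, so the sets cannot drift apart again. A minimal sketch, assuming only the directory names mentioned in that entry; the real constant in `crap-score.py` likely holds more names, and the helpers `is_excluded`, `python_sources`, and `hashable_files` are hypothetical stand-ins for the two call sites.

```python
import os

# Assumed contents: only the names mentioned in the changelog entry.
EXCLUDED_DIRS = frozenset({"reports", ".next", ".nuxt", ".cache"})


def is_excluded(path):
    """True when any component of the path is an excluded directory name."""
    parts = os.path.normpath(path).split(os.sep)
    return any(part in EXCLUDED_DIRS for part in parts)


def python_sources(paths):
    """Hypothetical first call site: Python files outside excluded dirs."""
    return [p for p in paths if p.endswith(".py") and not is_excluded(p)]


def hashable_files(paths):
    """Hypothetical second call site: every file outside excluded dirs."""
    return [p for p in paths if not is_excluded(p)]


if __name__ == "__main__":
    sample = ["src/app.py", "reports/old.py", "web/.next/cache.js", "README.md"]
    print(python_sources(sample))
    print(hashable_files(sample))
```

Both helpers skip `reports/` and `.next/` alike, which is the symmetry the two divergent sets had lost.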
|
|
372
197
|
|
|
373
|
-
|
|
198
|
+
## [1.1.3] - 2026-05-25
|
|
374
199
|
|
|
375
|
-
|
|
376
|
-
- **`scripts/crap-score.py`** (missing `go` PATH guard): `score_go()` called `run(["go", "test", "-coverprofile=...", ...])` without first checking that `go` is on PATH, so on systems without Go installed the subprocess raised `FileNotFoundError` and aborted the whole CRAP pass. Wraps the call in the existing `which_or_none("go")` pattern already used for `radon`, `gocyclo`, and the downstream `go tool cover` invocation.
|
|
377
|
-
- **`scripts/crap-score.py`** (rglob walk pruning): the `--json` input-hash computation walked every file under `root` via `rglob("*")`, only filtering `node_modules` / `.venv` after the directory had been traversed. Replaces with `os.walk` + `dirs[:] = [...]` in-place pruning, skipping `.git`, `node_modules`, `.venv`/`venv`, `__pycache__`, `dist`, `build`, `target`, `.tox`, `.mypy_cache`, `.pytest_cache`, `.next`, `.nuxt`, `.cache`. Major perf win on large repos; no behavioral change to the resulting hash for repos without pruned-extension files under those directories.
|
|
378
|
-
- **`scripts/emit-evidence.sh`** (shell→Python path injection): `python3 -c "import json, sys; print(json.load(open('$PKG_JSON'))['version'])"` interpolated the shell variable directly into the Python source. Paths containing single quotes (or arbitrary characters in adversarial cases) broke the parse. Now passes `$PKG_JSON` via `sys.argv[1]` — `python3 -c "import json, sys; print(json.load(open(sys.argv[1]))['version'])" "$PKG_JSON"` — moving the path through the safe argv channel.
|
|
379
|
-
- **`scripts/bias-count.sh`** (per-file sha256sum fork): `find ... -exec sha256sum {} \;` spawned one `sha256sum` process per matched file. Changes the terminator to `+` so `find` batches arguments into one (or few) sha256sum invocations. Perf win on test suites with many files; output identical because the downstream `sort | sha256sum` step normalizes.
|
|
380
|
-
- **`scripts/harness-hash.sh`** (cross-platform sha256sum): GNU coreutils ships `sha256sum`, macOS ships `shasum -a 256`. Adds detection at script top selecting whichever is available into a `SHA256_CMD` bash array, falling back with a clear error if neither is on PATH. Both produce identical `<hash> <file>` output, so the manifest format and downstream `awk` parsing are byte-equivalent. Enables engineer-local runs on macOS without forcing every contributor to install coreutils.
|
|
200
|
+
> **Why patch, not minor:** Pure lint-gate addition + dead-code removal. No new CLI commands, no new flags, no API change. AAR: `000-docs/008-AA-AACR-ruff-iep-P6-2026-05-24.md`.
|
|
381
201
|
|
|
382
|
-
###
|
|
202
|
+
### Added
|
|
383
203
|
|
|
384
|
-
|
|
204
|
+
- **Ruff CI gate against own-code Python (IEP Convergence Debt Plan Priority 6 Phase A2).** Closes `iah-ruff` (P1). New `ci.yml` job `ruff (Python lint)` runs `ruff check` (version-pinned to 0.15.4 per the shellcheck-version-pin lesson) against the own-code Python surface. Ruleset `select = ["B", "E", "F"]` — pyflakes (F), pycodestyle errors (E), and flake8-bugbear (B) per Gemini PR #39 review. Line length 120. New `ruff.toml` at repo root scopes lint to `scripts/*.py` + the CLI files and excludes the bundled-content mirrors (stale-sync tracked separately).
|
|
385
205
|
|
|
386
|
-
|
|
387
|
-
- `version.txt`
|
|
388
|
-
- `python/pyproject.toml`
|
|
389
|
-
- `python/src/intent_audit_harness/__init__.py`
|
|
390
|
-
- `rust/Cargo.toml`
|
|
206
|
+
### Changed
|
|
391
207
|
|
|
392
|
-
|
|
208
|
+
- **Long-line reformat in `scripts/crap-score.py`.** The 155-char `ignore` set literal reformatted into a multi-line set literal under the 120-char limit. Cosmetic; no behavior change.
|
|
209
|
+
- **Version bumped to v1.1.3 across all 5 manifests** and **`.harness-hash` regenerated** (1 of 9 pinned-file hashes change: `crap-score.py`).
|
|
393
210
|
|
|
394
|
-
|
|
211
|
+
### Removed
|
|
395
212
|
|
|
396
|
-
|
|
213
|
+
- **3 ruff-surfaced dead-code findings.** `crap-score.py`: a redundant local `import hashlib, os` inside the `if args.json:` block (F401; it shadowed the module-level `import os`, which is used) was removed, with `hashlib` moved to the module-level imports per Gemini PR #39; a dead local `metrics = …` in `score_rust()` (F841) was also removed. `cli.py`: a dead `import os` (F401, zero `os.*` usages) was removed.
|
|
397
214
|
|
|
398
|
-
|
|
215
|
+
## [1.1.2] - 2026-05-24
|
|
399
216
|
|
|
400
|
-
|
|
217
|
+
> **Why patch, not minor:** Pure dead-code removal + a CI policy tightening. No new CLI commands, no new flags, no API change, no behavioral change for any consumer. AAR: `000-docs/007-AA-AACR-shellcheck-hard-fail-iep-P6-2026-05-24.md`.
|
|
401
218
|
|
|
402
|
-
|
|
219
|
+
### Changed
|
|
403
220
|
|
|
404
|
-
|
|
221
|
+
- **Shellcheck CI gate flipped from tolerant to hard-fail (IEP Convergence Debt Plan Priority 6 Phase A1).** Closes `iah-shellcheck-hard-fail` (P1). The shellcheck job previously ran `shellcheck scripts/*.sh || true` — findings were logged but never blocked the PR. The `|| true` suffix is removed: any shellcheck finding (warning or error) now blocks the build. The locked precondition was v1.1.1 (PR #37), which addressed the 6 Gemini-flagged robustness findings.
|
|
222
|
+
- **Version bumped to v1.1.2 across all 5 manifests** and **`.harness-hash` regenerated** (3 of 9 pinned-file hashes change).
|
|
405
223
|
|
|
406
|
-
###
|
|
224
|
+
### Removed
|
|
407
225
|
|
|
408
|
-
|
|
226
|
+
- **3 pieces of dead code surfaced by the harder shellcheck gate.** `bias-count.sh`: `declare -A PATTERN_COUNTS` + its per-call assignment (SC2034 — populated, never read). `emit-evidence.sh`: `INPUT_HASH_HEX=$(…)` (SC2034 — computed, never read; vestige of an earlier cosign integration). `gherkin-lint.sh`: the `err()` helper (SC2317 — zero call sites), replaced with `process_awk_output()`.
|
|
409
227
|
|
|
410
|
-
|
|
228
|
+
### Fixed
|
|
411
229
|
|
|
412
|
-
|
|
230
|
+
- **`gherkin-lint.sh` awk subprocess undercount (silent-failure class bug; Gemini PR #38 review).** The awk-fallback path printed `WARN`/`ERROR` lines via `awk printf`, but those subprocesses never incremented the parent shell's `WARN_COUNT`/`ERROR_COUNT` — the summary said "0 warnings, 0 errors" while errors were actively printed and the exit code stayed 0. Exactly the silent-failure class the linter exists to surface elsewhere. The new `process_awk_output()` helper wraps each awk subprocess, counts `WARN`/`ERROR` lines via inline awk, increments the bash counters, then re-prints. Verified: a deliberate failure now exits 1 with `0 warning(s), 1 error(s)`.
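The undercount fix above comes down to one idea: the parent process must count findings from the subprocess output itself, because counters incremented inside a child never reach the parent. The real helper is a bash function using inline awk; the following is only a Python sketch of the same counting logic, assuming each finding line starts with `WARN` or `ERROR`. The function names are hypothetical.

```python
def count_findings(linter_output):
    """Count WARN and ERROR lines in captured subprocess output."""
    warn_count = 0
    error_count = 0
    for line in linter_output.splitlines():
        if line.startswith("WARN"):
            warn_count += 1
        elif line.startswith("ERROR"):
            error_count += 1
    return warn_count, error_count


def exit_code_for(error_count):
    """Any error makes the run fail; warnings alone do not."""
    return 1 if error_count else 0


if __name__ == "__main__":
    captured = "WARN: long scenario\nERROR: And at scenario start\nWARN: no tags\n"
    warnings, errors = count_findings(captured)
    print(f"{warnings} warning(s), {errors} error(s)")
    print(exit_code_for(errors))
```

Counting in the parent keeps the summary line and the exit code derived from the same data that was printed, so they cannot disagree.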
|
|
413
231
|
|
|
414
|
-
|
|
232
|
+
## [1.1.1] - 2026-05-23
|
|
415
233
|
|
|
416
|
-
|
|
417
|
-
- **`.harness-hash-extra-patterns`** (NEW, audit-harness repo root): pins `scripts/*.sh`, `scripts/*.py`, `bin/audit-harness.js`, and the extras file itself (preventing silent edits to the self-pinning scope).
|
|
418
|
-
- **`.harness-hash`** (NEW, audit-harness repo root): 9-file manifest produced by `bash scripts/harness-hash.sh --init`. Committed to main.
|
|
419
|
-
- **`.github/workflows/ci.yml`**: `audit-harness list` + `harness-hash --verify` self-check steps drop `|| true` suffixes. Hard-fail in place. Comment block updated.
|
|
234
|
+
> **Why patch, not minor:** Pure bug + portability fixes. No new flags, no new commands, no policy change, no breaking change to the manifest format. These scripts are now vendored into `intent-eval-lab` (PR #67); landing the fixes before the rollout reaches more repos avoids re-publishing buggy vendored copies.
|
|
420
235
|
|
|
421
|
-
###
|
|
236
|
+
### Fixed
|
|
422
237
|
|
|
423
|
-
|
|
238
|
+
- **6 script robustness + portability fixes (IEP Convergence Debt Plan Priority 3).** Closes `iah-script-robustness-upstream` (P2). Addresses the 6 medium-severity Gemini findings surfaced when the scripts were vendored into `intent-eval-lab` (PR #67). All fixes are upstream-only — zero CLI surface, runtime-dep, or policy change:
|
|
239
|
+
- **`escape-scan.sh`** (mktemp leak): adds `trap 'rm -f "$DIFF_SRC"' EXIT` after each `mktemp` so the temp file is removed on every exit path (matters most when escape-scan runs as a local git hook).
|
|
240
|
+
- **`crap-score.py`** (missing `go` PATH guard): `score_go()` now wraps the `go test` call in the existing `which_or_none("go")` pattern, so a system without Go no longer raises `FileNotFoundError` and aborts the whole CRAP pass.
|
|
241
|
+
- **`crap-score.py`** (rglob walk pruning): the `--json` input-hash walk now uses `os.walk` + in-place `dirs[:]` pruning (skipping `.git`, `node_modules`, `.venv`/`venv`, `__pycache__`, `dist`, `build`, `target`, `.tox`, `.mypy_cache`, `.pytest_cache`, `.next`, `.nuxt`, `.cache`) — a major perf win on large repos with no hash change for clean repos.
|
|
242
|
+
- **`emit-evidence.sh`** (shell→Python path injection): the package-version read now passes `$PKG_JSON` via `sys.argv[1]` instead of interpolating the shell variable into the Python source, so paths containing single quotes no longer break the parse.
|
|
243
|
+
- **`bias-count.sh`** (per-file sha256sum fork): `find … -exec sha256sum {} \;` changed to `… +` so `find` batches arguments into one (or few) invocations — output identical (the downstream `sort | sha256sum` normalizes).
|
|
244
|
+
- **`harness-hash.sh`** (cross-platform sha256sum): adds detection selecting `sha256sum` (GNU) or `shasum -a 256` (macOS) into a `SHA256_CMD` array, enabling engineer-local runs on macOS without coreutils.
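
The `os.walk` pruning described for `crap-score.py` above can be sketched as follows. This is a minimal illustration, not the script's actual code: the names `hash_tree` and `SKIP_DIRS`, and the choice to hash relative paths plus file contents, are assumptions. Only the skipped directory names come from the entry itself.

```python
import hashlib
import os

# Directory names taken from the changelog entry; the constant name is hypothetical.
SKIP_DIRS = {".git", "node_modules", ".venv", "venv", "__pycache__", "dist", "build",
             "target", ".tox", ".mypy_cache", ".pytest_cache", ".next", ".nuxt", ".cache"}


def hash_tree(root: str) -> str:
    """Return a SHA-256 hex digest over every file under root, skipping SKIP_DIRS."""
    digest = hashlib.sha256()
    for dirpath, dirs, files in os.walk(root):
        # Assigning to dirs[:] mutates the list os.walk is iterating from, so
        # pruned directories are never descended into. Sorting keeps the order stable.
        dirs[:] = sorted(d for d in dirs if d not in SKIP_DIRS)
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as fh:
                digest.update(fh.read())
    return digest.hexdigest()
```

Rebinding `dirs` to a new list would not prune anything; the in-place slice assignment is what tells `os.walk` to skip those subtrees, which is where the saving over an unfiltered `rglob` comes from.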
|
|
424
245
|
|
|
425
|
-
###
|
|
246
|
+
### Changed
|
|
426
247
|
|
|
427
|
-
|
|
248
|
+
- **Version bumped to v1.1.1 across all 5 manifests** and **`.harness-hash` regenerated** (4 of 9 pinned-file hashes change). AAR: `000-docs/006-AA-AACR-script-robustness-upstream-iep-P3-2026-05-23.md`.
|
|
428
249
|
|
|
429
|
-
|
|
250
|
+
## [1.1.0] - 2026-05-22
|
|
430
251
|
|
|
431
|
-
|
|
252
|
+
> **Why minor, not patch:** The `.harness-hash-extra-patterns` mechanism is a new authored feature surface — repos that opt in get a new capability. Before this release the audit-harness CI workflow could not enforce its own policy; a silent edit to `escape-scan.sh` (the gate that REFUSES threshold-lowering changes) would pass CI. That is the failure mode this release closes.
|
|
432
253
|
|
|
433
|
-
###
|
|
254
|
+
### Added
|
|
434
255
|
|
|
435
|
-
Per
|
|
256
|
+
- **Per-repo `.harness-hash-extra-patterns` mechanism + audit-harness self-pin (IEP Convergence Debt Plan Priority 3).** Closes `iah-self-pin` (P1). The harness's own policy-enforcement surface (`scripts/*.sh` + `scripts/*.py` + `bin/audit-harness.js`) is now hash-pinned at the repo root. CI's `audit-harness list` + `harness-hash --verify` self-check steps flip from `|| true` exit-3 tolerance to hard-fail: any byte change to a pinned policy file without a fresh `--init` + commit of the regenerated `.harness-hash` exits 2 (HARNESS_TAMPERED) and blocks the PR.
|
|
257
|
+
- **`scripts/harness-hash.sh`** (new): reads an optional `.harness-hash-extra-patterns` file at the repo root and appends its lines to the default PATTERNS array. Backward-compatible — repos without the file get exactly the previous behavior.
|
|
258
|
+
- **`.harness-hash-extra-patterns`** (new): pins `scripts/*.sh`, `scripts/*.py`, `bin/audit-harness.js`, and the extras file itself.
|
|
259
|
+
- **`.harness-hash`** (new): 9-file manifest produced by `bash scripts/harness-hash.sh --init`, committed to main.
|
|
260
|
+
- **`.github/workflows/ci.yml`**: the self-check steps drop their `|| true` suffixes.
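
The opt-in behavior of `.harness-hash-extra-patterns` can be pictured with a short sketch. The real logic lives in the bash script `scripts/harness-hash.sh`; this is a Python restatement for illustration only. The function name `load_patterns`, the shape of the default list, and the skipping of blank lines are assumptions.

```python
from pathlib import Path

EXTRA_PATTERNS_FILE = ".harness-hash-extra-patterns"


def load_patterns(repo_root: str, defaults: list[str]) -> list[str]:
    """Return the default patterns plus any listed in the optional extras file."""
    patterns = list(defaults)
    extras = Path(repo_root) / EXTRA_PATTERNS_FILE
    if extras.is_file():
        for line in extras.read_text().splitlines():
            line = line.strip()
            if line:  # assumption: blank lines are ignored
                patterns.append(line)
    return patterns
```

A repo without the extras file gets the defaults back unchanged, which is the backward-compatibility property the entry describes.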
|
|
436
261
|
|
|
437
|
-
|
|
262
|
+
### Changed
|
|
438
263
|
|
|
439
|
-
|
|
264
|
+
- **Version bumped to v1.1.0 across all 5 manifests.** Per the `version-canonical-check` CI gate landed in v1.0.2 (PR #35). AAR: `000-docs/005-AA-AACR-iah-self-pin-iep-P3-2026-05-22.md`.
|
|
440
265
|
|
|
441
|
-
|
|
266
|
+
## [1.0.2] - 2026-05-21
|
|
442
267
|
|
|
443
|
-
|
|
268
|
+
### Changed
|
|
444
269
|
|
|
445
|
-
- `package.json
|
|
446
|
-
-
|
|
447
|
-
- `python/pyproject.toml`: version `0.1.0` → `1.0.2`; license `MIT` → `Apache-2.0`; PyPI classifier updated to "License :: OSI Approved :: Apache Software License"; `[tool.hatch.build.targets.sdist].include` adds `/LICENSE` + `/NOTICE` per Apache-2.0 § 4
|
|
448
|
-
- `python/src/intent_audit_harness/__init__.py`: `__version__` `0.1.0` → `1.0.2`
|
|
449
|
-
- `rust/Cargo.toml`: version `0.1.0` → `1.0.2`; license `MIT` → `Apache-2.0`; `include` adds `NOTICE` per Apache-2.0 § 4
|
|
450
|
-
- `rust/Cargo.lock`: package entry version `1.0.1` → `1.0.2` (file is gitignored but the working-tree state is consistent for cargo builds)
|
|
451
|
-
- `.github/workflows/ci.yml`: NEW `version-canonical-check` job — fails if any of the 5 tracked version locations diverge from `package.json`, or if any non-npm manifest carries a non-`Apache-2.0` license. The gate also includes a robustness check for `rust/Cargo.lock` (currently gitignored; no-ops gracefully when the file isn't present in CI checkout).
|
|
270
|
+
- **Polyglot manifest alignment + Apache-2.0 NOTICE inclusion in distributions (IEP Convergence Debt Plan Priority 3).** Aligned all polyglot manifests at version `1.0.2`, bumping from npm `v1.0.1` → `v1.0.2` (rather than aligning the PyPI/crates wrappers to npm's `v1.0.1`) so all four registries publish lockstep from this release forward — preserving the immutability of the already-shipped npm `v1.0.1` tarball. Per-file: `package.json` `1.0.1` → `1.0.2`; `version.txt` `0.2.0` → `1.0.2`; `python/pyproject.toml` `0.1.0` → `1.0.2` (license `MIT` → `Apache-2.0`, classifier updated, sdist `include` adds `/LICENSE` + `/NOTICE`); `python/src/intent_audit_harness/__init__.py` `__version__` → `1.0.2`; `rust/Cargo.toml` `0.1.0` → `1.0.2` (license `MIT` → `Apache-2.0`, `include` adds `NOTICE`); `rust/Cargo.lock` package entry `1.0.1` → `1.0.2`.
|
|
271
|
+
- Folded NOTICE-file inclusion into the Python sdist + Rust crate distributions per Apache-2.0 § 4. No CLI surface or runtime behavior changes — pure metadata + packaging alignment.
|
|
452
272
|
|
|
453
|
-
|
|
273
|
+
### Added
|
|
454
274
|
|
|
455
|
-
|
|
275
|
+
- **`version-canonical-check` CI job (#35).** Fails if any of the 5 tracked version locations diverge from `package.json`, or if any non-npm manifest carries a non-`Apache-2.0` license. Includes a robustness check for the gitignored `rust/Cargo.lock`. Closes `iah-version-drift`, `iah-license-drift`, `iah-version-canonical-check`. AAR: `000-docs/004-AA-AACR-polyglot-version-license-alignment-2026-05-21.md`.
|
|
456
276
|
|
|
457
|
-
|
|
458
|
-
- **PyPI + crates.io** users: this is the first published `v1.0.2` and the first published Apache-2.0 release on these registries. The prior published `0.1.0` artifacts pre-date the `v1.0.0` Apache-2.0 relicense and remain available under their original MIT terms (registry tarballs are immutable). From `v1.0.2` forward all four registries publish lockstep at the same SemVer.
|
|
277
|
+
## [1.0.1] - 2026-05-20
|
|
459
278
|
|
|
460
|
-
|
|
279
|
+
### Fixed
|
|
461
280
|
|
|
462
|
-
|
|
281
|
+
- **NOTICE in the published tarball.** Added `NOTICE` to `package.json#files` so the file ships in the npm tarball alongside `LICENSE`. Per Apache 2.0 § 4, derivatives must carry the NOTICE file's attribution text if one exists in the source. `v1.0.0` shipped the relicense to Apache 2.0 but the tarball only carried `LICENSE` — this corrects that omission. No code, behavior, CLI, or dependency changes — packaging-only patch.
|
|
463
282
|
|
|
464
|
-
|
|
283
|
+
## [1.0.0] - 2026-05-19
|
|
465
284
|
|
|
466
|
-
|
|
285
|
+
### Changed
|
|
467
286
|
|
|
468
|
-
|
|
287
|
+
- **Relicensed from MIT to Apache 2.0 (BREAKING) (#32).** Deliberate alignment with the rest of the Intent Eval Platform ecosystem (`intent-eval-lab`, `intent-eval-core`) so every repo ships under a single OSI-approved license with explicit patent-grant language. Existing `0.x` releases on npm remain available under their original MIT terms (npm tarballs are immutable); all releases `>= 1.0.0` are Apache 2.0. README license section updated with a backward-compat note. No code, CLI surface, behavior, or runtime-dependency changes — license-only bump cut as MAJOR for legal clarity and consumer-review signaling.
|
|
288
|
+
- **Terminology: matcher-map → Intentional Mapping (per ISEDC v2).**
|
|
469
289
|
|
|
470
|
-
###
|
|
290
|
+
### Added
|
|
471
291
|
|
|
472
|
-
- **
|
|
473
|
-
- Existing `0.x` releases on npm remain available under their original MIT terms (npm tarballs are immutable). All releases `>= 1.0.0` are Apache 2.0.
|
|
474
|
-
- Added `NOTICE` file per Apache 2.0 best practice with copyright attribution and license summary.
|
|
475
|
-
- README license section updated to reflect the change with a backward-compat note.
|
|
292
|
+
- **`NOTICE` file** per Apache 2.0 best practice with copyright attribution and license summary.
|
|
476
293
|
|
|
477
|
-
|
|
294
|
+
## [0.3.0] - 2026-05-12
|
|
478
295
|
|
|
479
|
-
|
|
296
|
+
> Documented for completeness — the `--json` + `emit-evidence` work landed in the
|
|
297
|
+
> source tree as the v0.3.0 milestone but a `v0.3.0` git tag was never cut; the next
|
|
298
|
+
> published tag was `v1.0.0`. Kept here so the Milestone-2 capability set is not lost.
|
|
299
|
+
>
|
|
300
|
+
> **Notes:**
|
|
301
|
+
>
|
|
302
|
+
> - **No breaking changes.** Pre-v0.3.0 callers see identical text-mode output and exit codes; `--json` is purely additive.
|
|
303
|
+
> - **CISO gate (per ISEDC v1 Q1, 2026-05-10):** pushing a signed Statement to Rekor against `evals.intentsolutions.io/gate-result/v1` is BLOCKED until DNSSEC + CAA records are verified on the namespace.
|
|
480
304
|
|
|
481
|
-
### Added
|
|
305
|
+
### Added
|
|
482
306
|
|
|
483
|
-
- `--json` flag on every gate (`escape-scan`, `harness-hash --verify`, `arch`, `bias`,
|
|
484
|
-
|
|
485
|
-
|
|
486
|
-
-
|
|
487
|
-
augments it with `timestamp`, `runner`, `commit_sha`, and emits a complete
|
|
488
|
-
[in-toto Statement v1](https://github.com/in-toto/attestation/blob/main/spec/v1/statement.md)
|
|
489
|
-
with `predicateType` `https://evals.intentsolutions.io/gate-result/v1` per
|
|
490
|
-
[`evidence-bundle/v0.1.0-draft/SPEC.md`](https://github.com/jeremylongshore/intent-eval-lab/blob/main/specs/evidence-bundle/v0.1.0-draft/SPEC.md).
|
|
491
|
-
Optional `--sign` (cosign keyless or `--key`), `--rekor-url` for transparency-log push.
|
|
492
|
-
OTel `agent.rollout.gate.evaluated` event when `AUDIT_HARNESS_OTEL=1` or
|
|
493
|
-
`OTEL_EXPORTER_OTLP_ENDPOINT` set (best-effort no-op otherwise).
|
|
494
|
-
- `SEMVER.md` — explicit SemVer commitment doc covering exit codes, stream contracts,
|
|
495
|
-
and the predicate URI freeze.
|
|
496
|
-
- `tests/regression/run-regression.sh` — backward-compat regression suite. 11 checks
|
|
497
|
-
across text-mode parity, `--json` stream separation, schema validation, and the
|
|
498
|
-
`emit-evidence` pipeline.
|
|
499
|
-
- CI: `regression` job in `.github/workflows/ci.yml` runs the regression suite on every PR.
|
|
307
|
+
- **Evidence Bundle emission (Milestone 2 of the build journey).** A `--json` flag on every gate (`escape-scan`, `harness-hash --verify`, `arch`, `bias`, `gherkin-lint`, `crap`) emits a machine-readable gate-result envelope to stdout while preserving the existing human-readable text on stderr; exit codes unchanged.
|
|
308
|
+
- **`emit-evidence` subcommand.** Reads a gate-result envelope from stdin (or `--input`), augments it with `timestamp`, `runner`, `commit_sha`, and emits a complete [in-toto Statement v1](https://github.com/in-toto/attestation/blob/main/spec/v1/statement.md) with `predicateType` `https://evals.intentsolutions.io/gate-result/v1`. Optional `--sign` (cosign keyless or `--key`) + `--rekor-url`. OTel `agent.rollout.gate.evaluated` event when `AUDIT_HARNESS_OTEL=1` or `OTEL_EXPORTER_OTLP_ENDPOINT` is set.
|
|
309
|
+
- **`SEMVER.md`** — explicit SemVer commitment doc covering exit codes, stream contracts, and the predicate-URI freeze.
|
|
310
|
+
- **`tests/regression/run-regression.sh`** — backward-compat regression suite (11 checks across text-mode parity, `--json` stream separation, schema validation, and the `emit-evidence` pipeline), wired into a `regression` CI job.
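
As a rough picture of what the 0.3.0 `emit-evidence` step produces, the sketch below wraps a gate-result envelope in an in-toto Statement v1. It is not the shipped script: the function name `wrap_statement` is hypothetical, and taking the subject digest from `input_hash` (with a `sha256:` prefix stripped) is an assumption. The top-level keys follow the in-toto Statement v1 layout, and the subject name is set to the gate id.

```python
STATEMENT_TYPE = "https://in-toto.io/Statement/v1"
PREDICATE_TYPE = "https://evals.intentsolutions.io/gate-result/v1"


def wrap_statement(gate: dict, timestamp: str, runner: str, commit_sha: str) -> dict:
    """Augment a gate-result envelope and wrap it in an in-toto Statement v1."""
    # The three augmented fields are the ones named in the changelog entry.
    predicate = dict(gate, timestamp=timestamp, runner=runner, commit_sha=commit_sha)
    # Assumption: the subject digest is the input hash without its "sha256:" prefix.
    digest_hex = gate["input_hash"].removeprefix("sha256:")
    return {
        "_type": STATEMENT_TYPE,
        "subject": [{"name": gate["gate_id"], "digest": {"sha256": digest_hex}}],
        "predicateType": PREDICATE_TYPE,
        "predicate": predicate,
    }
```

Serializing the returned dict with `json.dumps` gives the unsigned Statement; signing and the Rekor push are separate steps and are not modeled here.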
|
|
500
311
|
|
|
501
312
|
### Changed
|
|
502
313
|
|
|
503
|
-
-
|
|
504
|
-
-
|
|
505
|
-
(the prior single-line `{"tool","status","violations","log"}` was internal — no
|
|
506
|
-
documented adopter parsed it).
|
|
507
|
-
|
|
508
|
-
### Notes
|
|
314
|
+
- **`bin/audit-harness.js`** dispatcher exposes the new `emit-evidence` subcommand.
|
|
315
|
+
- **`scripts/arch-check.sh`** `--json` output reshaped to the gate-result envelope shape.
|
|
509
316
|
|
|
510
|
-
|
|
511
|
-
codes. The `--json` flag is purely additive.
|
|
512
|
-
- **CISO gate (per ISEDC v1 Q1, 2026-05-10):** pushing a signed Statement to Rekor
|
|
513
|
-
against `evals.intentsolutions.io/gate-result/v1` is BLOCKED until DNSSEC + CAA
|
|
514
|
-
records are verified on the namespace. The script supports unsigned envelope
|
|
515
|
-
emission until that gate clears (tracked in `intent-eval-lab/.beads/` as `iel-4zr`).
|
|
516
|
-
- **Plan reference:** `~/.claude/plans/se-the-council-bubbly-frog.md` Milestone 2.
|
|
317
|
+
## [0.2.0] - 2026-05-10
|
|
517
318
|
|
|
518
|
-
|
|
319
|
+
### Added
|
|
519
320
|
|
|
520
|
-
-
|
|
521
|
-
- docs: fill baseline OSS governance gaps via /repo-dress (closes #10) (29a8520)
|
|
522
|
-
- docs: Part 2 Workstream A upgrade landscape (c967f3e)
|
|
523
|
-
- docs(CLAUDE.md): add three-repo convergence section (b8255a3)
|
|
524
|
-
- infra: convergence Phase A.0 + A — bd init, GH templates, CI workflow, design notes (8f30db4)
|
|
525
|
-
- bd init: initialize beads issue tracking (ffc7597)
|
|
526
|
-
- feat: add PyPI and crates.io wrappers for audit-harness (9b97217)
|
|
321
|
+
- **PyPI and crates.io wrappers for audit-harness** (9b97217) — the polyglot trifecta (npm + PyPI + crates) begins here.
|
|
527
322
|
|
|
528
|
-
|
|
323
|
+
### Changed
|
|
529
324
|
|
|
530
|
-
|
|
531
|
-
and
|
|
325
|
+
- **Filled baseline OSS governance gaps via `/repo-dress` (#11).** Completed the `/repo-dress` 21-file canon, including the `release.yml` workflow (#15).
|
|
326
|
+
- **Convergence Phase A.0 + A scaffolding** (8f30db4) — bd issue-tracking init, GitHub issue templates, CI workflow, and the three-repo convergence design notes / CLAUDE.md section (b8255a3, ffc7597).
|
|
327
|
+
- **Part 2 Workstream A upgrade-landscape docs (#9).**
|
|
532
328
|
|
|
533
|
-
## [0.1.0]
|
|
329
|
+
## [0.1.0] - 2026-04-21
|
|
534
330
|
|
|
535
331
|
Initial release. Extracted from the `audit-tests` Claude Code skill v7.0.0 to enable in-repo enforcement without global skill installation.
|
|
536
332
|
|
|
333
|
+
> **Key design decisions:**
|
|
334
|
+
>
|
|
335
|
+
> - **Scripts stay as shell/python** — not a TypeScript port; battle-tested, language-portable, minimal dependencies.
|
|
336
|
+
> - **Thin Node CLI** — `bin/audit-harness.js` is a dispatcher only; all logic lives in `scripts/`.
|
|
337
|
+
> - **Policy-driven thresholds** — `escape-scan.sh` reads floors from `tests/TESTING.md` in the target repo, not from the script source.
|
|
338
|
+
> - **Zero runtime dependencies** beyond Node 18+, bash, and Python 3 (only if using `crap`).
|
|
339
|
+
|
|
537
340
|
### Added
|
|
538
341
|
|
|
539
|
-
-
|
|
540
|
-
-
|
|
541
|
-
-
|
|
542
|
-
-
|
|
543
|
-
-
|
|
544
|
-
-
|
|
545
|
-
-
|
|
546
|
-
-
|
|
547
|
-
|
|
548
|
-
|
|
549
|
-
|
|
550
|
-
|
|
551
|
-
|
|
552
|
-
|
|
553
|
-
|
|
342
|
+
- **`audit-harness verify`** — SHA-256 hash verification for pinned policy files.
|
|
343
|
+
- **`audit-harness init`** — initialize / re-init the `.harness-hash` manifest.
|
|
344
|
+
- **`audit-harness list`** — list pinned files.
|
|
345
|
+
- **`audit-harness escape-scan`** — detect AI escape patterns in a diff (coverage-threshold lowering, test deletion, architecture bypasses, test-skip markers).
|
|
346
|
+
- **`audit-harness arch`** — dispatch the language-appropriate architecture checker (dependency-cruiser / import-linter / ArchUnit / deptrac / arch-go).
|
|
347
|
+
- **`audit-harness bias`** — count common test-bias patterns.
|
|
348
|
+
- **`audit-harness gherkin-lint`** — advisory Gherkin quality check.
|
|
349
|
+
- **`audit-harness crap`** — CRAP (Complexity × Coverage) scorer for Python, JS/TS, Go, Rust.
|
|
350
|
+
|
|
351
|
+
[Unreleased]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.2.2...HEAD
|
|
352
|
+
[1.2.2]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.2.1...v1.2.2
|
|
353
|
+
[1.2.1]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.2.0...v1.2.1
|
|
354
|
+
[1.2.0]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.1.8...v1.2.0
|
|
355
|
+
[1.1.8]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.1.7...v1.1.8
|
|
356
|
+
[1.1.7]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.1.6...v1.1.7
|
|
357
|
+
[1.1.6]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.1.5...v1.1.6
|
|
358
|
+
[1.1.5]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.1.4...v1.1.5
|
|
359
|
+
[1.1.4]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.1.3...v1.1.4
|
|
360
|
+
[1.1.3]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.1.2...v1.1.3
|
|
361
|
+
[1.1.2]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.1.1...v1.1.2
|
|
362
|
+
[1.1.1]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.1.0...v1.1.1
|
|
363
|
+
[1.1.0]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.0.2...v1.1.0
|
|
364
|
+
[1.0.2]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.0.1...v1.0.2
|
|
365
|
+
[1.0.1]: https://github.com/jeremylongshore/intent-audit-harness/compare/v1.0.0...v1.0.1
|
|
366
|
+
[1.0.0]: https://github.com/jeremylongshore/intent-audit-harness/compare/v0.2.0...v1.0.0
|
|
367
|
+
[0.3.0]: https://github.com/jeremylongshore/intent-audit-harness/compare/v0.2.0...v1.0.0
|
|
368
|
+
[0.2.0]: https://github.com/jeremylongshore/intent-audit-harness/compare/v0.1.0...v0.2.0
|
|
369
|
+
[0.1.0]: https://github.com/jeremylongshore/intent-audit-harness/releases/tag/v0.1.0
|
package/README.md
CHANGED
@@ -10,7 +10,7 @@ Deterministic test-enforcement toolkit. Companion to the `audit-tests` and `impl
 
 ## What it is
 
-A small CLI
+A small CLI dispatching 17 deterministic commands (shell + stdlib-Python scripts):
 
 | Command | Purpose |
 |---|---|
@@ -18,10 +18,19 @@ A small CLI wrapping 6 deterministic scripts:
 | `audit-harness init` | Pin the current state of engineer-owned policy files |
 | `audit-harness list` | Show pinned files |
 | `audit-harness escape-scan --staged` | Detect AI attempts to lower test thresholds, delete tests, bypass architecture rules |
+| `audit-harness cred-gate` | Provider-credential PASS/FAIL gate — FAIL if a declared secret, provider-key shape, or serialized env leaks into the artifact about to be signed |
 | `audit-harness arch` | Run language-appropriate architecture-rule checker (dependency-cruiser / import-linter / ArchUnit / deptrac / arch-go) |
 | `audit-harness bias` | Count common test-bias patterns |
 | `audit-harness gherkin-lint` | Advisory Gherkin quality check |
 | `audit-harness crap` | CRAP (Complexity × Coverage) scorer — Python, Go, JS/TS, Rust |
+| `audit-harness emit-evidence` | Wrap a gate-result JSON envelope in an in-toto Statement v1 (predicate `gate-result/v1`) |
+| `audit-harness classify` | Read-only repo classifier → an `audit-profile/v1` value (never writes) |
+| `audit-harness conform` | Read-only conformance gate-runner → `gate-result/v1` rows against bundled content-addressed schemas |
+| `audit-harness audit` | Read-only testing-depth gate-runner → coverage presence per pyramid layer + crap-score |
+| `audit-harness scan` | Read-only security/hygiene/skill-quality gate-runner (gitleaks / osv-scanner / Semgrep / syft / markdownlint / lychee) |
+| `audit-harness fp-rate` | Measure each gate's false-positive / false-negative rate over a labeled corpus |
+| `audit-harness currency` | Advisory poll-freshness report over the per-upstream pin relation |
+| `audit-harness gen-layer-applicability` | Project the canonical audit-profile registry into `layer-applicability.md` |
 
 ## Install
 
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@intentsolutions/audit-harness",
-  "version": "1.2.
+  "version": "1.2.3",
   "description": "Deterministic test-enforcement harness — escape-scan, hash-pinning, CRAP, architecture checks, bias detection, Gherkin lint. Companion to the audit-tests and implement-tests Claude Code skills.",
   "license": "Apache-2.0",
   "author": "Jeremy Longshore <jeremy@intentsolutions.io>",
package/scripts/emit-evidence.sh
CHANGED
@@ -138,30 +138,95 @@ TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
 STATEMENT=$(GATE_JSON="$GATE_JSON" PREDICATE_URI="$PREDICATE_URI" STATEMENT_TYPE="$STATEMENT_TYPE" \
   RUNNER="$RUNNER" COMMIT_SHA="$COMMIT_SHA" TIMESTAMP="$TIMESTAMP" \
   python3 - <<'PY'
-import json, os, sys
+import json, os, re, sys
 
 gate = json.loads(os.environ["GATE_JSON"])
 
+# Kernel _common.schema.json#/$defs/semver
+_SEMVER_RE = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+(-[A-Za-z0-9.-]+)?(\+[A-Za-z0-9.-]+)?$")
+
 required = ["gate_id", "result", "input_hash", "policy_hash"]
 missing = [k for k in required if k not in gate]
 if missing:
     sys.stderr.write(f"emit-evidence: gate-result missing required keys: {missing}\n")
     sys.exit(1)
 
-#
+# Build the canonical gate-result/v1 predicate body (Blueprint B § 7.4 / kernel
+# GateResultV1Schema). The inbound gate JSON is the legacy/draft envelope
+# (gate_id/result/policy_hash/input_hash[/metadata]); map + synthesize the
+# canonical fields. The kernel schema FORBIDS additionalProperties, so the legacy
+# `result`/`timestamp` keys are REPLACED, not augmented. Mirrors the kernel-valid
+# self-gate emitter ci/emit-evidence.ts:buildGateResult.
+metadata = gate.get("metadata") or {}
+
+# result (legacy UPPERCASE) / gate_decision (canonical) -> closed enum.
+_DECISION_MAP = {"pass": "pass", "fail": "fail", "advisory": "advisory", "error": "error"}
+decision_raw = str(gate.get("gate_decision", gate.get("result", ""))).strip().lower()
+gate_decision = _DECISION_MAP.get(decision_raw, "error")
+
+# gate_name: kebab-case short name; fall back to the last ':' segment of gate_id.
+gate_name = gate.get("gate_name") or gate["gate_id"].rsplit(":", 1)[-1]
+
+# gate_version: SemVer; fall back to the runner's semver (<tool>@X.Y.Z). The
+# kernel pattern is strict, so a non-SemVer runner suffix (e.g. '@unknown')
+# degrades to 0.0.0 rather than emitting a row that fails kernel validation.
+gate_version = gate.get("gate_version")
+if not gate_version:
+    _runner = os.environ["RUNNER"]
+    gate_version = _runner.split("@", 1)[1] if "@" in _runner else ""
+if not _SEMVER_RE.match(str(gate_version)):
+    gate_version = "0.0.0"
+
+# gate_reasons: empty array permitted ONLY for unconditional pass; otherwise >=1.
+reasons = gate.get("gate_reasons")
+if not reasons:
+    if gate_decision == "pass":
+        reasons = []
+    else:
+        reasons = [str(metadata.get("reason") or gate.get("failure_mode")
+                       or f"{gate_name}: {gate_decision}")]
+
+# coverage: BOTH arrays REQUIRED. Pass an inbound coverage through only when both
+# keys are present AND lists (a half-populated dict would fail kernel validation);
+# otherwise synthesize. An indeterminate row records the dimension as skipped.
+_cov = gate.get("coverage")
+if (isinstance(_cov, dict)
+        and isinstance(_cov.get("dimensions_evaluated"), list)
+        and isinstance(_cov.get("dimensions_skipped"), list)):
+    coverage = {"dimensions_evaluated": _cov["dimensions_evaluated"],
+                "dimensions_skipped": _cov["dimensions_skipped"]}
+else:
+    _dim = str(metadata.get("kind") or gate_name)
+    if metadata.get("indeterminate"):
+        coverage = {"dimensions_evaluated": [], "dimensions_skipped": [_dim]}
+    else:
+        coverage = {"dimensions_evaluated": [_dim], "dimensions_skipped": []}
+
+# policy_ref: `sha256:<hex>:<path>` — append an artifact/schema path to policy_hash.
+policy_ref = gate.get("policy_ref")
+if not policy_ref:
+    _path = metadata.get("artifact_path") or metadata.get("schema_id") or ".harness-hash"
+    policy_ref = f'{gate["policy_hash"]}:{_path}'
+
 predicate = {
-    "gate_id":
-    "
-    "
-    "
-    "
-    "
-    "
+    "gate_id": gate["gate_id"],
+    "gate_name": gate_name,
+    "gate_version": gate_version,
+    "gate_decision": gate_decision,
+    "gate_reasons": reasons,
+    "coverage": coverage,
+    "policy_ref": policy_ref,
+    "policy_hash": gate["policy_hash"],
+    "input_hash": gate["input_hash"],
+    "evaluated_at": os.environ["TIMESTAMP"],
+    "runner": os.environ["RUNNER"],
+    "commit_sha": os.environ["COMMIT_SHA"],
 }
 
-# Carry forward optional fields
-for opt in ("metadata", "failure_mode", "advisory_severity"
-
+# Carry forward optional canonical fields only (schema forbids unknown keys).
+for opt in ("metadata", "failure_mode", "advisory_severity", "cost_record_ref",
+            "replay_fidelity_level", "coverage_detail"):
+    if gate.get(opt) is not None:
         predicate[opt] = gate[opt]
 
 # Subject naming: subject.name MUST equal predicate.gate_id (SPEC § 6 R8)
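
To see how the new mapping treats a legacy envelope, the field derivations from the hunk above can be restated as a standalone function. This is a hedged sketch rather than the shipped code: the function name `normalize` and the sample values (including the `gate_id`) are made up, a set stands in for the identity `_DECISION_MAP`, the coverage synthesis is left out for brevity, and the SemVer check is assumed to apply to inbound `gate_version` values as well as to the runner-derived fallback.

```python
import re

SEMVER_RE = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+(-[A-Za-z0-9.-]+)?(\+[A-Za-z0-9.-]+)?$")
DECISIONS = {"pass", "fail", "advisory", "error"}


def normalize(gate: dict, runner: str) -> dict:
    """Derive the canonical fields from a legacy gate-result envelope."""
    metadata = gate.get("metadata") or {}

    # Legacy UPPERCASE `result` or canonical `gate_decision`; unknown values become "error".
    raw = str(gate.get("gate_decision", gate.get("result", ""))).strip().lower()
    decision = raw if raw in DECISIONS else "error"

    # Short name: explicit value, else the last ':' segment of gate_id.
    name = gate.get("gate_name") or gate["gate_id"].rsplit(":", 1)[-1]

    # Version: explicit value, else the runner suffix; non-SemVer degrades to 0.0.0.
    version = gate.get("gate_version")
    if not version:
        version = runner.split("@", 1)[1] if "@" in runner else ""
    if not SEMVER_RE.match(str(version)):
        version = "0.0.0"

    # Reasons: an empty list is kept only for a pass; otherwise synthesize one entry.
    reasons = gate.get("gate_reasons")
    if not reasons:
        if decision == "pass":
            reasons = []
        else:
            reasons = [str(metadata.get("reason") or gate.get("failure_mode")
                           or f"{name}: {decision}")]

    policy_ref = gate.get("policy_ref")
    if not policy_ref:
        path = metadata.get("artifact_path") or metadata.get("schema_id") or ".harness-hash"
        policy_ref = f'{gate["policy_hash"]}:{path}'

    return {"gate_name": name, "gate_version": version, "gate_decision": decision,
            "gate_reasons": reasons, "policy_ref": policy_ref}


legacy = {  # made-up sample values
    "gate_id": "audit-harness:escape-scan",
    "result": "FAIL",
    "policy_hash": "sha256:" + "a" * 64,
    "input_hash": "sha256:" + "b" * 64,
}
print(normalize(legacy, "audit-harness@unknown"))
```

For this input the sketch yields `gate_decision` `fail`, `gate_name` `escape-scan`, `gate_version` `0.0.0` (the `@unknown` suffix is not SemVer), a single synthesized reason `escape-scan: fail`, and a `policy_ref` ending in `:.harness-hash`.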