agent-gov-core 0.7.0 → 0.7.1

Files changed (7)

package/CHANGELOG.md CHANGED

@@ -2,6 +2,36 @@
 All notable changes to this project will be documented here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). Under v1.0, minor versions may include breaking changes — see [CONTRIBUTING.md](./CONTRIBUTING.md#backwards-compatibility) for the rules.
+## [0.7.1] — 2026-05-22
+Contract-hardening patch. Two external inspection rounds (Gemini + Cursor) surfaced five P0/P1 contract bugs in shipped code and one packaging fix. All addressed here. No new features; no new public exports.
+### Fixed (correctness)
+- **`applyExceptions` is now order-independent.** Previously the FIRST matching rule won — a stale expired rule listed before a broader active rule would incorrectly surface the finding as expired instead of being suppressed by the active rule. Now ALL matching rules are collected; the finding is suppressed when any matching rule is active, and only re-surfaces (with downgrade) when every matching rule has expired.
+- **`mergeFindings` rejects tool/finding mismatches.** Previously a forged `scope_trail` report containing `policy_mesh.*` findings would merge silently with wrong provenance. `validateReport` already rejected this; the merge path was more permissive. Now mismatches land in `invalidFindings[]` while the rest of the report still passes through.
+- **`normalizeMcpCommand` no longer collides on whitespace/delimiter args.** `['a b']` and `['a', 'b']` previously produced the same canonical `args=a b` because of space-joining. Same shape for env: `{A:'1|B=2'}` and `{A:'1', B:'2'}` collided under pipe-joining. Both now use JSON encoding so distinct inputs produce distinct canonicals. **This changes the canonical-string format**; PolicyMesh's `mcp_command_mismatch` will now correctly detect previously-conflated MCP configs.
+- **`createReport` clamps a downward-rating override upward to the implied max.** Previously `createReport({rating: 'low', findings: [critical-finding]})` returned a report that `validateReport` would then reject — the constructor and validator disagreed. Now createReport's output always round-trips through validateReport. Upward overrides (rating > implied) are still honored.
+- **`applyExceptions` pathPrefix now normalizes Windows backslashes and requires segment boundaries.** A finding with `src\app.ts` (Windows) now matches a `src/` prefix; a prefix `src/app` no longer over-suppresses `src/application.ts` (the match must land on a `/` boundary or be the exact path).
+### Changed (visible to consumers)
+- **MCP canonical-string format**: `args` and `env` now serialize as JSON. Existing PolicyMesh test fixtures may need updates if they pin the exact canonical (most don't — they pin server-identity-equivalence). Golden tests in `test/golden.test.mjs` updated to the new format.
+### Packaging
+- `docs/` directory is now included in the npm tarball. The README's link to `docs/INTEROP-OTEL.md` no longer 404s on the npm landing page.
+### Cleanup
+- `candidateTool` in `merge.ts` now delegates to `isToolKind` from `finding.ts` instead of carrying a hardcoded tool-list regex. Removes the fourth lockstep duplication of the ToolKind enum.
+### Tests
+- 230 total, up from 220. 10 new regression cases: order-independent exception application, all-expired downgrade chain, Windows-backslash path normalization, segment-aware prefix boundary, mergeFindings tool-mismatch rejection, MCP args whitespace collision, MCP env delimiter collision, MCP env order-independence under JSON encoding, createReport rating clamp, createReport round-trip-validates contract.
+### Skipped vs Cursor inspection
+- Gemini #3 (secret-pattern boundary anchors): proposed fix didn't actually fix the example given (`my-transaction-id-AIza<35>` has `-` as boundary character, so a boundary anchor still allows the match). Held for further design.
+- Gemini #4 (hex token vs `GITHUB_SHA`): operationally rare given current consumer scanning paths; document-only follow-up.
+- Cursor's README/package.json description / CONTRIBUTING module list refresh: pending follow-up doc PR.
 ## [0.7.0] — 2026-05-22
 **The pre-v1.0 consolidation release.** Bundles everything that was queued for v0.6.0 (report envelope + merge layer + OTel GenAI interop) plus two universal detectors promoted from consumer repos: `matchSecret` (from PolicyMesh) and `applyExceptions` (unifying PolicyMesh's `subject` and TaskBound's `allow_paths` shapes).

package/dist/exceptions.js CHANGED

@@ -41,35 +41,59 @@ export function applyExceptions(findings, exceptions, now = new Date()) {
     let suppressed = 0;
     let expired = 0;
     for (const finding of findings) {
-        const match = findMatchingException(finding, exceptions);
-        if (!match) {
+        // Collect ALL matching rules — order independence is required by contract.
+        // A finding is suppressed when any matching rule is active; only when
+        // every matching rule has expired does the finding re-surface as expired.
+        // Previously the first match won, so a stale rule listed before an
+        // active broader rule incorrectly surfaced expired alerts.
+        const matches = findAllMatchingExceptions(finding, exceptions);
+        if (matches.length === 0) {
             result.push(finding);
             continue;
         }
-        if (match.expires && isExpired(match.expires, now)) {
-            result.push(downgradeExpired(finding, match));
-            expired++;
-        }
-        else {
+        const activeMatch = matches.find((m) => !m.expires || !isExpired(m.expires, now));
+        if (activeMatch) {
             suppressed++;
+            continue;
         }
+        // Every matching rule has expired. Use the first match for reason text.
+        result.push(downgradeExpired(finding, matches[0]));
+        expired++;
     }
     return { findings: result, suppressed, expired };
 }
-function findMatchingException(finding, exceptions) {
+function findAllMatchingExceptions(finding, exceptions) {
+    const out = [];
     for (const exc of exceptions) {
         if (exc.kind !== finding.kind)
             continue;
         if (exc.salientKey !== undefined && exc.salientKey !== finding.salientKey)
             continue;
-        if (exc.pathPrefix !== undefined) {
-            const file = finding.location?.file;
-            if (!file || !file.startsWith(exc.pathPrefix))
-                continue;
-        }
-        return exc;
+        if (exc.pathPrefix !== undefined && !pathPrefixMatches(finding.location?.file, exc.pathPrefix))
+            continue;
+        out.push(exc);
     }
-    return undefined;
+    return out;
+}
+/**
+ * Segment-aware path-prefix match. Normalizes Windows backslashes to forward
+ * slashes on BOTH sides so a finding's `src\app.ts` matches a `src/` prefix.
+ * Requires the prefix match to land on a segment boundary OR be the exact
+ * full path — so prefix `src/app` does NOT match `src/application.ts`.
+ */
+function pathPrefixMatches(file, prefix) {
+    if (!file)
+        return false;
+    const fileNorm = file.replace(/\\/g, '/');
+    const prefixNorm = prefix.replace(/\\/g, '/');
+    if (!fileNorm.startsWith(prefixNorm))
+        return false;
+    // Exact match, prefix ends with `/`, or next char is `/` — all valid boundaries.
+    if (fileNorm.length === prefixNorm.length)
+        return true;
+    if (prefixNorm.endsWith('/'))
+        return true;
+    return fileNorm[prefixNorm.length] === '/';
 }
 function isExpired(expires, now) {
     const parsed = new Date(expires);
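
Reviewer's sketch of the two `applyExceptions` behaviors this hunk introduces: order independence and segment-aware prefix matching. It is not the package's code. The `DemoFinding` and `DemoException` shapes, the `classify` helper, and the `demoIsExpired` comparison are hypothetical simplifications (the real `isExpired` body is cut off by the hunk); only `pathPrefixMatches` follows the diff line for line.

```typescript
// Hypothetical, simplified shapes for illustration; the real package types may differ.
type DemoFinding = { kind: string; file?: string };
type DemoException = { kind: string; pathPrefix?: string; expires?: string };
type Outcome = 'kept' | 'suppressed' | 'expired';

// Mirrors the logic of pathPrefixMatches in the hunk above.
function pathPrefixMatches(file: string | undefined, prefix: string): boolean {
  if (!file) return false;
  const f = file.replace(/\\/g, '/');
  const p = prefix.replace(/\\/g, '/');
  if (!f.startsWith(p)) return false;
  // Exact match, prefix ending in `/`, or next char `/` are the valid boundaries.
  return f.length === p.length || p.endsWith('/') || f[p.length] === '/';
}

// Assumption: a rule is expired when its date lies strictly before `now`.
function demoIsExpired(expires: string, now: Date): boolean {
  return new Date(expires).getTime() < now.getTime();
}

// Collect every matching rule, then decide: any active rule suppresses;
// only an all-expired match set re-surfaces the finding.
function classify(finding: DemoFinding, rules: DemoException[], now: Date): Outcome {
  const matches = rules.filter(
    (r) =>
      r.kind === finding.kind &&
      (r.pathPrefix === undefined || pathPrefixMatches(finding.file, r.pathPrefix)),
  );
  if (matches.length === 0) return 'kept';
  const anyActive = matches.some((r) => !r.expires || !demoIsExpired(r.expires, now));
  return anyActive ? 'suppressed' : 'expired';
}

const now = new Date('2026-05-22T00:00:00Z');
const stale: DemoException = { kind: 'demo.rule', pathPrefix: 'src/app', expires: '2025-01-01' };
const broad: DemoException = { kind: 'demo.rule', pathPrefix: 'src/' };
const windowsFinding: DemoFinding = { kind: 'demo.rule', file: 'src\\app\\main.ts' };

const staleFirst = classify(windowsFinding, [stale, broad], now);
const broadFirst = classify(windowsFinding, [broad, stale], now);
const onlyStale = classify(windowsFinding, [stale], now);
const nearMiss = classify({ kind: 'demo.rule', file: 'src/application.ts' }, [stale], now);
```

Under these assumptions both rule orderings give the same outcome, the stale rule alone yields the expired path, and `src/application.ts` is left untouched by the `src/app` prefix.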

package/dist/mcp.js CHANGED

@@ -29,15 +29,20 @@ export function normalizeMcpCommand(spec) {
         parts.push(`cmd=${normalizeExecutable(spec.command)}`);
     }
     const args = spec.args ?? [];
-    parts.push(`args=${canonicalizeArgs(args).join(' ')}`);
+    // JSON-encode the canonicalized args so a token containing whitespace
+    // (`['a b']`) doesn't collide with two tokens (`['a', 'b']`). Both would
+    // previously serialize to `args=a b` and PolicyMesh would treat genuinely
+    // different MCP commands as the same server.
+    parts.push(`args=${JSON.stringify(canonicalizeArgs(args))}`);
     if (spec.cwd) {
         parts.push(`cwd=${normalizePath(spec.cwd)}`);
     }
     if (spec.env) {
-        const env = Object.entries(spec.env)
-            .map(([k, v]) => `${k}=${v}`)
-            .sort();
-        parts.push(`env=${env.join('|')}`);
+        // JSON-encode sorted (key, value) pairs so a value containing `|` or `=`
+        // (`{A: '1|B=2'}`) doesn't collide with multiple entries (`{A: '1', B: '2'}`).
+        // Sorted by key for order-independence.
+        const env = Object.entries(spec.env).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
+        parts.push(`env=${JSON.stringify(env)}`);
     }
     return parts.join('\n');
 }
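
To make the collision concrete, here is a small standalone comparison of the old join-based serialization and the new JSON-based one. The four helper names are hypothetical, and the package's `canonicalizeArgs` step is left out (treated as identity), so this only illustrates the encoding difference rather than the full `normalizeMcpCommand` output.

```typescript
type EnvMap = { [key: string]: string };

// Old behavior: space-join the args.
const joinArgs = (args: string[]): string => `args=${args.join(' ')}`;

// New behavior: JSON-encode the args array.
const jsonArgs = (args: string[]): string => `args=${JSON.stringify(args)}`;

// Old behavior: `k=v` entries, sorted, pipe-joined.
const joinEnv = (env: EnvMap): string =>
  `env=${Object.keys(env)
    .map((k) => `${k}=${env[k]}`)
    .sort()
    .join('|')}`;

// New behavior: JSON-encode (key, value) pairs sorted by key.
const jsonEnv = (env: EnvMap): string => {
  const pairs = Object.keys(env).map((k): [string, string] => [k, env[k]]);
  pairs.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return `env=${JSON.stringify(pairs)}`;
};

// One token containing a space vs. two tokens.
const argsCollideOld = joinArgs(['a b']) === joinArgs(['a', 'b']);
const argsCollideNew = jsonArgs(['a b']) === jsonArgs(['a', 'b']);

// One value containing `|` and `=` vs. two separate entries.
const envCollideOld = joinEnv({ A: '1|B=2' }) === joinEnv({ A: '1', B: '2' });
const envCollideNew = jsonEnv({ A: '1|B=2' }) === jsonEnv({ A: '1', B: '2' });

// Insertion order of env keys should not affect the canonical string.
const envOrderStable = jsonEnv({ B: '2', A: '1' }) === jsonEnv({ A: '1', B: '2' });
```

The join-based helpers conflate the two inputs in each pair, while the JSON-based helpers keep them distinct and stay independent of key insertion order.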

package/dist/merge.js CHANGED

@@ -70,6 +70,22 @@ export function mergeFindings(reports, opts = {}) {
                 });
                 continue;
             }
+            // Cross-check: a finding's tool must match the envelope's tool. Otherwise
+            // the merge would attribute a foreign-tool finding to this report's
+            // source provenance, breaking the meta-reviewer's audit trail.
+            // validateReport enforces this strictly; the merge path was previously
+            // more permissive — which let a forged report through.
+            if (finding.tool !== report.tool) {
+                invalidFindings.push({
+                    reportIndex: i,
+                    findingIndex: j,
+                    tool: report.tool,
+                    errors: [
+                        `finding.tool '${finding.tool}' does not match report.tool '${report.tool}'`,
+                    ],
+                });
+                continue;
+            }
             if (rankSeverity(finding.severity) < thresholdRank) {
                 droppedBelowThreshold++;
                 continue;
@@ -121,9 +137,10 @@ function candidateTool(value) {
     if (value === null || typeof value !== 'object')
         return undefined;
     const t = value.tool;
-    return typeof t === 'string' && /^(scope_trail|policy_mesh|capability_echo|task_bound|session_trail)$/.test(t)
-        ? t
-        : undefined;
+    // Defer to isToolKind from finding.ts — the single source of truth for the
+    // ToolKind enum. Avoids a hardcoded regex drifting from the TS union, the
+    // schema, and TOOL_KINDS.
+    return isToolKind(t) ? t : undefined;
 }
 /**
  * Envelope-only structural check. Unlike `validateReport`, this does NOT

package/dist/report.js CHANGED

@@ -18,10 +18,22 @@ export const REPORT_SCHEMA_VERSION = '1.0';
  * });
  */
 export function createReport(spec) {
+    // Rating policy: caller-supplied rating is honored only when it's at or
+    // above the implied max severity. Otherwise it's clamped upward to the
+    // implied max so createReport never returns a report that validateReport
+    // would reject. Upward overrides (rating > implied) are still allowed —
+    // a tool may legitimately escalate by policy.
+    const impliedRating = maxSeverity(spec.findings);
+    const supplied = spec.rating;
+    const rating = supplied === undefined
+        ? impliedRating
+        : severityRank(supplied) >= severityRank(impliedRating)
+            ? supplied
+            : impliedRating;
     const report = {
         schemaVersion: REPORT_SCHEMA_VERSION,
         tool: spec.tool,
-        rating: spec.rating ?? maxSeverity(spec.findings),
+        rating,
         findings: spec.findings,
     };
     if (spec.toolVersion !== undefined)
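
The rating policy can be exercised in isolation with a sketch like the one below. The four-level `DemoSeverity` scale, the `demoRank` and `demoMaxSeverity` helpers, and the choice of `'low'` for an empty findings list are assumptions made for illustration; the diff only shows that `createReport` compares `severityRank(supplied)` against `severityRank(impliedRating)`.

```typescript
// Assumed severity scale, lowest to highest; the package's real scale may differ.
type DemoSeverity = 'low' | 'medium' | 'high' | 'critical';
const DEMO_ORDER: DemoSeverity[] = ['low', 'medium', 'high', 'critical'];

const demoRank = (s: DemoSeverity): number => DEMO_ORDER.indexOf(s);

// Highest severity among the findings; 'low' for an empty list (assumption).
function demoMaxSeverity(findings: { severity: DemoSeverity }[]): DemoSeverity {
  let max: DemoSeverity = 'low';
  for (const f of findings) {
    if (demoRank(f.severity) > demoRank(max)) max = f.severity;
  }
  return max;
}

// Same three-way decision as the hunk: default to implied, honor upward
// overrides, clamp downward overrides up to the implied max.
function resolveRating(
  supplied: DemoSeverity | undefined,
  findings: { severity: DemoSeverity }[],
): DemoSeverity {
  const implied = demoMaxSeverity(findings);
  if (supplied === undefined) return implied;
  return demoRank(supplied) >= demoRank(implied) ? supplied : implied;
}

const clamped = resolveRating('low', [{ severity: 'critical' }]);
const escalated = resolveRating('critical', [{ severity: 'low' }]);
const defaulted = resolveRating(undefined, [{ severity: 'medium' }, { severity: 'high' }]);
```

Here the downward override is raised to the implied maximum, the upward override is kept, and an omitted rating falls back to the implied value, so the result can never sit below the findings' highest severity.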

package/docs/INTEROP-OTEL.md ADDED

@@ -0,0 +1,64 @@
+# Interop: OpenTelemetry GenAI Semantic Conventions
+`agent-gov-core` and the [OpenTelemetry GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) solve adjacent problems:
+| | OpenTelemetry GenAI | agent-gov-core |
+|---|---|---|
+| **Domain** | Runtime trace observability | Static-analysis governance findings |
+| **Question answered** | What did the agent do? | What's wrong with what the agent did (or might do)? |
+| **Unit of work** | Span / trace | Finding / Report |
+| **Lifetime** | Real-time, ephemeral | Persisted, reviewable in PR |
+| **Audience** | SREs, on-call engineers | Code reviewers, security teams |
+They're complementary. A team running OTel-instrumented agents can pair runtime traces with governance findings against the same conversation, then correlate by ID across the two systems.
+## Recommended cross-walk
+| OpenTelemetry `gen_ai.*` attribute | agent-gov-core field | Notes |
+|---|---|---|
+| `gen_ai.conversation.id` | `Report.conversationId` | Same string — pass through directly. v0.6.0 added `conversationId` as an optional `Report` field for this purpose. |
+| `gen_ai.agent.name` | `Report.tool` | Loose match — OTel's "agent name" is whatever the application calls it. Our `tool` is one of five governance tools. If a consumer emits both, the OTel agent name is the *subject*, our tool is the *reviewer*. |
+| `gen_ai.workflow.name` | `MergedReport` (no field today) | When `mergeFindings` rolls up N tool reports for one PR/conversation, that's structurally a workflow. We don't carry a workflow name field yet — a future `MergedReport.workflowName` could match. |
+| `gen_ai.operation.name` | n/a | OTel has `create_agent`, `invoke_agent`, `invoke_workflow`. We're not a tracer; we don't emit operation spans. |
+| `error.type` | `ConfigParseError.name` / Finding `data.errorType` | OTel's `error.type` is stable across all of OTel and stays the right field name for any error class identifier we surface to observability consumers. |
+| `gen_ai.tool.definitions` | The data ScopeTrail / PolicyMesh *parse from* `.mcp.json` etc. | We extract this; OTel emits it as a span attribute. Same content, different transport. |
+| `gen_ai.usage.*tokens` | n/a | Runtime telemetry, not governance. |
+| `gen_ai.input.messages` / `gen_ai.output.messages` | n/a | Runtime telemetry. SessionTrail reviews *transcripts*, not active message streams. |
+## Why we don't adopt the OTel namespace ourselves
+1. **Different shape.** OTel attributes are flat key-value pairs on a span. Our `Finding` is a structured object with severity, location, and a namespaced `kind`. Forcing one onto the other loses information either way.
+2. **Different stability lifecycle.** OTel GenAI attributes are marked `Development` (their pre-stable tier) and may still churn. Our schema needs to freeze at v1.0 with explicit semver guarantees for consumer tools.
+3. **Different validation contract.** OTel attributes are "best effort, observability tools must tolerate missing fields." Our schema is strict (`additionalProperties: false`) because consumer detectors depend on field presence.
+`Report.conversationId` is the one bridge field — same string on both sides, no transform, opt-in.
+## How to bridge in practice
+```ts
+// In a consumer tool that also emits OTel traces:
+import { trace } from '@opentelemetry/api';
+import { createReport, mergeFindings } from 'agent-gov-core';
+const span = trace.getActiveSpan();
+const conversationId = span?.spanContext().traceState?.get('conversation.id');
+const report = createReport({
+  tool: 'scope_trail',
+  conversationId,           // ← OTel's gen_ai.conversation.id, same value
+  findings: collectedFindings,
+});
+```
+Now an observability backend correlating by `conversation.id` can pull both the OTel traces (what the agent did) and the governance report (what was risky about it) for the same agent session.
+## Future considerations
+- **`MergedReport.workflowName`** — would map to `gen_ai.workflow.name`. Useful when the meta-reviewer is invoked across multiple tool runs that share a workflow context (e.g. a multi-PR review).
+- **OTel span emission from the meta-reviewer** — `mergeFindings` could optionally emit a span with `gen_ai.operation.name = "review_workflow"` and findings as span events. Held for v1.x — current `mergeFindings` deliberately has no observability dependencies.
+## References
+- [OpenTelemetry GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/)
+- [Agent spans specifically](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/)
+- [`error.type` general convention](https://opentelemetry.io/docs/specs/semconv/attributes-registry/error/)

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agent-gov-core",
-  "version": "0.7.0",
+  "version": "0.7.1",
   "description": "Shared primitives for the AI-agent governance suite: Finding schema, JSONC/TOML readers, line locators, MCP command normalization, shell tokenization, and GitHub Action helpers.",
   "type": "module",
   "main": "./dist/index.js",
@@ -21,6 +21,7 @@
     "dist/**/*.js",
     "dist/**/*.d.ts",
     "schemas",
+    "docs",
     "LICENSE",
     "README.md",
     "CHANGELOG.md"