npm - watchmyagents - Versions diffs - 1.0.0 → 1.0.2 - Mend

watchmyagents 1.0.0 → 1.0.2

Files changed (10)

package/README.md +6 -6
package/SECURITY.md +15 -5
package/package.json +4 -4
package/scripts/fetch-anthropic.js +25 -7
package/scripts/{anonymize.js → signals.js} +16 -10
package/scripts/upload-fortress.js +8 -3
package/src/anonymizer.js +78 -8
package/src/logger.js +7 -0
package/src/sources/anthropic-managed.js +135 -10
package/src/sources/contract.js +13 -0

package/README.md CHANGED

@@ -122,18 +122,18 @@ wma-fetch (--agent-id <agent_id> | --all-agents) [--session-id <sess_id>] [--sin
 | `--interval 5m` | Poll interval in watch mode (default `5m`; accepts `30s`/`1h`/…) |
 | `--upload` | In watch mode, anonymize each new window and ship signals to Fortress (needs `WMA_API_KEY` + `WMA_FORTRESS_BASE_URL` + `WMA_SIGNALS_SALT`). Raw stays local. |
 | `--discovery-since 7d` | Window for discovering NEW sessions (default `7d`). Sessions already being tracked are re-fetched regardless of age, so long-running ones never drop out. |
-| `--send-agent-names` | Opt-in: send the human agent name as the Fortress `display_name`. Default sends the agent id only (the name may contain client/project info). |
+| `--no-send-agent-names` | Opt-out: send only the agent id as the Fortress `display_name`. **By default, the human agent name** (sanitized) is sent so dashboards/decisions stay legible. Pass this flag if your agent names themselves carry client/project info you'd rather keep pseudonymized. |
 | `--api-key sk-ant-…` | Override the `ANTHROPIC_API_KEY` env var. **Discouraged** — visible in shell history & process list. Prefer the env var. |
 Logs land in `./watchmyagents-logs/<agent_id>/<date>.ndjson` (file mode `0600`, dir `0700`).
-### `wma-anonymize` — preview what would leave your machine
+### `wma-signals` — preview what would leave your machine
 Produces the anonymized signals payload (counts, latencies, salted IoC hashes, sequence histograms — no raw URLs/commands/prompts) that future WMA cloud features would ship. Useful to verify Containment compliance and to test the format.
 ```bash
 export WMA_SIGNALS_SALT="$(node -e 'console.log(require("crypto").randomBytes(16).toString("hex"))')"
-wma-anonymize ./watchmyagents-logs
+wma-signals ./watchmyagents-logs
 # → JSON on stdout. Add --out signals.json to write to file.
 ```
@@ -146,7 +146,7 @@ Anonymizes your local NDJSON and POSTs the resulting payload to the WMA Fortress
 ```bash
 export WMA_API_KEY="wma_..."                    # from Fortress dashboard → Settings → API Keys
 export WMA_FORTRESS_URL="https://<your-project>.supabase.co/functions/v1/ingest-signals"
-export WMA_SIGNALS_SALT="..."                   # same salt as wma-anonymize
+export WMA_SIGNALS_SALT="..."                   # same salt as wma-signals
 wma-upload-fortress --agent-id agent_01ABC... [--display-name "My agent"]
 # → POSTs the anonymized payload. Server returns signal_id + agent_id.
@@ -155,7 +155,7 @@ wma-upload-fortress --agent-id agent_01ABC... [--display-name "My agent"]
 wma-upload-fortress --agent-id agent_xxx --dry-run
 ```
-**What is sent:** the anonymized signals payload (counts, latencies, salted IoC hashes, sequences — same as `wma-anonymize` output), the agent's **`classification`** when the daemon has it (`{agent_type, confidence, stage}` — anonymized metadata, never raw content), **plus the routing identifiers**: `provider` (e.g., `"anthropic-managed"` — added in v1.0 for the multi-framework SDK), `native_agent_id` (the canonical provider-agnostic field), `anthropic_agent_id` (kept for backwards compat with existing Fortress instances; will be dropped once Fortress migrates), `parent_agent_id` (`null` for root agents — populated for sub-agents detected via OpenAI Agents handoffs, CrewAI manager mode, Hermes Agent `spawn_subagent`, LangGraph sub-graphs), `composition_pattern` (`"solo" | "hierarchy" | "graph" | "peer"` — defaults to `"solo"` for Anthropic until thread-message detection lands), `enforcement_mode` (`"sync_confirm" | "sync_interrupt" | "detect_only"` — the strongest enforcement capability the Source provides; Fortress greys out Shield UI for `detect_only` agents to prevent UI/runtime mismatch), and a `display_name`. The agent id is required so Fortress can associate signals with the right agent; `display_name` defaults to the **human-readable agent name** (sanitized to strip control chars) for UX in the dashboard — pass `--no-send-agent-names` to keep it pseudonymized (sends the agent id instead) if your agent names themselves carry sensitive client/project info.
+**What is sent:** the anonymized signals payload (counts, latencies, salted IoC hashes, sequences — same as `wma-signals` output), the agent's **`classification`** when the daemon has it (`{agent_type, confidence, stage}` — anonymized metadata, never raw content), **plus the routing identifiers**: `provider` (e.g., `"anthropic-managed"` — added in v1.0 for the multi-framework SDK), `native_agent_id` (the canonical provider-agnostic field), `anthropic_agent_id` (kept for backwards compat with existing Fortress instances; will be dropped once Fortress migrates), `parent_agent_id` (`null` for root agents — populated for sub-agents detected via OpenAI Agents handoffs, CrewAI manager mode, Hermes Agent `spawn_subagent`, LangGraph sub-graphs), `composition_pattern` (`"solo" | "hierarchy" | "graph" | "peer"` — defaults to `"solo"` for Anthropic until thread-message detection lands), `enforcement_mode` (`"sync_confirm" | "sync_interrupt" | "detect_only"` — the strongest enforcement capability the Source provides; Fortress greys out Shield UI for `detect_only` agents to prevent UI/runtime mismatch), and a `display_name`. The agent id is required so Fortress can associate signals with the right agent; `display_name` defaults to the **human-readable agent name** (sanitized to strip control chars) for UX in the dashboard — pass `--no-send-agent-names` to keep it pseudonymized (sends the agent id instead) if your agent names themselves carry sensitive client/project info.
 **What is NOT sent:** raw prompts, raw URLs/commands/queries, raw agent responses, raw error messages. All payload content stays on your machine.
 The endpoint auto-registers the agent on the first upload if it doesn't exist in Fortress yet — no manual onboarding needed for new agents.
@@ -247,7 +247,7 @@ WatchMyAgents is built so that **your prompts and outputs never have to leave yo
 |---|---|
 | **Your machine** (`./watchmyagents-logs/`) | Full NDJSON with all prompts, tool inputs, agent outputs. `chmod 600` on every file. |
 | **Anthropic API** | Where the agent runs. WMA pulls events via the public REST API only. |
-| **WMA Fortress** (opt-in, only with `--upload` / `wma-upload-fortress` / `wma-shield --policies-source fortress`) | The **anonymized signals** payload (counts, timings, salted hashes, sequences) + routing identifiers: `provider` (e.g. `"anthropic-managed"`), `native_agent_id`, `anthropic_agent_id` (legacy alias), and `display_name` (defaults to the agent id; the human agent name only with `--send-agent-names`). Shield enforcement **decisions** (hashed session/event/input fingerprints — never raw values). **Never** raw prompts, URLs, commands, or outputs. |
+| **WMA Fortress** (opt-in, only with `--upload` / `wma-upload-fortress` / `wma-shield --policies-source fortress`) | The **anonymized signals** payload (counts, timings, salted hashes, sequences) + routing identifiers: `provider` (e.g. `"anthropic-managed"`), `native_agent_id`, `anthropic_agent_id` (legacy alias), and `display_name` (defaults to the **human agent name** for dashboard UX — pass `--no-send-agent-names` to opt out and send only the agent id). Shield enforcement **decisions** (hashed session/event/input fingerprints — never raw values). **Never** raw prompts, URLs, commands, or outputs. |
 This is the "local-first" guarantee: **raw payloads never leave your machine.** Cloud upload is opt-in and carries only anonymized metadata + the agent id/name needed to route it.

package/SECURITY.md CHANGED

@@ -30,10 +30,21 @@ WMA needs your Anthropic API key to call the Managed Agents REST API on your beh
 ### What WMA does NOT do
-- ❌ Does not phone home, telemetry, analytics, or usage reporting
-- ❌ Does not send any data to WMA-controlled servers
-- ❌ Does not store, log, or transmit your Anthropic API key anywhere except `api.anthropic.com`
-- ❌ Does not require an account, signup, or license key
+- ❌ No phone-home, no usage analytics, no silent telemetry — WMA never opens a network connection to a WMA-controlled endpoint on its own.
+- ❌ Does not store, log, or transmit your Anthropic API key anywhere except `api.anthropic.com`.
+- ❌ Does not require an account, signup, or license key.
+### Fortress upload — strictly opt-in
+Since v0.5.0, WMA supports an **opt-in** cloud component (WMA Fortress) for teams who want a multi-agent dashboard + cross-fleet Guardian analysis. The upload only happens when you explicitly invoke `--upload` on `wma-fetch`, run `wma-upload-fortress`, or run `wma-shield --policies-source fortress`. The defaults across all CLIs are zero-cloud — your machine stays the only place raw data ever exists.
+What goes to Fortress when you opt in:
+- ✅ The **anonymized signals payload** (counts, latencies, salted IoC hashes, sequences, classification metadata) — see [`docs/CONTAINMENT.md`](docs/CONTAINMENT.md) for the bit-exact contract and the 6 invariant tests that lock it down.
+- ✅ Routing identifiers (`provider`, `native_agent_id`, optionally the human `display_name` — see `--no-send-agent-names` to opt this out).
+What does **NOT** go to Fortress, ever:
+- ❌ Raw prompts, agent outputs, tool inputs, tool outputs, error message text, raw URLs, raw commands, raw queries — these stay in your local `watchmyagents-logs/`.
+- ❌ Your Anthropic API key. Fortress authenticates with a separate `WMA_API_KEY` issued from your Fortress account and never sees `ANTHROPIC_API_KEY`.
 ## Threat model
@@ -56,7 +67,6 @@ WMA combines **two complementary layers**:
 - **Pre-installation activity.** Shield only enforces from the moment it attaches forward. Past events are not retroactively replayed or re-evaluated.
 - **A malicious policy file.** Shield's policy engine refuses obviously unsafe regex patterns (e.g. catastrophic backtracking) and truncates inputs before regex tests to mitigate ReDoS. But a user-controlled policy file remains a code-adjacent input — treat it as you would treat source code.
 - **A compromised Anthropic API.** WMA trusts the events delivered by Anthropic. This is out of scope.
-- **A compromised Anthropic API.** WMA trusts the events delivered by Anthropic. This is out of scope.
 ## Supply chain

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "watchmyagents",
-  "version": "1.0.0",
+  "version": "1.0.2",
   "description": "Security observability + real-time policy enforcement for AI agents. Local-first NDJSON capture with a continuous Watch daemon that auto-uploads anonymized signals, Shield CLI that blocks policy violations live (with policies pulled from Fortress cloud), anonymizer producing signals-only payloads, bidirectional sync with WatchMyAgents Fortress, and one-command install as an always-on launchd/systemd service — closing the recursive Watch→Guardian→Shield security loop.",
   "type": "module",
   "files": [
@@ -8,7 +8,7 @@
     "scripts/inspect.js",
     "scripts/fetch-anthropic.js",
     "scripts/shield.js",
-    "scripts/anonymize.js",
+    "scripts/signals.js",
     "scripts/upload-fortress.js",
     "scripts/service.js",
     "scripts/agents.js",
@@ -20,7 +20,7 @@
     "wma-inspect": "scripts/inspect.js",
     "wma-fetch": "scripts/fetch-anthropic.js",
     "wma-shield": "scripts/shield.js",
-    "wma-anonymize": "scripts/anonymize.js",
+    "wma-signals": "scripts/signals.js",
     "wma-upload-fortress": "scripts/upload-fortress.js",
     "wma-service": "scripts/service.js",
     "wma-agents": "scripts/agents.js"
@@ -30,7 +30,7 @@
     "inspect": "node scripts/inspect.js",
     "fetch": "node scripts/fetch-anthropic.js",
     "shield": "node scripts/shield.js",
-    "anonymize": "node scripts/anonymize.js",
+    "signals": "node scripts/signals.js",
     "upload-fortress": "node scripts/upload-fortress.js",
     "service": "node scripts/service.js",
     "agents": "node scripts/agents.js"

package/scripts/fetch-anthropic.js CHANGED

@@ -34,7 +34,7 @@ import { classifyAgentType } from '../src/typology.js';
 import { aggregate, buildFeatures } from '../src/typology-features.js';
 import {
   getAgent, listAgents, listSessions, fetchSessionEntries, fetchRawEvents,
-  AnthropicManagedSource,
+  AnthropicManagedSource, effectiveEnforcementMode,
 } from '../src/sources/anthropic-managed.js';
 function parseArgs(argv) {
@@ -111,7 +111,7 @@ function postJson(url, headers, body) {
 // `classification` (optional) carries the agent's typology — Fortress upserts
 // agent_type/confidence/stage on the agent row so the typology badge + the
 // apply-template flow fill themselves with no manual click.
-async function uploadSignals(uploadCtx, agentId, displayName, entries, classification) {
+async function uploadSignals(uploadCtx, agentId, displayName, entries, classification, enforcementMode) {
   const agg = new SignalsAggregator({ salt: uploadCtx.salt });
   for (const e of entries) agg.add(e);
   const sig = agg.finalize();
@@ -132,16 +132,18 @@ async function uploadSignals(uploadCtx, agentId, displayName, entries, classific
   // so old Fortress instances still recognize the upload. Once the
   // Lovable-deployed ingest-signals migrates, future SDK releases will
   // stop emitting `anthropic_agent_id`.
-  // PR-D: enforcement_mode is read CANONICALLY from the Source's static
-  // declaration so it stays in sync with the actual capability of the
-  // adapter — never re-declared inline.
+  // PR-D / v1.0.1 F-2: enforcement_mode is the EFFECTIVE per-agent mode
+  // (sync_confirm only if the agent has permission_policy: always_ask on
+  // at least one tool; sync_interrupt otherwise). Falls back to the
+  // Source's static MAX capability if the resolution failed upstream —
+  // legacy behavior, but flags a warning in the daemon log.
   const body = JSON.stringify({
     provider: AnthropicManagedSource.providerName,
     native_agent_id: agentId,
     anthropic_agent_id: agentId,
     parent_agent_id,
     composition_pattern,
-    enforcement_mode: AnthropicManagedSource.enforcementMode,
+    enforcement_mode: enforcementMode || AnthropicManagedSource.enforcementMode,
     display_name: displayName,
     window_start: sig.window_start,
     window_end: sig.window_end,
@@ -250,6 +252,10 @@ async function runWatch({ apiKey, resolveAgents, fleet, logDir, intervalMs, wind
   const sessionAgent = new Map();// sessionId → { agentId, model, displayName }
   const priors = new Map();      // agentId → previous classification (threads the
                                   // typology state machine across upload cycles)
+  // F-2: cache the effective enforcement mode per agent. One getAgent call
+  // per agent per daemon run (until the entry is evicted). Refreshed only
+  // if upload fails — agent permission_policy doesn't change mid-flight.
+  const enforcementModes = new Map(); // agentId → 'sync_confirm' | 'sync_interrupt'
   const ac = new AbortController();
   const shutdown = () => { info('shutting down…'); ac.abort(); };
@@ -319,7 +325,19 @@ async function runWatch({ apiKey, resolveAgents, fleet, logDir, intervalMs, wind
             classification = { agent_type: cls.classified_type, confidence: cls.confidence, stage: cls.stage };
           } catch (e) { warn(`  classification skipped: ${e.message}`); }
-          const resp = await uploadSignals(uploadCtx, ag.agentId, sendNames ? ag.displayName : ag.agentId, fresh, classification);
+          // F-2: resolve the effective enforcement mode for this agent
+          // (cached across cycles). On failure, fall back to the static
+          // provider max so the upload still succeeds.
+          let mode = enforcementModes.get(ag.agentId);
+          if (!mode) {
+            try {
+              mode = await effectiveEnforcementMode(apiKey, ag.agentId);
+              enforcementModes.set(ag.agentId, mode);
+            } catch (e) {
+              warn(`  enforcement_mode resolution failed for ${ag.agentId}: ${e.message} (falling back to provider max)`);
+            }
+          }
+          const resp = await uploadSignals(uploadCtx, ag.agentId, sendNames ? ag.displayName : ag.agentId, fresh, classification, mode);
           if (resp?.signal_id) {
             const cTag = classification ? ` · type ${classification.agent_type} (${Math.round(classification.confidence * 100)}%, ${classification.stage})` : '';
             info(`  ↑ signals uploaded (signal_id ${resp.signal_id})${cTag}`);

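The cache-then-fallback pattern in the `runWatch` hunk above can be sketched in isolation. This is a minimal illustration, not the package's code: `modeForUpload` and `resolveMode` are hypothetical names (`resolveMode` stands in for `effectiveEnforcementMode(apiKey, agentId)`), and `PROVIDER_MAX_MODE` assumes the provider maximum is `sync_confirm`, as the `upload-fortress.js` comment in this diff states.

```javascript
// Sketch only: per-agent enforcement-mode cache with a static fallback.
// Assumption: the provider's MAX capability is 'sync_confirm'.
const PROVIDER_MAX_MODE = 'sync_confirm';

// cache:       Map of agentId -> 'sync_confirm' | 'sync_interrupt'
// resolveMode: hypothetical async resolver, one live lookup per agent
async function modeForUpload(cache, agentId, resolveMode, warn = console.warn) {
  let mode = cache.get(agentId);
  if (!mode) {
    try {
      mode = await resolveMode(agentId);
      cache.set(agentId, mode); // only successful resolutions are cached
    } catch (e) {
      warn(`enforcement_mode resolution failed for ${agentId}: ${e.message}`);
    }
  }
  // A failed resolution leaves `mode` undefined, so the upload still
  // goes out with the static provider maximum.
  return mode || PROVIDER_MAX_MODE;
}

// Demo with a fake resolver that counts how often it is called.
async function demo() {
  const cache = new Map();
  let lookups = 0;
  const fakeResolver = async () => { lookups += 1; return 'sync_interrupt'; };
  const first = await modeForUpload(cache, 'agent_demo', fakeResolver);
  const second = await modeForUpload(cache, 'agent_demo', fakeResolver);
  return { first, second, lookups };
}

demo().then((result) => console.log(result));
```

Because a failure is never written to the cache, the next upload cycle retries the lookup instead of pinning the agent to the fallback value.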
package/scripts/{anonymize.js → signals.js} RENAMED

@@ -1,9 +1,15 @@
 #!/usr/bin/env node
-// wma-anonymize — produce the anonymized signals payload that Watch would
-// send to Fortress, for inspection / verification.
+// wma-signals — build the signals payload that Watch would send to
+// Fortress, for inspection / verification.
+//
+// (Renamed from `wma-anonymize` in v1.0.1. The script's job is to PRODUCE
+//  the signals payload; anonymization is a property of that payload,
+//  guaranteed by the underlying SignalsAggregator. The new name aligns
+//  with the rest of the product vocabulary: SignalsAggregator,
+//  ingest-signals Edge Function, signals.payload shape.)
 //
 // Usage:
-//   wma-anonymize <path-to-ndjson-or-dir> [--salt <hex>] [--out <file>]
+//   wma-signals <path-to-ndjson-or-dir> [--salt <hex>] [--out <file>]
 //
 // The `--salt` argument MUST be a stable per-customer secret. Using a
 // random salt each run means hashes won't correlate across runs (useless
@@ -56,11 +62,11 @@ async function main() {
   const args = parseArgs(process.argv.slice(2));
   if (!args._target) {
-    die(`usage: wma-anonymize <path> [--salt <hex>] [--out <file>]
+    die(`usage: wma-signals <path> [--salt <hex>] [--out <file>]
-Reads Watch NDJSON logs and produces the anonymized signals payload
-that would be sent to Fortress. Use this to inspect exactly what
-leaves your machine BEFORE any upload feature is enabled.
+Builds the signals payload that Watch would send to Fortress, from
+local NDJSON logs. Use this to inspect exactly what leaves your
+machine BEFORE any upload feature is enabled.
 Required: --salt <hex> or WMA_SIGNALS_SALT env var (per-customer secret).
 If you don't have one, generate: node -e "console.log(require('crypto').randomBytes(16).toString('hex'))"
@@ -73,8 +79,8 @@ and save it in .env.local.`);
         '       generate one with: node -e "console.log(require(\'crypto\').randomBytes(16).toString(\'hex\'))"');
   }
   if (args.salt) {
-    process.stderr.write('[wma-anonymize] warning: --salt on the command line is visible in shell history.\n' +
-                         '                Prefer: export WMA_SIGNALS_SALT=...\n');
+    process.stderr.write('[wma-signals] warning: --salt on the command line is visible in shell history.\n' +
+                         '              Prefer: export WMA_SIGNALS_SALT=...\n');
   }
   if (salt.length < 16) {
     die('error: salt too short (need ≥16 hex chars / ≥8 bytes of entropy)');
@@ -102,7 +108,7 @@ and save it in .env.local.`);
   const json = JSON.stringify(signals, null, 2);
   if (args.out) {
     await writeFile(resolve(args.out), json + '\n', { encoding: 'utf8', mode: 0o600 });
-    process.stderr.write(`[wma-anonymize] wrote ${args.out} (${signals._meta.entries_processed} entries processed)\n`);
+    process.stderr.write(`[wma-signals] wrote ${args.out} (${signals._meta.entries_processed} entries processed)\n`);
   } else {
     process.stdout.write(json + '\n');
   }

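The salt handling described in the `wma-signals` usage text (generate 16 random bytes as hex, reject anything shorter than 16 characters) can be sketched as follows. `resolveSalt` is a hypothetical helper, and giving the command-line value precedence over `WMA_SIGNALS_SALT` is an assumption; the diff does not show how the script combines the two.

```javascript
// Sketch only: generating and validating a signals salt.
const { randomBytes } = require('node:crypto');

// Same recipe as the one-liner in the usage text: 16 bytes, 32 hex chars.
function generateSalt() {
  return randomBytes(16).toString('hex');
}

// Hypothetical helper. Assumption: a --salt value wins over the env var.
function resolveSalt(cliSalt, env = process.env) {
  const salt = cliSalt || env.WMA_SIGNALS_SALT;
  if (!salt) {
    throw new Error('missing salt: pass --salt <hex> or set WMA_SIGNALS_SALT');
  }
  // Mirrors the length check shown in the diff (16 hex chars = 8 bytes).
  if (salt.length < 16) {
    throw new Error('salt too short (need at least 16 hex chars)');
  }
  return salt;
}

// A freshly generated salt passes validation. Generate it once and store
// it, because a new salt on every run makes hashes uncorrelatable.
const demoSalt = generateSalt();
console.log(resolveSalt(undefined, { WMA_SIGNALS_SALT: demoSalt }).length);
```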
package/scripts/upload-fortress.js CHANGED

@@ -4,7 +4,7 @@
 //
 // Composable with the rest of the SDK:
 //   wma-fetch  →  ./watchmyagents-logs/<agent_id>/<date>.ndjson   (local capture)
-//   wma-anonymize  →  signals payload (Containment: no raw content)
+//   wma-signals  →  signals payload (Containment: no raw content)
 //   wma-upload-fortress  →  POST signals to https://<project>.supabase.co/functions/v1/ingest-signals
 //
 // Usage:
@@ -183,8 +183,13 @@ async function main() {
   // is a one-shot post-hoc tool — it has no per-entry context to derive
   // hierarchy from, so it sends defaults (solo / null) until a future
   // adapter writes those fields into the local NDJSON.
-  // PR-D: enforcement_mode read from the Source class so any change to
-  // the adapter's capability automatically reflects in the payload.
+  // PR-D / v1.0.1 F-2: enforcement_mode set to the provider's MAX
+  // capability (sync_confirm). The continuous Watch daemon
+  // (wma-fetch --watch --upload) resolves the EFFECTIVE per-agent mode
+  // via effectiveEnforcementMode(), but this one-shot uploader has no
+  // ANTHROPIC_API_KEY in scope so it cannot make the live getAgent
+  // call. Best-effort: send the max; the daemon's subsequent uploads
+  // will correct the value once it resolves.
   const body = {
     provider: AnthropicManagedSource.providerName,
     native_agent_id: agentId,

package/src/anonymizer.js CHANGED

@@ -37,6 +37,27 @@ const HASHABLE_INPUT_FIELDS = ['url', 'query', 'command', 'path', 'file_path'];
 // Tool types whose inputs we want to hash for IoC tracking
 const TOOL_ACTIONS = new Set(['tool_use', 'mcp_tool_use', 'custom_tool_use']);
+// Well-known vendor built-in tool names that are SAFE to keep in clear in the
+// signals payload. They are documented by the vendor, common across customers,
+// and the operator NEEDS them legible in the dashboard ("3 web_search calls
+// in 10 minutes" is the actionable signal). Anything not on this list is
+// considered customer-controlled (custom tool, MCP tool with a customer-chosen
+// name like "client_acme_export") and gets hashed before egress.
+//
+// To add a built-in: only confirmed-public-by-vendor names — never speculative
+// matches. When in doubt, hash.
+const WELL_KNOWN_TOOLS = new Set([
+  // Anthropic Managed Agents
+  'web_search', 'web_fetch', 'bash', 'code_execution',
+  'str_replace_editor', 'str_replace_based_edit_tool',
+  'computer', 'computer_use_20250124', 'computer_use_20241022',
+  'text_editor', 'text_editor_20250124', 'text_editor_20241022',
+  // OpenAI Agents / Responses
+  'web_search_preview', 'file_search', 'computer_use_preview', 'code_interpreter',
+  // Common framework primitives
+  'function', 'retrieval',
+]);
 // ── Hash helpers ─────────────────────────────────────────────────────────
 /**
@@ -56,6 +77,30 @@ export function generateSalt() {
   return randomBytes(16).toString('hex');
 }
+// ── Tool name normalization (Containment hardening, v1.0.1 F-3) ────────
+/**
+ * Return the canonical tool-name token that's safe to ship to Fortress.
+ *
+ * - For documented vendor built-ins (WELL_KNOWN_TOOLS) the name is kept
+ *   in clear so dashboards and Guardian policies can reason about the
+ *   tool by its public identifier.
+ * - For anything else (customer-defined functions, MCP tools whose name
+ *   is set by the customer's MCP server, e.g. "client_acme_export"),
+ *   the name is salted-SHA256-hashed with a `tool_hash:` prefix so it
+ *   cannot leak project/client identifiers.
+ *
+ * Empty / null tool names return null.
+ */
+export function normalizeToolName(toolName, salt) {
+  if (toolName == null) return null;
+  const s = String(toolName);
+  if (s.length === 0) return null;
+  if (WELL_KNOWN_TOOLS.has(s)) return s;
+  if (!salt) throw new Error('normalizeToolName requires a salt to hash custom tool names');
+  return 'tool_hash:' + createHash('sha256').update(salt).update(s).digest('hex').slice(0, 32);
+}
 // ── Single-entry extractor: what hashable IoCs are in this entry? ────────
 function extractIocs(entry, salt) {
@@ -89,6 +134,13 @@ export class SignalsAggregator {
     this.entryCount = 0;
     this._prevActionType = null;
     this._prevSessionId = null;
+    // v1.0.2 F-6b — opaque session ids active in this window. Shipped to
+    // Fortress in the payload as `session_ids[]` so an operator looking at
+    // a Shield decision in the dashboard can grep their LOCAL NDJSON by
+    // session_id immediately (forensics short-circuit). The Anthropic
+    // session_id is a non-semantic token like `sess_01XaNB…` — same
+    // sensitivity class as `agent_id`, which we already transmit.
+    this.seenSessions = new Set();              // unique session_ids
   }
   add(entry) {
@@ -102,6 +154,13 @@ export class SignalsAggregator {
       if (!this.windowEnd || ts > this.windowEnd) this.windowEnd = ts;
     }
+    // F-6b — collect every distinct session_id encountered in the window.
+    // Stays opaque (no string transformation), bounded by the natural
+    // number of sessions in the window.
+    if (typeof entry.session_id === 'string' && entry.session_id.length > 0) {
+      this.seenSessions.add(entry.session_id);
+    }
     // Counts
     const at = entry.action_type || 'unknown';
     this.counts[at] = (this.counts[at] || 0) + 1;
@@ -115,15 +174,21 @@ export class SignalsAggregator {
     this._prevActionType = at;
     this._prevSessionId = entry.session_id || null;
-    // Tools
+    // Tools — Containment (v1.0.1 F-3): well-known vendor built-ins keep
+    // their public name; customer-defined / MCP tool names get hashed so
+    // no client-identifying string ("client_acme_export") leaks via the
+    // tool_counts / tool_latencies / error_rate maps.
     if (entry.tool_name && TOOL_ACTIONS.has(at)) {
-      this.toolCounts[entry.tool_name] = (this.toolCounts[entry.tool_name] || 0) + 1;
-      if (entry.status === 'error') {
-        this.toolErrors[entry.tool_name] = (this.toolErrors[entry.tool_name] || 0) + 1;
-      }
-      if (typeof entry.duration_ms === 'number') {
-        if (!this.toolLatencies[entry.tool_name]) this.toolLatencies[entry.tool_name] = [];
-        this.toolLatencies[entry.tool_name].push(entry.duration_ms);
+      const toolKey = normalizeToolName(entry.tool_name, this.salt);
+      if (toolKey) {
+        this.toolCounts[toolKey] = (this.toolCounts[toolKey] || 0) + 1;
+        if (entry.status === 'error') {
+          this.toolErrors[toolKey] = (this.toolErrors[toolKey] || 0) + 1;
+        }
+        if (typeof entry.duration_ms === 'number') {
+          if (!this.toolLatencies[toolKey]) this.toolLatencies[toolKey] = [];
+          this.toolLatencies[toolKey].push(entry.duration_ms);
+        }
       }
       // Extract & hash IoCs from this tool's input
       for (const h of extractIocs(entry, this.salt)) this.iocHashes.add(h);
@@ -182,6 +247,11 @@ export class SignalsAggregator {
         sequences_top10: sequencesTop,
         stop_reasons: this.stopReasons,
         tokens_total: this.tokensTotal,
+        // F-6c — opaque session ids active in this window, sorted for
+        // determinism. Operator forensic chain:
+        //   Fortress decision → window_start/end + session_ids → grep
+        //   the local NDJSON of the affected agent → full raw context.
+        session_ids: [...this.seenSessions].sort(),
       },
       _meta: {
         entries_processed: this.entryCount,

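To see what the allow-list-or-hash rule does to a `tool_counts` map, here is a standalone sketch that re-creates `normalizeToolName` from the hunk above. The allow-list is abbreviated for illustration, and the salt and tool names are made-up demo values.

```javascript
// Sketch only: standalone re-creation of the tool-name normalization.
const { createHash } = require('node:crypto');

// Abbreviated allow-list; the package's WELL_KNOWN_TOOLS set is longer.
const WELL_KNOWN_TOOLS = new Set(['web_search', 'web_fetch', 'bash', 'code_execution']);

function normalizeToolName(toolName, salt) {
  if (toolName == null) return null;
  const s = String(toolName);
  if (s.length === 0) return null;
  if (WELL_KNOWN_TOOLS.has(s)) return s; // vendor built-in: kept in clear
  if (!salt) throw new Error('a salt is required to hash custom tool names');
  // Salted SHA-256, truncated to 32 hex chars, with a recognizable prefix.
  const digest = createHash('sha256').update(salt).update(s).digest('hex');
  return 'tool_hash:' + digest.slice(0, 32);
}

// Hypothetical salt and tool names, for demonstration only.
const salt = '00112233445566778899aabbccddeeff';
const names = ['web_search', 'client_acme_export', 'client_acme_export', ''];

const toolCounts = {};
for (const name of names) {
  const key = normalizeToolName(name, salt);
  if (key) toolCounts[key] = (toolCounts[key] || 0) + 1; // empty names are skipped
}
console.log(toolCounts);
```

The same custom name always maps to the same key under a fixed salt, so counts and latencies still aggregate per tool even though the name itself never leaves the machine.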
package/src/logger.js CHANGED

@@ -13,6 +13,8 @@ import { assertSafePathSegment } from './validate.js';
 const EXPORT_FIELDS = [
   'id', 'agent_id', 'parent_agent_id', 'composition_pattern',
   'provider', 'timestamp', 'action_type',
+  // v1.0.2 F-6a — Anthropic-style sub-agent discriminators preserved locally
+  'session_thread_id', 'agent_name',
   'tool_name', 'duration_ms', 'tokens_used',
   'input_tokens', 'output_tokens', 'cache_read_tokens', 'cache_creation_tokens',
   'cost_usd', 'model',
@@ -60,6 +62,11 @@ export class Logger {
       // populates these on the event, and the Logger threads them through.
       parent_agent_id: e.parent_agent_id ?? null,
       composition_pattern: e.composition_pattern || 'solo',
+      // v1.0.2 F-6a: Anthropic-style discriminators preserved LOCAL ONLY
+      // (never sent raw to Fortress — SignalsAggregator derives the
+      // aggregated session_ids list from these at finalize time).
+      session_thread_id: e.session_thread_id ?? null,
+      agent_name: e.agent_name ?? null,
       provider: e.provider || e.framework || 'generic',
       timestamp: e.timestamp || new Date().toISOString(),
       action_type: e.action_type || 'tool_call',

package/src/sources/anthropic-managed.js CHANGED

@@ -17,7 +17,11 @@
 import { request } from 'node:https';
 import { URLSearchParams } from 'node:url';
-import { Source, PROVIDERS, ENFORCEMENT_MODES } from './contract.js';
+import { Source, PROVIDERS, ENFORCEMENT_MODES, ACTION_TYPES } from './contract.js';
+import {
+  getAgentConfig, detectAlwaysAsk,
+  confirmAllow, confirmDeny, interruptSession,
+} from '../shield/enforce.js';
 const API_HOST = 'api.anthropic.com';
 const BETA = 'managed-agents-2026-04-01';
@@ -181,6 +185,13 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
     if (!RELEVANT.has(ev.type)) continue;
     const type = ev.type;
     const ts = ev.processed_at || ev.created_at || new Date().toISOString();
+    // v1.0.2 F-6a: capture Anthropic's own discriminators on EVERY event,
+    // not just thread_message_*. session_thread_id + agent_name are how
+    // the vendor itself tells parent activity from sub-agent activity.
+    // Preserved LOCALLY (NDJSON) only — never sent raw to Fortress.
+    const session_thread_id = ev.session_thread_id ?? null;
+    const agent_name = ev.agent_name ?? null;
+    const subAgentMeta = { session_thread_id, agent_name };
     const tsMillis = tsMs(ev);
     if (type === 'span.model_request_start') {
@@ -197,6 +208,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
       const cw = u.cache_creation_input_tokens || 0;
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: 'llm_call',
         tool_name: null,
@@ -216,6 +228,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
     if (type === 'user.message') {
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: 'user_message',
         tool_name: null,
@@ -230,6 +243,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
     if (type === 'user.interrupt') {
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: 'user_interrupt',
         tool_name: null,
@@ -245,6 +259,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
       const denied = ev.result === 'deny';
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: 'tool_confirmation',
         tool_name: null,
@@ -261,6 +276,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
     if (type === 'user.custom_tool_result') {
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: 'custom_tool_result',
         tool_name: null,
@@ -276,6 +292,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
     if (type === 'agent.message') {
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: 'message',
         tool_name: null,
@@ -290,6 +307,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
     if (type === 'agent.thinking') {
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: 'thinking',
         tool_name: null,
@@ -317,6 +335,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
       const isError = ev.is_error === true;
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: start?.isMcp ? 'mcp_tool_use' : 'tool_use',
         tool_name: start?.name || 'unknown',
@@ -333,6 +352,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
     if (type === 'agent.custom_tool_use') {
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: 'custom_tool_use',
         tool_name: ev.name || 'unknown',
@@ -347,6 +367,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
     if (type === 'agent.thread_context_compacted') {
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: 'context_compacted',
         tool_name: null,
@@ -366,6 +387,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
       const direction = type.endsWith('_sent') ? 'sent' : 'received';
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: `thread_message_${direction}`,
         tool_name: null,
@@ -387,6 +409,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
       const { id: _id, type: _type, processed_at: _pa, created_at: _ca, ...changes } = ev;
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: 'config_change',
         tool_name: null,
@@ -401,6 +424,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
     if (type === 'session.thread_created') {
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: 'thread_created',
         tool_name: null,
@@ -418,6 +442,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
     if (type === 'session.error') {
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: 'session_error',
         tool_name: null,
@@ -439,6 +464,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
       const fatal = state === 'terminated';
       yield {
         ...base,
+        ...subAgentMeta,
         id: ev.id,
         action_type: 'state_transition',
         tool_name: null,
@@ -459,6 +485,27 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
   }
 }
+// ────────────────────────────────────────────────────────────────────────
+// effectiveEnforcementMode — F-2 of the Codex v1.0.1 audit
+// ────────────────────────────────────────────────────────────────────────
+// AnthropicManagedSource.enforcementMode is the PROVIDER'S MAX capability
+// (sync_confirm). But the EFFECTIVE mode for a given agent depends on
+// whether at least one of its tools/toolsets has permission_policy =
+// always_ask. When none does, Shield can only interrupt AFTER a violating
+// tool ran, not block before — that's sync_interrupt territory.
+//
+// This helper resolves the per-agent effective mode from the live agent
+// config so the value shipped to Fortress matches what Shield can
+// actually do for THIS agent. Without this, Fortress can mis-display
+// "sync_confirm" UI on an agent that's only interrupt-capable, leading
+// the operator to deploy Shield policies that won't pre-block.
+export async function effectiveEnforcementMode(apiKey, agentId) {
+  const agentConfig = await getAgentConfig(apiKey, agentId);
+  return detectAlwaysAsk(agentConfig)
+    ? ENFORCEMENT_MODES.SYNC_CONFIRM
+    : ENFORCEMENT_MODES.SYNC_INTERRUPT;
+}
 function extractText(content) {
   if (typeof content === 'string') return content;
   if (Array.isArray(content)) {
@@ -490,6 +537,23 @@ export class AnthropicManagedSource extends Source {
     super({ apiKey });
     if (!apiKey) throw new Error('AnthropicManagedSource requires an apiKey');
     this.apiKey = apiKey;
+    // Per-agent effective enforcement mode cache. One getAgentConfig call per
+    // agent across the lifetime of the Source instance.
+    this._modeCache = new Map();
+  }
+  /**
+   * Resolve the effective enforcement mode for an agent and cache the
+   * answer. Useful internally for enforce() to choose between
+   * pre-execution confirmation (always_ask agents) and post-hoc
+   * interrupt (default agents).
+   */
+  async _getEffectiveModeFor(agentId) {
+    const cached = this._modeCache.get(agentId);
+    if (cached) return cached;
+    const mode = await effectiveEnforcementMode(this.apiKey, agentId);
+    this._modeCache.set(agentId, mode);
+    return mode;
   }
   /**
@@ -521,16 +585,77 @@ export class AnthropicManagedSource extends Source {
   }
   /**
-   * Enforce a policy decision against a pending action.
+   * Enforce a policy decision against a pending action — v1.0.1 F-4.
+   *
+   * Routes through the right Anthropic event depending on the agent's
+   * effective enforcement mode:
+   *   - sync_confirm  (agent has at least one tool with always_ask):
+   *       'allow' → user.tool_confirmation { result: allow }
+   *       'deny'  → user.tool_confirmation { result: deny }   (pre-execution block)
+   *   - sync_interrupt (no always_ask available):
+   *       'allow' → no-op (nothing to confirm — the tool already ran or
+   *                 will run without a gate)
+   *       'deny'  → user.interrupt + optional follow-up message
+   *                 (post-hoc termination)
+   *
+   * Returns { enforced: boolean, mode: string, native_response?: object, reason?: string }
+   * where `mode` names the path taken (confirm_allow | confirm_deny | interrupt | no_op).
    *
-   * PR-A scaffold: the actual `user.tool_confirmation` / `user.interrupt`
-   * HTTP call currently lives in scripts/shield.js, which talks to the
-   * Anthropic API directly. Migrating that into this method is PR-D — at
-   * which point this body will POST the decision via the SSE/HTTP control
-   * channel. For PR-A, the method exists to satisfy the contract;
-   * Shield does not call it yet.
+   * @param {object} action    A WMAAction (must carry session_id and agent_id; id is needed for pre-execution confirmation)
+   * @param {object} decision  { decision: 'allow'|'deny', reason?: string }
    */
-  async enforce(action, decision) { // eslint-disable-line no-unused-vars
-    throw new Error('AnthropicManagedSource.enforce() — Shield migration pending PR-D (scripts/shield.js still handles enforcement directly)');
+  async enforce(action, decision) {
+    if (!action || typeof action !== 'object') {
+      throw new Error('enforce(action, decision): action must be a WMAAction object');
+    }
+    if (!action.session_id) {
+      throw new Error('enforce(action, decision): action.session_id is required');
+    }
+    if (!action.agent_id) {
+      throw new Error('enforce(action, decision): action.agent_id is required');
+    }
+    if (!decision || (decision.decision !== 'allow' && decision.decision !== 'deny')) {
+      throw new Error(`enforce(action, decision): decision must be 'allow' or 'deny' (got ${decision?.decision})`);
+    }
+    const mode = await this._getEffectiveModeFor(action.agent_id);
+    const isToolUse = action.action_type === ACTION_TYPES.TOOL_USE
+      || action.action_type === ACTION_TYPES.MCP_TOOL_USE
+      || action.action_type === ACTION_TYPES.CUSTOM_TOOL_USE;
+    // Path 1 — pre-execution confirmation when the agent supports it AND
+    // the pending action is a tool_use (only kind we can pre-block).
+    if (mode === ENFORCEMENT_MODES.SYNC_CONFIRM && isToolUse && action.id) {
+      if (decision.decision === 'allow') {
+        const res = await confirmAllow({
+          apiKey: this.apiKey,
+          sessionId: action.session_id,
+          toolUseId: action.id,
+        });
+        return { enforced: true, mode: 'confirm_allow', native_response: res };
+      }
+      const res = await confirmDeny({
+        apiKey: this.apiKey,
+        sessionId: action.session_id,
+        toolUseId: action.id,
+        denyMessage: decision.reason,
+      });
+      return { enforced: true, mode: 'confirm_deny', native_response: res };
+    }
+    // Path 2 — post-hoc interrupt. The only enforcement available when
+    // the agent has no always_ask tools, OR for non-tool actions we
+    // can't pre-block.
+    if (decision.decision === 'deny') {
+      const res = await interruptSession({
+        apiKey: this.apiKey,
+        sessionId: action.session_id,
+        followUpMessage: decision.reason,
+      });
+      return { enforced: true, mode: 'interrupt', native_response: res };
+    }
+    // Allow + no pre-gate available = nothing to do at the SDK level.
+    return { enforced: false, mode: 'no_op', reason: 'no pre-execution gate available for this action' };
   }
 }
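
The routing in `enforce()` above can be read as a small decision table. The sketch below restates it as a pure function with no network calls, which may help when reasoning about which path a given action and decision would take. The name `planEnforcement` is hypothetical and not part of the package. The literal strings `'sync_confirm'`, `'tool_use'`, `'mcp_tool_use'` and `'custom_tool_use'` are assumed to match the values behind `ENFORCEMENT_MODES` and `ACTION_TYPES`, which this diff does not show.

```javascript
// Illustrative sketch only: mirrors the branch order of enforce() in the diff
// above, without performing any API call. `planEnforcement` is a hypothetical
// helper; the string constants are assumed values of ENFORCEMENT_MODES and
// ACTION_TYPES.
const TOOL_ACTION_TYPES = new Set(['tool_use', 'mcp_tool_use', 'custom_tool_use']);

function planEnforcement(effectiveMode, action, decision) {
  if (decision !== 'allow' && decision !== 'deny') {
    throw new Error(`decision must be 'allow' or 'deny' (got ${decision})`);
  }
  const isToolUse = TOOL_ACTION_TYPES.has(action.action_type);

  // Path 1: pre-execution confirmation. Requires a sync_confirm agent,
  // a tool-use action, and an action id to confirm against.
  if (effectiveMode === 'sync_confirm' && isToolUse && action.id) {
    return decision === 'allow' ? 'confirm_allow' : 'confirm_deny';
  }

  // Path 2: post-hoc interrupt, used for deny when no pre-execution gate applies.
  if (decision === 'deny') return 'interrupt';

  // Allow with no gate available: nothing to send.
  return 'no_op';
}

// Example: a deny on a non-tool action falls through to the interrupt path,
// even when the agent itself is sync_confirm-capable.
console.log(planEnforcement('sync_confirm', { action_type: 'message', id: 'evt_1' }, 'deny'));
```

Under these assumptions, an `allow` on an agent without any `always_ask` tool maps to `no_op`, which corresponds to the `{ enforced: false, ... }` return value in the diff.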

package/src/sources/contract.js CHANGED

@@ -127,6 +127,19 @@ export const PROVIDERS = Object.freeze({
 //  * SUB-AGENT FIELDS (PR-C — see WMAAction.parent_agent_id):
 //  * @property {string|null} parent_agent_id        Null for root agents
 //  * @property {string|null} composition_pattern    From COMPOSITION_PATTERNS
+//  *
+//  * MULTI-AGENT DISCRIMINATORS (v1.0.2 F-6a — preserved LOCALLY only,
+//  * never sent raw to Fortress; the SignalsAggregator derives the
+//  * aggregated session_ids list from them at finalize time):
+//  * @property {string|null} session_thread_id      The thread the event happened in.
+//  *                                                For frameworks where one session can
+//  *                                                host multiple threads/sub-agents
+//  *                                                (Anthropic Task tool, future similar
+//  *                                                designs), this is how the vendor
+//  *                                                itself discriminates "parent vs sub".
+//  * @property {string|null} agent_name             The human-named emitter of this event
+//  *                                                (the parent agent OR a sub-agent
+//  *                                                running inside the parent's session).
 //  */
 const REQUIRED_FIELDS = ['id', 'provider', 'agent_id', 'session_id', 'action_type', 'timestamp', 'status'];