watchmyagents 1.0.0 → 1.0.2

package/README.md CHANGED
@@ -122,18 +122,18 @@ wma-fetch (--agent-id <agent_id> | --all-agents) [--session-id <sess_id>] [--sin
122
122
  | `--interval 5m` | Poll interval in watch mode (default `5m`; accepts `30s`/`1h`/…) |
123
123
  | `--upload` | In watch mode, anonymize each new window and ship signals to Fortress (needs `WMA_API_KEY` + `WMA_FORTRESS_BASE_URL` + `WMA_SIGNALS_SALT`). Raw stays local. |
124
124
  | `--discovery-since 7d` | Window for discovering NEW sessions (default `7d`). Sessions already being tracked are re-fetched regardless of age, so long-running ones never drop out. |
125
- | `--send-agent-names` | Opt-in: send the human agent name as the Fortress `display_name`. Default sends the agent id only (the name may contain client/project info). |
125
+ | `--no-send-agent-names` | Opt-out: send only the agent id as the Fortress `display_name`. **By default, the human agent name** (sanitized) is sent so dashboards/decisions stay legible. Pass this flag if your agent names themselves carry client/project info you'd rather keep pseudonymized. |
126
126
  | `--api-key sk-ant-…` | Override the `ANTHROPIC_API_KEY` env var. **Discouraged** — visible in shell history & process list. Prefer the env var. |
127
127
 
128
128
  Logs land in `./watchmyagents-logs/<agent_id>/<date>.ndjson` (file mode `0600`, dir `0700`).
129
129
 
130
- ### `wma-anonymize` — preview what would leave your machine
130
+ ### `wma-signals` — preview what would leave your machine
131
131
 
132
132
  Produces the anonymized signals payload (counts, latencies, salted IoC hashes, sequence histograms — no raw URLs/commands/prompts) that future WMA cloud features would ship. Useful to verify Containment compliance and to test the format.
133
133
 
134
134
  ```bash
135
135
  export WMA_SIGNALS_SALT="$(node -e 'console.log(require("crypto").randomBytes(16).toString("hex"))')"
136
- wma-anonymize ./watchmyagents-logs
136
+ wma-signals ./watchmyagents-logs
137
137
  # → JSON on stdout. Add --out signals.json to write to file.
138
138
  ```
139
139
 
@@ -146,7 +146,7 @@ Anonymizes your local NDJSON and POSTs the resulting payload to the WMA Fortress
146
146
  ```bash
147
147
  export WMA_API_KEY="wma_..." # from Fortress dashboard → Settings → API Keys
148
148
  export WMA_FORTRESS_URL="https://<your-project>.supabase.co/functions/v1/ingest-signals"
149
- export WMA_SIGNALS_SALT="..." # same salt as wma-anonymize
149
+ export WMA_SIGNALS_SALT="..." # same salt as wma-signals
150
150
 
151
151
  wma-upload-fortress --agent-id agent_01ABC... [--display-name "My agent"]
152
152
  # → POSTs the anonymized payload. Server returns signal_id + agent_id.
@@ -155,7 +155,7 @@ wma-upload-fortress --agent-id agent_01ABC... [--display-name "My agent"]
155
155
  wma-upload-fortress --agent-id agent_xxx --dry-run
156
156
  ```
157
157
 
158
- **What is sent:** the anonymized signals payload (counts, latencies, salted IoC hashes, sequences — same as `wma-anonymize` output), the agent's **`classification`** when the daemon has it (`{agent_type, confidence, stage}` — anonymized metadata, never raw content), **plus the routing identifiers**: `provider` (e.g., `"anthropic-managed"` — added in v1.0 for the multi-framework SDK), `native_agent_id` (the canonical provider-agnostic field), `anthropic_agent_id` (kept for backwards compat with existing Fortress instances; will be dropped once Fortress migrates), `parent_agent_id` (`null` for root agents — populated for sub-agents detected via OpenAI Agents handoffs, CrewAI manager mode, Hermes Agent `spawn_subagent`, LangGraph sub-graphs), `composition_pattern` (`"solo" | "hierarchy" | "graph" | "peer"` — defaults to `"solo"` for Anthropic until thread-message detection lands), `enforcement_mode` (`"sync_confirm" | "sync_interrupt" | "detect_only"` — the strongest enforcement capability the Source provides; Fortress greys out Shield UI for `detect_only` agents to prevent UI/runtime mismatch), and a `display_name`. The agent id is required so Fortress can associate signals with the right agent; `display_name` defaults to the **human-readable agent name** (sanitized to strip control chars) for UX in the dashboard — pass `--no-send-agent-names` to keep it pseudonymized (sends the agent id instead) if your agent names themselves carry sensitive client/project info.
158
+ **What is sent:** the anonymized signals payload (counts, latencies, salted IoC hashes, sequences — same as `wma-signals` output), the agent's **`classification`** when the daemon has it (`{agent_type, confidence, stage}` — anonymized metadata, never raw content), **plus the routing identifiers**: `provider` (e.g., `"anthropic-managed"` — added in v1.0 for the multi-framework SDK), `native_agent_id` (the canonical provider-agnostic field), `anthropic_agent_id` (kept for backwards compat with existing Fortress instances; will be dropped once Fortress migrates), `parent_agent_id` (`null` for root agents — populated for sub-agents detected via OpenAI Agents handoffs, CrewAI manager mode, Hermes Agent `spawn_subagent`, LangGraph sub-graphs), `composition_pattern` (`"solo" | "hierarchy" | "graph" | "peer"` — defaults to `"solo"` for Anthropic until thread-message detection lands), `enforcement_mode` (`"sync_confirm" | "sync_interrupt" | "detect_only"` — the strongest enforcement capability the Source provides; Fortress greys out Shield UI for `detect_only` agents to prevent UI/runtime mismatch), and a `display_name`. The agent id is required so Fortress can associate signals with the right agent; `display_name` defaults to the **human-readable agent name** (sanitized to strip control chars) for UX in the dashboard — pass `--no-send-agent-names` to keep it pseudonymized (sends the agent id instead) if your agent names themselves carry sensitive client/project info.
159
159
  **What is NOT sent:** raw prompts, raw URLs/commands/queries, raw agent responses, raw error messages. All payload content stays on your machine.
160
160
 
161
161
  The endpoint auto-registers the agent on the first upload if it doesn't exist in Fortress yet — no manual onboarding needed for new agents.
@@ -247,7 +247,7 @@ WatchMyAgents is built so that **your prompts and outputs never have to leave yo
247
247
  |---|---|
248
248
  | **Your machine** (`./watchmyagents-logs/`) | Full NDJSON with all prompts, tool inputs, agent outputs. `chmod 600` on every file. |
249
249
  | **Anthropic API** | Where the agent runs. WMA pulls events via the public REST API only. |
250
- | **WMA Fortress** (opt-in, only with `--upload` / `wma-upload-fortress` / `wma-shield --policies-source fortress`) | The **anonymized signals** payload (counts, timings, salted hashes, sequences) + routing identifiers: `provider` (e.g. `"anthropic-managed"`), `native_agent_id`, `anthropic_agent_id` (legacy alias), and `display_name` (defaults to the agent id; the human agent name only with `--send-agent-names`). Shield enforcement **decisions** (hashed session/event/input fingerprints — never raw values). **Never** raw prompts, URLs, commands, or outputs. |
250
+ | **WMA Fortress** (opt-in, only with `--upload` / `wma-upload-fortress` / `wma-shield --policies-source fortress`) | The **anonymized signals** payload (counts, timings, salted hashes, sequences) + routing identifiers: `provider` (e.g. `"anthropic-managed"`), `native_agent_id`, `anthropic_agent_id` (legacy alias), and `display_name` (defaults to the **human agent name** for dashboard UX; pass `--no-send-agent-names` to opt out and send only the agent id). Shield enforcement **decisions** (hashed session/event/input fingerprints — never raw values). **Never** raw prompts, URLs, commands, or outputs. |
251
251
 
252
252
  This is the "local-first" guarantee: **raw payloads never leave your machine.** Cloud upload is opt-in and carries only anonymized metadata + the agent id/name needed to route it.
253
253
 
package/SECURITY.md CHANGED
@@ -30,10 +30,21 @@ WMA needs your Anthropic API key to call the Managed Agents REST API on your beh
30
30
 
31
31
  ### What WMA does NOT do
32
32
 
33
- - ❌ Does not phone home, telemetry, analytics, or usage reporting
34
- - ❌ Does not send any data to WMA-controlled servers
35
- - ❌ Does not store, log, or transmit your Anthropic API key anywhere except `api.anthropic.com`
36
- - ❌ Does not require an account, signup, or license key
33
+ - ❌ No phone-home, no usage analytics, no silent telemetry — WMA never opens a network connection to a WMA-controlled endpoint on its own.
34
+ - ❌ Does not store, log, or transmit your Anthropic API key anywhere except `api.anthropic.com`.
35
+ - ❌ Does not require an account, signup, or license key.
36
+
37
+ ### Fortress upload — strictly opt-in
38
+
39
+ Since v0.5.0, WMA supports an **opt-in** cloud component (WMA Fortress) for teams who want a multi-agent dashboard + cross-fleet Guardian analysis. The upload only happens when you explicitly invoke `--upload` on `wma-fetch`, run `wma-upload-fortress`, or run `wma-shield --policies-source fortress`. The defaults across all CLIs are zero-cloud — your machine stays the only place raw data ever exists.
40
+
41
+ What goes to Fortress when you opt in:
42
+ - ✅ The **anonymized signals payload** (counts, latencies, salted IoC hashes, sequences, classification metadata) — see [`docs/CONTAINMENT.md`](docs/CONTAINMENT.md) for the bit-exact contract and the 6 invariant tests that lock it down.
43
+ - ✅ Routing identifiers (`provider`, `native_agent_id`, optionally the human `display_name` — see `--no-send-agent-names` to opt this out).
44
+
45
+ What does **NOT** go to Fortress, ever:
46
+ - ❌ Raw prompts, agent outputs, tool inputs, tool outputs, error message text, raw URLs, raw commands, raw queries — these stay in your local `watchmyagents-logs/`.
47
+ - ❌ Your Anthropic API key. Fortress authenticates with a separate `WMA_API_KEY` issued from your Fortress account and never sees `ANTHROPIC_API_KEY`.
37
48
 
38
49
  ## Threat model
39
50
 
@@ -56,7 +67,6 @@ WMA combines **two complementary layers**:
56
67
  - **Pre-installation activity.** Shield only enforces from the moment it attaches forward. Past events are not retroactively replayed or re-evaluated.
57
68
  - **A malicious policy file.** Shield's policy engine refuses obviously unsafe regex patterns (e.g. catastrophic backtracking) and truncates inputs before regex tests to mitigate ReDoS. But a user-controlled policy file remains a code-adjacent input — treat it as you would treat source code.
58
69
  - **A compromised Anthropic API.** WMA trusts the events delivered by Anthropic. This is out of scope.
59
- - **A compromised Anthropic API.** WMA trusts the events delivered by Anthropic. This is out of scope.
60
70
 
61
71
  ## Supply chain
62
72
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "watchmyagents",
3
- "version": "1.0.0",
3
+ "version": "1.0.2",
4
4
  "description": "Security observability + real-time policy enforcement for AI agents. Local-first NDJSON capture with a continuous Watch daemon that auto-uploads anonymized signals, Shield CLI that blocks policy violations live (with policies pulled from Fortress cloud), anonymizer producing signals-only payloads, bidirectional sync with WatchMyAgents Fortress, and one-command install as an always-on launchd/systemd service — closing the recursive Watch→Guardian→Shield security loop.",
5
5
  "type": "module",
6
6
  "files": [
@@ -8,7 +8,7 @@
8
8
  "scripts/inspect.js",
9
9
  "scripts/fetch-anthropic.js",
10
10
  "scripts/shield.js",
11
- "scripts/anonymize.js",
11
+ "scripts/signals.js",
12
12
  "scripts/upload-fortress.js",
13
13
  "scripts/service.js",
14
14
  "scripts/agents.js",
@@ -20,7 +20,7 @@
20
20
  "wma-inspect": "scripts/inspect.js",
21
21
  "wma-fetch": "scripts/fetch-anthropic.js",
22
22
  "wma-shield": "scripts/shield.js",
23
- "wma-anonymize": "scripts/anonymize.js",
23
+ "wma-signals": "scripts/signals.js",
24
24
  "wma-upload-fortress": "scripts/upload-fortress.js",
25
25
  "wma-service": "scripts/service.js",
26
26
  "wma-agents": "scripts/agents.js"
@@ -30,7 +30,7 @@
30
30
  "inspect": "node scripts/inspect.js",
31
31
  "fetch": "node scripts/fetch-anthropic.js",
32
32
  "shield": "node scripts/shield.js",
33
- "anonymize": "node scripts/anonymize.js",
33
+ "signals": "node scripts/signals.js",
34
34
  "upload-fortress": "node scripts/upload-fortress.js",
35
35
  "service": "node scripts/service.js",
36
36
  "agents": "node scripts/agents.js"
@@ -34,7 +34,7 @@ import { classifyAgentType } from '../src/typology.js';
34
34
  import { aggregate, buildFeatures } from '../src/typology-features.js';
35
35
  import {
36
36
  getAgent, listAgents, listSessions, fetchSessionEntries, fetchRawEvents,
37
- AnthropicManagedSource,
37
+ AnthropicManagedSource, effectiveEnforcementMode,
38
38
  } from '../src/sources/anthropic-managed.js';
39
39
 
40
40
  function parseArgs(argv) {
@@ -111,7 +111,7 @@ function postJson(url, headers, body) {
111
111
  // `classification` (optional) carries the agent's typology — Fortress upserts
112
112
  // agent_type/confidence/stage on the agent row so the typology badge + the
113
113
  // apply-template flow fill themselves with no manual click.
114
- async function uploadSignals(uploadCtx, agentId, displayName, entries, classification) {
114
+ async function uploadSignals(uploadCtx, agentId, displayName, entries, classification, enforcementMode) {
115
115
  const agg = new SignalsAggregator({ salt: uploadCtx.salt });
116
116
  for (const e of entries) agg.add(e);
117
117
  const sig = agg.finalize();
@@ -132,16 +132,18 @@ async function uploadSignals(uploadCtx, agentId, displayName, entries, classific
132
132
  // so old Fortress instances still recognize the upload. Once the
133
133
  // Lovable-deployed ingest-signals migrates, future SDK releases will
134
134
  // stop emitting `anthropic_agent_id`.
135
- // PR-D: enforcement_mode is read CANONICALLY from the Source's static
136
- // declaration so it stays in sync with the actual capability of the
137
- // adapter, never re-declared inline.
135
+ // PR-D / v1.0.1 F-2: enforcement_mode is the EFFECTIVE per-agent mode
136
+ // (sync_confirm only if the agent has permission_policy: always_ask on
137
+ // at least one tool; sync_interrupt otherwise). Falls back to the
138
+ // Source's static MAX capability if the resolution failed upstream —
139
+ // legacy behavior, but flags a warning in the daemon log.
138
140
  const body = JSON.stringify({
139
141
  provider: AnthropicManagedSource.providerName,
140
142
  native_agent_id: agentId,
141
143
  anthropic_agent_id: agentId,
142
144
  parent_agent_id,
143
145
  composition_pattern,
144
- enforcement_mode: AnthropicManagedSource.enforcementMode,
146
+ enforcement_mode: enforcementMode || AnthropicManagedSource.enforcementMode,
145
147
  display_name: displayName,
146
148
  window_start: sig.window_start,
147
149
  window_end: sig.window_end,
@@ -250,6 +252,10 @@ async function runWatch({ apiKey, resolveAgents, fleet, logDir, intervalMs, wind
250
252
  const sessionAgent = new Map();// sessionId → { agentId, model, displayName }
251
253
  const priors = new Map(); // agentId → previous classification (threads the
252
254
  // typology state machine across upload cycles)
255
+ // F-2: cache the effective enforcement mode per agent. One getAgent call
256
+ // per agent per daemon run (until the entry is evicted). Refreshed only
257
+ // if upload fails — agent permission_policy doesn't change mid-flight.
258
+ const enforcementModes = new Map(); // agentId → 'sync_confirm' | 'sync_interrupt'
253
259
 
254
260
  const ac = new AbortController();
255
261
  const shutdown = () => { info('shutting down…'); ac.abort(); };
@@ -319,7 +325,19 @@ async function runWatch({ apiKey, resolveAgents, fleet, logDir, intervalMs, wind
319
325
  classification = { agent_type: cls.classified_type, confidence: cls.confidence, stage: cls.stage };
320
326
  } catch (e) { warn(` classification skipped: ${e.message}`); }
321
327
 
322
- const resp = await uploadSignals(uploadCtx, ag.agentId, sendNames ? ag.displayName : ag.agentId, fresh, classification);
328
+ // F-2: resolve the effective enforcement mode for this agent
329
+ // (cached across cycles). On failure, fall back to the static
330
+ // provider max so the upload still succeeds.
331
+ let mode = enforcementModes.get(ag.agentId);
332
+ if (!mode) {
333
+ try {
334
+ mode = await effectiveEnforcementMode(apiKey, ag.agentId);
335
+ enforcementModes.set(ag.agentId, mode);
336
+ } catch (e) {
337
+ warn(` enforcement_mode resolution failed for ${ag.agentId}: ${e.message} (falling back to provider max)`);
338
+ }
339
+ }
340
+ const resp = await uploadSignals(uploadCtx, ag.agentId, sendNames ? ag.displayName : ag.agentId, fresh, classification, mode);
323
341
  if (resp?.signal_id) {
324
342
  const cTag = classification ? ` · type ${classification.agent_type} (${Math.round(classification.confidence * 100)}%, ${classification.stage})` : '';
325
343
  info(` ↑ signals uploaded (signal_id ${resp.signal_id})${cTag}`);
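
The F-2 hunk above caches the resolved enforcement mode per agent and falls back to the provider maximum when resolution fails. A minimal sketch of that pattern in isolation, assuming a hypothetical `resolveMode` callback standing in for `effectiveEnforcementMode()` and a hypothetical `makeModeResolver` helper (neither name is part of the package), might look like this:

```javascript
// Sketch only: `makeModeResolver` and `resolveMode` are illustrative names,
// not part of watchmyagents. The fallback mirrors the "provider max"
// (sync_confirm) described in the diff comments.
const PROVIDER_MAX_MODE = 'sync_confirm';

function makeModeResolver(resolveMode, warn = console.warn) {
  const cache = new Map(); // agentId -> resolved enforcement mode

  return async function modeFor(agentId) {
    let mode = cache.get(agentId);
    if (!mode) {
      try {
        mode = await resolveMode(agentId);
        cache.set(agentId, mode); // only successful resolutions are cached
      } catch (e) {
        warn(`enforcement_mode resolution failed for ${agentId}: ${e.message}`);
      }
    }
    // A failed lookup leaves `mode` undefined, so the caller still gets a value.
    return mode || PROVIDER_MAX_MODE;
  };
}
```

Because failures are not cached, the next cycle retries the lookup for that agent, while a successful resolution is reused for the rest of the run.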
@@ -1,9 +1,15 @@
1
1
  #!/usr/bin/env node
2
- // wma-anonymize: produce the anonymized signals payload that Watch would
3
- // send to Fortress, for inspection / verification.
2
+ // wma-signals: build the signals payload that Watch would send to
3
+ // Fortress, for inspection / verification.
4
+ //
5
+ // (Renamed from `wma-anonymize` in v1.0.1. The script's job is to PRODUCE
6
+ // the signals payload; anonymization is a property of that payload,
7
+ // guaranteed by the underlying SignalsAggregator. The new name aligns
8
+ // with the rest of the product vocabulary: SignalsAggregator,
9
+ // ingest-signals Edge Function, signals.payload shape.)
4
10
  //
5
11
  // Usage:
6
- // wma-anonymize <path-to-ndjson-or-dir> [--salt <hex>] [--out <file>]
12
+ // wma-signals <path-to-ndjson-or-dir> [--salt <hex>] [--out <file>]
7
13
  //
8
14
  // The `--salt` argument MUST be a stable per-customer secret. Using a
9
15
  // random salt each run means hashes won't correlate across runs (useless
@@ -56,11 +62,11 @@ async function main() {
56
62
  const args = parseArgs(process.argv.slice(2));
57
63
 
58
64
  if (!args._target) {
59
- die(`usage: wma-anonymize <path> [--salt <hex>] [--out <file>]
65
+ die(`usage: wma-signals <path> [--salt <hex>] [--out <file>]
60
66
 
61
- Reads Watch NDJSON logs and produces the anonymized signals payload
62
- that would be sent to Fortress. Use this to inspect exactly what
63
- leaves your machine BEFORE any upload feature is enabled.
67
+ Builds the signals payload that Watch would send to Fortress, from
68
+ local NDJSON logs. Use this to inspect exactly what leaves your
69
+ machine BEFORE any upload feature is enabled.
64
70
 
65
71
  Required: --salt <hex> or WMA_SIGNALS_SALT env var (per-customer secret).
66
72
  If you don't have one, generate: node -e "console.log(require('crypto').randomBytes(16).toString('hex'))"
@@ -73,8 +79,8 @@ and save it in .env.local.`);
73
79
  ' generate one with: node -e "console.log(require(\'crypto\').randomBytes(16).toString(\'hex\'))"');
74
80
  }
75
81
  if (args.salt) {
76
- process.stderr.write('[wma-anonymize] warning: --salt on the command line is visible in shell history.\n' +
77
- ' Prefer: export WMA_SIGNALS_SALT=...\n');
82
+ process.stderr.write('[wma-signals] warning: --salt on the command line is visible in shell history.\n' +
83
+ ' Prefer: export WMA_SIGNALS_SALT=...\n');
78
84
  }
79
85
  if (salt.length < 16) {
80
86
  die('error: salt too short (need ≥16 hex chars / ≥8 bytes of entropy)');
@@ -102,7 +108,7 @@ and save it in .env.local.`);
102
108
  const json = JSON.stringify(signals, null, 2);
103
109
  if (args.out) {
104
110
  await writeFile(resolve(args.out), json + '\n', { encoding: 'utf8', mode: 0o600 });
105
- process.stderr.write(`[wma-anonymize] wrote ${args.out} (${signals._meta.entries_processed} entries processed)\n`);
111
+ process.stderr.write(`[wma-signals] wrote ${args.out} (${signals._meta.entries_processed} entries processed)\n`);
106
112
  } else {
107
113
  process.stdout.write(json + '\n');
108
114
  }
@@ -4,7 +4,7 @@
4
4
  //
5
5
  // Composable with the rest of the SDK:
6
6
  // wma-fetch → ./watchmyagents-logs/<agent_id>/<date>.ndjson (local capture)
7
- // wma-anonymize → signals payload (Containment: no raw content)
7
+ // wma-signals → signals payload (Containment: no raw content)
8
8
  // wma-upload-fortress → POST signals to https://<project>.supabase.co/functions/v1/ingest-signals
9
9
  //
10
10
  // Usage:
@@ -183,8 +183,13 @@ async function main() {
183
183
  // is a one-shot post-hoc tool — it has no per-entry context to derive
184
184
  // hierarchy from, so it sends defaults (solo / null) until a future
185
185
  // adapter writes those fields into the local NDJSON.
186
- // PR-D: enforcement_mode read from the Source class so any change to
187
- // the adapter's capability automatically reflects in the payload.
186
+ // PR-D / v1.0.1 F-2: enforcement_mode set to the provider's MAX
187
+ // capability (sync_confirm). The continuous Watch daemon
188
+ // (wma-fetch --watch --upload) resolves the EFFECTIVE per-agent mode
189
+ // via effectiveEnforcementMode(), but this one-shot uploader has no
190
+ // ANTHROPIC_API_KEY in scope so it cannot make the live getAgent
191
+ // call. Best-effort: send the max; the daemon's subsequent uploads
192
+ // will correct the value once it resolves.
188
193
  const body = {
189
194
  provider: AnthropicManagedSource.providerName,
190
195
  native_agent_id: agentId,
package/src/anonymizer.js CHANGED
@@ -37,6 +37,27 @@ const HASHABLE_INPUT_FIELDS = ['url', 'query', 'command', 'path', 'file_path'];
37
37
  // Tool types whose inputs we want to hash for IoC tracking
38
38
  const TOOL_ACTIONS = new Set(['tool_use', 'mcp_tool_use', 'custom_tool_use']);
39
39
 
40
+ // Well-known vendor built-in tool names that are SAFE to keep in clear in the
41
+ // signals payload. They are documented by the vendor, common across customers,
42
+ // and the operator NEEDS them legible in the dashboard ("3 web_search calls
43
+ // in 10 minutes" is the actionable signal). Anything not on this list is
44
+ // considered customer-controlled (custom tool, MCP tool with a customer-chosen
45
+ // name like "client_acme_export") and gets hashed before egress.
46
+ //
47
+ // To add a built-in: only confirmed-public-by-vendor names — never speculative
48
+ // matches. When in doubt, hash.
49
+ const WELL_KNOWN_TOOLS = new Set([
50
+ // Anthropic Managed Agents
51
+ 'web_search', 'web_fetch', 'bash', 'code_execution',
52
+ 'str_replace_editor', 'str_replace_based_edit_tool',
53
+ 'computer', 'computer_use_20250124', 'computer_use_20241022',
54
+ 'text_editor', 'text_editor_20250124', 'text_editor_20241022',
55
+ // OpenAI Agents / Responses
56
+ 'web_search_preview', 'file_search', 'computer_use_preview', 'code_interpreter',
57
+ // Common framework primitives
58
+ 'function', 'retrieval',
59
+ ]);
60
+
40
61
  // ── Hash helpers ─────────────────────────────────────────────────────────
41
62
 
42
63
  /**
@@ -56,6 +77,30 @@ export function generateSalt() {
56
77
  return randomBytes(16).toString('hex');
57
78
  }
58
79
 
80
+ // ── Tool name normalization (Containment hardening, v1.0.1 F-3) ────────
81
+
82
+ /**
83
+ * Return the canonical tool-name token that's safe to ship to Fortress.
84
+ *
85
+ * - For documented vendor built-ins (WELL_KNOWN_TOOLS) the name is kept
86
+ * in clear so dashboards and Guardian policies can reason about the
87
+ * tool by its public identifier.
88
+ * - For anything else (customer-defined functions, MCP tools whose name
89
+ * is set by the customer's MCP server, e.g. "client_acme_export"),
90
+ * the name is salted-SHA256-hashed with a `tool_hash:` prefix so it
91
+ * cannot leak project/client identifiers.
92
+ *
93
+ * Empty / null tool names return null.
94
+ */
95
+ export function normalizeToolName(toolName, salt) {
96
+ if (toolName == null) return null;
97
+ const s = String(toolName);
98
+ if (s.length === 0) return null;
99
+ if (WELL_KNOWN_TOOLS.has(s)) return s;
100
+ if (!salt) throw new Error('normalizeToolName requires a salt to hash custom tool names');
101
+ return 'tool_hash:' + createHash('sha256').update(salt).update(s).digest('hex').slice(0, 32);
102
+ }
103
+
59
104
  // ── Single-entry extractor: what hashable IoCs are in this entry? ────────
60
105
 
61
106
  function extractIocs(entry, salt) {
@@ -89,6 +134,13 @@ export class SignalsAggregator {
89
134
  this.entryCount = 0;
90
135
  this._prevActionType = null;
91
136
  this._prevSessionId = null;
137
+ // v1.0.2 F-6b — opaque session ids active in this window. Shipped to
138
+ // Fortress in the payload as `session_ids[]` so an operator looking at
139
+ // a Shield decision in the dashboard can grep their LOCAL NDJSON by
140
+ // session_id immediately (forensics short-circuit). The Anthropic
141
+ // session_id is a non-semantic token like `sess_01XaNB…` — same
142
+ // sensitivity class as `agent_id`, which we already transmit.
143
+ this.seenSessions = new Set(); // unique session_ids
92
144
  }
93
145
 
94
146
  add(entry) {
@@ -102,6 +154,13 @@ export class SignalsAggregator {
102
154
  if (!this.windowEnd || ts > this.windowEnd) this.windowEnd = ts;
103
155
  }
104
156
 
157
+ // F-6b — collect every distinct session_id encountered in the window.
158
+ // Stays opaque (no string transformation), bounded by the natural
159
+ // number of sessions in the window.
160
+ if (typeof entry.session_id === 'string' && entry.session_id.length > 0) {
161
+ this.seenSessions.add(entry.session_id);
162
+ }
163
+
105
164
  // Counts
106
165
  const at = entry.action_type || 'unknown';
107
166
  this.counts[at] = (this.counts[at] || 0) + 1;
@@ -115,15 +174,21 @@ export class SignalsAggregator {
115
174
  this._prevActionType = at;
116
175
  this._prevSessionId = entry.session_id || null;
117
176
 
118
- // Tools
177
+ // Tools — Containment (v1.0.1 F-3): well-known vendor built-ins keep
178
+ // their public name; customer-defined / MCP tool names get hashed so
179
+ // no client-identifying string ("client_acme_export") leaks via the
180
+ // tool_counts / tool_latencies / error_rate maps.
119
181
  if (entry.tool_name && TOOL_ACTIONS.has(at)) {
120
- this.toolCounts[entry.tool_name] = (this.toolCounts[entry.tool_name] || 0) + 1;
121
- if (entry.status === 'error') {
122
- this.toolErrors[entry.tool_name] = (this.toolErrors[entry.tool_name] || 0) + 1;
123
- }
124
- if (typeof entry.duration_ms === 'number') {
125
- if (!this.toolLatencies[entry.tool_name]) this.toolLatencies[entry.tool_name] = [];
126
- this.toolLatencies[entry.tool_name].push(entry.duration_ms);
182
+ const toolKey = normalizeToolName(entry.tool_name, this.salt);
183
+ if (toolKey) {
184
+ this.toolCounts[toolKey] = (this.toolCounts[toolKey] || 0) + 1;
185
+ if (entry.status === 'error') {
186
+ this.toolErrors[toolKey] = (this.toolErrors[toolKey] || 0) + 1;
187
+ }
188
+ if (typeof entry.duration_ms === 'number') {
189
+ if (!this.toolLatencies[toolKey]) this.toolLatencies[toolKey] = [];
190
+ this.toolLatencies[toolKey].push(entry.duration_ms);
191
+ }
127
192
  }
128
193
  // Extract & hash IoCs from this tool's input
129
194
  for (const h of extractIocs(entry, this.salt)) this.iocHashes.add(h);
@@ -182,6 +247,11 @@ export class SignalsAggregator {
182
247
  sequences_top10: sequencesTop,
183
248
  stop_reasons: this.stopReasons,
184
249
  tokens_total: this.tokensTotal,
250
+ // F-6c — opaque session ids active in this window, sorted for
251
+ // determinism. Operator forensic chain:
252
+ // Fortress decision → window_start/end + session_ids → grep
253
+ // the local NDJSON of the affected agent → full raw context.
254
+ session_ids: [...this.seenSessions].sort(),
185
255
  },
186
256
  _meta: {
187
257
  entries_processed: this.entryCount,
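
The `session_ids` collection added above can be read as a small standalone step: gather distinct non-empty string ids, then sort them so the output is deterministic. The sketch below assumes a hypothetical `collectSessionIds` helper and made-up sample ids; in the package this logic lives inside `SignalsAggregator`.

```javascript
// Sketch only: `collectSessionIds` is an illustrative helper, not a package export.
function collectSessionIds(entries) {
  const seen = new Set();
  for (const entry of entries) {
    // Same guard as the diff: only non-empty strings are kept, unchanged.
    if (typeof entry.session_id === 'string' && entry.session_id.length > 0) {
      seen.add(entry.session_id);
    }
  }
  return [...seen].sort(); // sorted for deterministic payloads
}

// Sample entries with made-up ids; duplicates, empty and missing ids are dropped.
const ids = collectSessionIds([
  { session_id: 'sess_b' },
  { session_id: 'sess_a' },
  { session_id: 'sess_b' },
  { session_id: '' },
  {},
]);
console.log(ids); // [ 'sess_a', 'sess_b' ]
```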
package/src/logger.js CHANGED
@@ -13,6 +13,8 @@ import { assertSafePathSegment } from './validate.js';
13
13
  const EXPORT_FIELDS = [
14
14
  'id', 'agent_id', 'parent_agent_id', 'composition_pattern',
15
15
  'provider', 'timestamp', 'action_type',
16
+ // v1.0.2 F-6a — Anthropic-style sub-agent discriminators preserved locally
17
+ 'session_thread_id', 'agent_name',
16
18
  'tool_name', 'duration_ms', 'tokens_used',
17
19
  'input_tokens', 'output_tokens', 'cache_read_tokens', 'cache_creation_tokens',
18
20
  'cost_usd', 'model',
@@ -60,6 +62,11 @@ export class Logger {
60
62
  // populates these on the event, and the Logger threads them through.
61
63
  parent_agent_id: e.parent_agent_id ?? null,
62
64
  composition_pattern: e.composition_pattern || 'solo',
65
+ // v1.0.2 F-6a: Anthropic-style discriminators preserved LOCAL ONLY
66
+ // (never sent raw to Fortress — SignalsAggregator derives the
67
+ // aggregated session_ids list from these at finalize time).
68
+ session_thread_id: e.session_thread_id ?? null,
69
+ agent_name: e.agent_name ?? null,
63
70
  provider: e.provider || e.framework || 'generic',
64
71
  timestamp: e.timestamp || new Date().toISOString(),
65
72
  action_type: e.action_type || 'tool_call',
@@ -17,7 +17,11 @@
17
17
 
18
18
  import { request } from 'node:https';
19
19
  import { URLSearchParams } from 'node:url';
20
- import { Source, PROVIDERS, ENFORCEMENT_MODES } from './contract.js';
20
+ import { Source, PROVIDERS, ENFORCEMENT_MODES, ACTION_TYPES } from './contract.js';
21
+ import {
22
+ getAgentConfig, detectAlwaysAsk,
23
+ confirmAllow, confirmDeny, interruptSession,
24
+ } from '../shield/enforce.js';
21
25
 
22
26
  const API_HOST = 'api.anthropic.com';
23
27
  const BETA = 'managed-agents-2026-04-01';
@@ -181,6 +185,13 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
181
185
  if (!RELEVANT.has(ev.type)) continue;
182
186
  const type = ev.type;
183
187
  const ts = ev.processed_at || ev.created_at || new Date().toISOString();
188
+ // v1.0.2 F-6a: capture Anthropic's own discriminators on EVERY event,
189
+ // not just thread_message_*. session_thread_id + agent_name are how
190
+ // the vendor itself tells parent activity from sub-agent activity.
191
+ // Preserved LOCALLY (NDJSON) only — never sent raw to Fortress.
192
+ const session_thread_id = ev.session_thread_id ?? null;
193
+ const agent_name = ev.agent_name ?? null;
194
+ const subAgentMeta = { session_thread_id, agent_name };
184
195
  const tsMillis = tsMs(ev);
185
196
 
186
197
  if (type === 'span.model_request_start') {
@@ -197,6 +208,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
197
208
  const cw = u.cache_creation_input_tokens || 0;
198
209
  yield {
199
210
  ...base,
211
+ ...subAgentMeta,
200
212
  id: ev.id,
201
213
  action_type: 'llm_call',
202
214
  tool_name: null,
@@ -216,6 +228,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
216
228
  if (type === 'user.message') {
217
229
  yield {
218
230
  ...base,
231
+ ...subAgentMeta,
219
232
  id: ev.id,
220
233
  action_type: 'user_message',
221
234
  tool_name: null,
@@ -230,6 +243,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
230
243
  if (type === 'user.interrupt') {
231
244
  yield {
232
245
  ...base,
246
+ ...subAgentMeta,
233
247
  id: ev.id,
234
248
  action_type: 'user_interrupt',
235
249
  tool_name: null,
@@ -245,6 +259,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
245
259
  const denied = ev.result === 'deny';
246
260
  yield {
247
261
  ...base,
262
+ ...subAgentMeta,
248
263
  id: ev.id,
249
264
  action_type: 'tool_confirmation',
250
265
  tool_name: null,
@@ -261,6 +276,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
261
276
  if (type === 'user.custom_tool_result') {
262
277
  yield {
263
278
  ...base,
279
+ ...subAgentMeta,
264
280
  id: ev.id,
265
281
  action_type: 'custom_tool_result',
266
282
  tool_name: null,
@@ -276,6 +292,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
276
292
  if (type === 'agent.message') {
277
293
  yield {
278
294
  ...base,
295
+ ...subAgentMeta,
279
296
  id: ev.id,
280
297
  action_type: 'message',
281
298
  tool_name: null,
@@ -290,6 +307,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
290
307
  if (type === 'agent.thinking') {
291
308
  yield {
292
309
  ...base,
310
+ ...subAgentMeta,
293
311
  id: ev.id,
294
312
  action_type: 'thinking',
295
313
  tool_name: null,
@@ -317,6 +335,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
317
335
  const isError = ev.is_error === true;
318
336
  yield {
319
337
  ...base,
338
+ ...subAgentMeta,
320
339
  id: ev.id,
321
340
  action_type: start?.isMcp ? 'mcp_tool_use' : 'tool_use',
322
341
  tool_name: start?.name || 'unknown',
@@ -333,6 +352,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
333
352
  if (type === 'agent.custom_tool_use') {
334
353
  yield {
335
354
  ...base,
355
+ ...subAgentMeta,
336
356
  id: ev.id,
337
357
  action_type: 'custom_tool_use',
338
358
  tool_name: ev.name || 'unknown',
@@ -347,6 +367,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
347
367
  if (type === 'agent.thread_context_compacted') {
348
368
  yield {
349
369
  ...base,
370
+ ...subAgentMeta,
350
371
  id: ev.id,
351
372
  action_type: 'context_compacted',
352
373
  tool_name: null,
@@ -366,6 +387,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
366
387
  const direction = type.endsWith('_sent') ? 'sent' : 'received';
367
388
  yield {
368
389
  ...base,
390
+ ...subAgentMeta,
369
391
  id: ev.id,
370
392
  action_type: `thread_message_${direction}`,
371
393
  tool_name: null,
@@ -387,6 +409,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
387
409
  const { id: _id, type: _type, processed_at: _pa, created_at: _ca, ...changes } = ev;
388
410
  yield {
389
411
  ...base,
412
+ ...subAgentMeta,
390
413
  id: ev.id,
391
414
  action_type: 'config_change',
392
415
  tool_name: null,
@@ -401,6 +424,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
401
424
  if (type === 'session.thread_created') {
402
425
  yield {
403
426
  ...base,
427
+ ...subAgentMeta,
404
428
  id: ev.id,
405
429
  action_type: 'thread_created',
406
430
  tool_name: null,
@@ -418,6 +442,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
418
442
  if (type === 'session.error') {
419
443
  yield {
420
444
  ...base,
445
+ ...subAgentMeta,
421
446
  id: ev.id,
422
447
  action_type: 'session_error',
423
448
  tool_name: null,
@@ -439,6 +464,7 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
439
464
  const fatal = state === 'terminated';
440
465
  yield {
441
466
  ...base,
467
+ ...subAgentMeta,
442
468
  id: ev.id,
443
469
  action_type: 'state_transition',
444
470
  tool_name: null,
@@ -459,6 +485,27 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
459
485
  }
460
486
  }
461
487
 
488
+ // ────────────────────────────────────────────────────────────────────────
489
+ // effectiveEnforcementMode — F-2 of the Codex v1.0.1 audit
490
+ // ────────────────────────────────────────────────────────────────────────
491
+ // AnthropicManagedSource.enforcementMode is the PROVIDER'S MAX capability
492
+ // (sync_confirm). But the EFFECTIVE mode for a given agent depends on
493
+ // whether at least one of its tools/toolsets has permission_policy =
494
+ // always_ask. When none does, Shield can only interrupt AFTER a violating
495
+ // tool ran, not block before — that's sync_interrupt territory.
496
+ //
497
+ // This helper resolves the per-agent effective mode from the live agent
498
+ // config so the value shipped to Fortress matches what Shield can
499
+ // actually do for THIS agent. Without this, Fortress can mis-display
500
+ // "sync_confirm" UI on an agent that's only interrupt-capable, leading
501
+ // the operator to deploy Shield policies that won't pre-block.
502
+ export async function effectiveEnforcementMode(apiKey, agentId) {
503
+ const agentConfig = await getAgentConfig(apiKey, agentId);
504
+ return detectAlwaysAsk(agentConfig)
505
+ ? ENFORCEMENT_MODES.SYNC_CONFIRM
506
+ : ENFORCEMENT_MODES.SYNC_INTERRUPT;
507
+ }
508
+
462
509
  function extractText(content) {
463
510
  if (typeof content === 'string') return content;
464
511
  if (Array.isArray(content)) {
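The body of `detectAlwaysAsk` is not part of this diff. As a rough sketch of the rule described in the comment above (confirm mode only when at least one tool or toolset has `permission_policy = always_ask`), the following assumes a config shape of `{ tools: [...] }` where a toolset may nest its own `tools` array. `hasAlwaysAskTool`, `resolveMode`, the config shape, and the mode string values are all assumptions made for illustration.

```javascript
// Assumed string values; the real ENFORCEMENT_MODES constant is not shown here.
const MODES = Object.freeze({
  SYNC_CONFIRM: 'sync_confirm',
  SYNC_INTERRUPT: 'sync_interrupt',
});

// Hypothetical stand-in for detectAlwaysAsk: true if any tool, or any tool
// nested inside a toolset, declares permission_policy === 'always_ask'.
function hasAlwaysAskTool(agentConfig) {
  const visit = (items) => Array.isArray(items) && items.some(
    (t) => t?.permission_policy === 'always_ask' || visit(t?.tools),
  );
  return visit(agentConfig?.tools);
}

// Same ternary as effectiveEnforcementMode, minus the network call.
function resolveMode(agentConfig) {
  return hasAlwaysAskTool(agentConfig) ? MODES.SYNC_CONFIRM : MODES.SYNC_INTERRUPT;
}
```

Keeping the detection pure and separate from the fetch makes the downgrade to `sync_interrupt` easy to unit-test against recorded agent configs.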
@@ -490,6 +537,23 @@ export class AnthropicManagedSource extends Source {
490
537
  super({ apiKey });
491
538
  if (!apiKey) throw new Error('AnthropicManagedSource requires an apiKey');
492
539
  this.apiKey = apiKey;
540
+ // Per-agent effective enforcement mode cache. One getAgentConfig call per
541
+ // agent across the lifetime of the Source instance.
542
+ this._modeCache = new Map();
543
+ }
544
+
545
+ /**
546
+ * Resolve the effective enforcement mode for an agent and cache the
547
+ * answer. Useful internally for enforce() to choose between
548
+ * pre-execution confirmation (always_ask agents) and post-hoc
549
+ * interrupt (default agents).
550
+ */
551
+ async _getEffectiveModeFor(agentId) {
552
+ const cached = this._modeCache.get(agentId);
553
+ if (cached) return cached;
554
+ const mode = await effectiveEnforcementMode(this.apiKey, agentId);
555
+ this._modeCache.set(agentId, mode);
556
+ return mode;
493
557
  }
494
558
 
495
559
  /**
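One property of the cache in this hunk is worth noting: the resolved mode is stored only after the `await` completes, so two `enforce()` calls for the same agent that overlap in time could each fetch the config. A possible variant, shown here as a sketch rather than as what the package does, caches the in-flight promise instead. `makeModeCache` and `resolveMode` are hypothetical names.

```javascript
// Sketch of a promise-caching variant (NOT the code in the diff).
// `resolveMode` is any function that returns a mode or a promise of one.
function makeModeCache(resolveMode) {
  const cache = new Map();
  return function getMode(agentId) {
    let pending = cache.get(agentId);
    if (!pending) {
      // The executor runs synchronously, so the entry is in the Map before
      // any second caller can miss. A synchronous throw becomes a rejection.
      pending = new Promise((resolve) => resolve(resolveMode(agentId)));
      cache.set(agentId, pending);
      // Drop failed lookups so a later call can retry.
      pending.catch(() => cache.delete(agentId));
    }
    return pending;
  };
}
```

Callers still `await getMode(agentId)` as before; the difference is that concurrent callers share one lookup, and a failed lookup is not cached.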
@@ -521,16 +585,77 @@ export class AnthropicManagedSource extends Source {
521
585
  }
522
586
 
523
587
  /**
524
- * Enforce a policy decision against a pending action.
588
+ * Enforce a policy decision against a pending action — v1.0.1 F-4.
589
+ *
590
+ * Routes through the right Anthropic event depending on the agent's
591
+ * effective enforcement mode:
592
+ * - sync_confirm (agent has at least one tool with always_ask):
593
+ * 'allow' → user.tool_confirmation { result: allow }
594
+ * 'deny' → user.tool_confirmation { result: deny } (pre-execution block)
595
+ * - sync_interrupt (no always_ask available):
596
+ * 'allow' → no-op (nothing to confirm — the tool already ran or
597
+ * will run without a gate)
598
+ * 'deny' → user.interrupt + optional follow-up message
599
+ * (post-hoc termination)
600
+ *
601
+ * Returns { enforced: boolean, mode: string, native_response?: object, reason?: string }
601
+ * where `mode` names the path taken ('confirm_allow', 'confirm_deny', 'interrupt', or 'no_op') so the caller can log it.
525
603
  *
526
- * PR-A scaffold: the actual `user.tool_confirmation` / `user.interrupt`
527
- * HTTP call currently lives in scripts/shield.js, which talks to the
528
- * Anthropic API directly. Migrating that into this method is PR-D — at
529
- * which point this body will POST the decision via the SSE/HTTP control
530
- * channel. For PR-A, the method exists to satisfy the contract;
531
- * Shield does not call it yet.
604
+ * @param {object} action A WMAAction (must carry session_id and agent_id; id is also needed for the pre-execution confirm path)
605
+ * @param {object} decision { decision: 'allow'|'deny', reason?: string }
532
606
  */
533
- async enforce(action, decision) { // eslint-disable-line no-unused-vars
534
- throw new Error('AnthropicManagedSource.enforce() Shield migration pending PR-D (scripts/shield.js still handles enforcement directly)');
607
+ async enforce(action, decision) {
608
+ if (!action || typeof action !== 'object') {
609
+ throw new Error('enforce(action, decision): action must be a WMAAction object');
610
+ }
611
+ if (!action.session_id) {
612
+ throw new Error('enforce(action, decision): action.session_id is required');
613
+ }
614
+ if (!action.agent_id) {
615
+ throw new Error('enforce(action, decision): action.agent_id is required');
616
+ }
617
+ if (!decision || (decision.decision !== 'allow' && decision.decision !== 'deny')) {
618
+ throw new Error(`enforce(action, decision): decision must be 'allow' or 'deny' (got ${decision?.decision})`);
619
+ }
620
+
621
+ const mode = await this._getEffectiveModeFor(action.agent_id);
622
+ const isToolUse = action.action_type === ACTION_TYPES.TOOL_USE
623
+ || action.action_type === ACTION_TYPES.MCP_TOOL_USE
624
+ || action.action_type === ACTION_TYPES.CUSTOM_TOOL_USE;
625
+
626
+ // Path 1 — pre-execution confirmation when the agent supports it AND
627
+ // the pending action is a tool_use (only kind we can pre-block).
628
+ if (mode === ENFORCEMENT_MODES.SYNC_CONFIRM && isToolUse && action.id) {
629
+ if (decision.decision === 'allow') {
630
+ const res = await confirmAllow({
631
+ apiKey: this.apiKey,
632
+ sessionId: action.session_id,
633
+ toolUseId: action.id,
634
+ });
635
+ return { enforced: true, mode: 'confirm_allow', native_response: res };
636
+ }
637
+ const res = await confirmDeny({
638
+ apiKey: this.apiKey,
639
+ sessionId: action.session_id,
640
+ toolUseId: action.id,
641
+ denyMessage: decision.reason,
642
+ });
643
+ return { enforced: true, mode: 'confirm_deny', native_response: res };
644
+ }
645
+
646
+ // Path 2 — post-hoc interrupt. The only enforcement available when
647
+ // the agent has no always_ask tools, OR for non-tool actions we
648
+ // can't pre-block.
649
+ if (decision.decision === 'deny') {
650
+ const res = await interruptSession({
651
+ apiKey: this.apiKey,
652
+ sessionId: action.session_id,
653
+ followUpMessage: decision.reason,
654
+ });
655
+ return { enforced: true, mode: 'interrupt', native_response: res };
656
+ }
657
+
658
+ // Allow + no pre-gate available = nothing to do at the SDK level.
659
+ return { enforced: false, mode: 'no_op', reason: 'no pre-execution gate available for this action' };
535
660
  }
536
661
  }
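The routing in `enforce()` reduces to a small decision table. The sketch below restates it as a pure function so the four outcomes can be checked without any API calls. `chooseEnforcementPath` is a hypothetical name, and the literal strings for modes and action types are assumptions (the diff references `ENFORCEMENT_MODES` and `ACTION_TYPES` by name only).

```javascript
// Assumed literal values for the three tool-use action types.
const TOOL_ACTIONS = new Set(['tool_use', 'mcp_tool_use', 'custom_tool_use']);

// Returns the `mode` string that enforce() would report for this input.
function chooseEnforcementPath({ mode, action, decision }) {
  // Path 1: a pre-execution gate exists only for tool-use actions with an id
  // on agents whose effective mode is sync_confirm.
  const canPreBlock = mode === 'sync_confirm'
    && TOOL_ACTIONS.has(action.action_type)
    && Boolean(action.id);
  if (canPreBlock) return decision === 'allow' ? 'confirm_allow' : 'confirm_deny';
  // Path 2: otherwise a deny can only interrupt, and an allow is a no-op.
  return decision === 'deny' ? 'interrupt' : 'no_op';
}
```

Note the fall-through case: a `sync_confirm` agent still ends up on the interrupt path when the action is not a tool use or carries no `id`.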
@@ -127,6 +127,19 @@ export const PROVIDERS = Object.freeze({
127
127
  // * SUB-AGENT FIELDS (PR-C — see WMAAction.parent_agent_id):
128
128
  // * @property {string|null} parent_agent_id Null for root agents
129
129
  // * @property {string|null} composition_pattern From COMPOSITION_PATTERNS
130
+ // *
131
+ // * MULTI-AGENT DISCRIMINATORS (v1.0.2 F-6a — preserved LOCALLY only,
132
+ // * never sent raw to Fortress; the SignalsAggregator derives the
133
+ // * aggregated session_ids list from them at finalize time):
134
+ // * @property {string|null} session_thread_id The thread the event happened in.
135
+ // * For frameworks where one session can
136
+ // * host multiple threads/sub-agents
137
+ // * (Anthropic Task tool, future similar
138
+ // * designs), this is how the vendor
139
+ // * itself discriminates "parent vs sub".
140
+ // * @property {string|null} agent_name The human-named emitter of this event
141
+ // * (the parent agent OR a sub-agent
142
+ // * running inside the parent's session).
130
143
  // */
131
144
 
132
145
  const REQUIRED_FIELDS = ['id', 'provider', 'agent_id', 'session_id', 'action_type', 'timestamp', 'status'];
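Since `session_thread_id` and `agent_name` stay in the local NDJSON, one local use is to split a session's entries into parent and sub-agent activity. The helper below is a hypothetical sketch of that grouping, not the SignalsAggregator logic, which this diff does not show. `groupByThread` and the `'(no thread)'` bucket label are assumptions.

```javascript
// Hypothetical local analysis helper: bucket parsed NDJSON entries by thread.
function groupByThread(entries) {
  const groups = new Map();
  for (const entry of entries) {
    // Entries without a thread id share one bucket.
    const key = entry.session_thread_id ?? '(no thread)';
    if (!groups.has(key)) groups.set(key, { count: 0, agent_names: new Set() });
    const group = groups.get(key);
    group.count += 1;
    if (entry.agent_name) group.agent_names.add(entry.agent_name);
  }
  return groups;
}
```

Each bucket then lists how many events a thread produced and which named agents emitted them.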