watchmyagents 1.1.0 → 1.1.2

package/README.md CHANGED
@@ -1,12 +1,26 @@
  # Watch My Agents
 
- **Security observability for AI agents.** A zero-dependency CLI + SDK that captures every action your AI agents take — tool calls, prompts, state transitions, errors, multi-agent comms — into local NDJSON logs. Built for security audits, not just token counting.
+ **Real-time security observability AND enforcement for AI agents.** A zero-dependency CLI + SDK that captures every action your AI agents take — tool calls, prompts, state transitions, errors, multi-agent comms — into local NDJSON logs **AND** enforces security policies live, with sub-second propagation from the Fortress control plane to the Shield runtime.
 
- Designed around three guarantees:
+ Designed around four guarantees:
 
  1. **Local-first.** Raw payloads (prompts, outputs, tool arguments) stay 100% on your machine. Nothing leaves unless you explicitly opt in.
- 2. **Trace everything, not just what costs tokens.** A `web_fetch` to a suspicious URL carries zero tokens but is exactly what a security audit needs to see.
- 3. **Zero dependencies.** Only Node.js 18+ built-ins. No telemetry, no phone-home, no hidden network calls.
+ 2. **Trace everything, not just what costs tokens.** A `web_fetch` to a suspicious URL carries zero tokens but is exactly what a security audit needs to see. Even tool calls that were blocked, denied, or interrupted before producing a result are logged with `status: error` so the audit trail is complete.
+ 3. **Real-time enforcement, not post-hoc auditing.** A policy accepted in Fortress UI is active in Shield within ~1 second via SSE + Postgres realtime. A policy violation is blocked in ~3ms via Anthropic's `user.tool_confirmation` / `user.interrupt` events. Measured in production, not promised in roadmap.
+ 4. **Zero dependencies.** Only Node.js 18+ built-ins. No telemetry, no phone-home, no hidden network calls. Preserved through every release including the SSE realtime work (custom RFC-compliant SSE parser, no `@supabase/realtime-js` or `ws` dep).
+
+ ### Measured end-to-end loop latency (v1.1.0+)
+
+ ```
+ Anthropic agent action ────────► Watch capture            : ≤ 60s  (configurable via --interval)
+ Watch capture          ────────► Fortress signal upload   : ≤ 60s  (same cycle)
+ Fortress signal        ────────► Guardian analysis        : ≤ 30s  (event-triggered, debounced)
+ Guardian proposal      ────────► Operator accepts in UI   : (human)
+ Policy accepted        ────────► Shield receives via SSE  : ≤ 1s   (sub-second push, validated)
+ Shield evaluates       ────────► Decision (allow/deny)    : ≤ 3ms  (measured on Anthropic Managed)
+ ```
+
+ Full audit-clean: 3 successful Codex audit passes (v1.0.1, v1.0.2, v1.0.3) closed 7 findings with zero regression. Containment invariant (raw payloads never leave the customer machine) is formalized in `docs/CONTAINMENT.md` and locked by 8 regression tests.
 
  ---
 
package/SECURITY.md CHANGED
@@ -57,6 +57,8 @@ WMA combines **two complementary layers**:
  - **Blind spots in agent behavior.** Watch captures tool calls, prompts, state transitions, and errors for after-the-fact analysis.
  - **Token-only observability tools.** WMA captures every action including zero-token ones (`tool_use`, `state_transition`, etc.) that are the most security-relevant.
  - **Inline policy violations** (Shield). When the agent has `permission_policy: always_ask` configured, Shield blocks tool calls before execution. When not, Shield interrupts the session on first violation (the offending tool already ran, but the agent loop stops).
+ - **Stale enforcement after a policy update.** A new policy accepted in the Fortress dashboard is active in Shield within ~1 second via SSE + Postgres realtime (validated in production on v1.1.0). The 60s polling refresh is a fallback for environments where the SSE channel can't be established (firewall, proxy stripping `text/event-stream`).
+ - **Lost audit trail for blocked / denied / interrupted tool calls.** Tool calls that started but never produced a result (Shield pre-block, operator denial, mid-execution kill, session termination) are logged as explicit `tool_use` entries with `status: error` and `error: "no_result_observed"` — they cannot disappear silently from the audit. (Fix shipped in v1.1.1 after the Codex P1 finding.)
  - **Vendor lock-in.** NDJSON is portable; you own the data.
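
As an illustration of the "lost audit trail" bullet above, here is a minimal sketch of how an auditor might pull the unresolved tool calls back out of a local NDJSON log. The field names (`action_type`, `status`, `error`, `tool_name`, `id`, `timestamp`) follow the entries emitted by the v1.1.1 flush later in this diff. The helper name `findUnresolvedToolCalls` is hypothetical and not part of the package.

```javascript
// Hypothetical audit helper (not shipped with watchmyagents).
// Scans NDJSON text and returns the tool calls that never produced a result.
function findUnresolvedToolCalls(ndjsonText) {
  const hits = [];
  for (const line of ndjsonText.split('\n')) {
    if (!line.trim()) continue;            // ignore blank lines
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue;                            // skip malformed lines instead of aborting the scan
    }
    const isToolCall =
      entry.action_type === 'tool_use' || entry.action_type === 'mcp_tool_use';
    if (isToolCall && entry.status === 'error' && entry.error === 'no_result_observed') {
      hits.push({ id: entry.id, tool_name: entry.tool_name, timestamp: entry.timestamp });
    }
  }
  return hits;
}
```

For large logs a streaming reader (as `wma-inspect` uses via `node:readline`) would be preferable to loading the whole file as one string; the filter condition stays the same.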
 
  ### What WMA does NOT defend against
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "watchmyagents",
- "version": "1.1.0",
+ "version": "1.1.2",
  "description": "Security observability + real-time policy enforcement for AI agents. Local-first NDJSON capture with a continuous Watch daemon that auto-uploads anonymized signals, Shield CLI that blocks policy violations live (with policies pulled from Fortress cloud), anonymizer producing signals-only payloads, bidirectional sync with WatchMyAgents Fortress, and one-command install as an always-on launchd/systemd service — closing the recursive Watch→Guardian→Shield security loop.",
  "type": "module",
  "files": [
package/scripts/agents.js CHANGED
@@ -23,6 +23,7 @@ import { listAgents } from '../src/sources/anthropic-managed.js';
  import { classifyAgentType } from '../src/typology.js';
  import { aggregate, buildFeatures, NON_DERIVABLE } from '../src/typology-features.js';
  import { isValidAgentId, assertSafePathSegment } from '../src/validate.js';
+ import { maybePrintVersionAndExit } from '../src/version.js';
 
  function parseArgs(argv) {
  const out = { _: [] };
@@ -43,6 +44,8 @@ function info(msg) { process.stdout.write(`[wma-agents] ${msg}\n`); }
  // extraction). The rest of this file is just CLI presentation.
 
  async function main() {
+ // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+ maybePrintVersionAndExit(process.argv);
  const args = parseArgs(process.argv.slice(2));
  if (args._[0] && args._[0] !== 'list') die(`unknown command "${args._[0]}" (only "list" supported)`);
  const apiKey = args['api-key'] || process.env.ANTHROPIC_API_KEY;
@@ -29,6 +29,7 @@ import { Logger } from '../src/logger.js';
  import { TokenTracker } from '../src/tokens.js';
  import { SignalsAggregator } from '../src/anonymizer.js';
  import { resolveFortressBase, fortressEndpoint } from '../src/fortress/url.js';
+ import { cleanLabel } from '../src/labels.js';
  import { isValidAgentId, isValidSessionId, assertSafePathSegment } from '../src/validate.js';
  import { classifyAgentType } from '../src/typology.js';
  import { aggregate, buildFeatures } from '../src/typology-features.js';
@@ -36,6 +37,7 @@ import {
  getAgent, listAgents, listSessions, fetchSessionEntries, fetchRawEvents,
  AnthropicManagedSource, effectiveEnforcementMode,
  } from '../src/sources/anthropic-managed.js';
+ import { maybePrintVersionAndExit } from '../src/version.js';
 
  function parseArgs(argv) {
  const out = {};
@@ -73,9 +75,9 @@ function parseSince(s) {
  function die(msg, code = 1) { process.stderr.write(`${msg}\n`); process.exit(code); }
  function info(msg) { process.stdout.write(`[wma-fetch] ${msg}\n`); }
  function warn(msg) { process.stderr.write(`[wma-fetch] ⚠️ ${msg}\n`); }
- // Strip control chars + truncate a customer-set agent name before it goes into
- // a log line or the Fortress display_name (defense-in-depth vs log/payload injection).
- function cleanLabel(s) { return [...String(s ?? '')].filter((c) => c.charCodeAt(0) >= 32 && c.charCodeAt(0) !== 127).join('').slice(0, 60).trim(); }
+ // v1.1.1 F-11: cleanLabel moved to src/labels.js so wma-upload-fortress
+ // (and any future consumer) shares the exact same sanitization. Defense
+ // in depth vs log/payload injection from customer-set agent names.
 
  function resolveModel(agent) {
  const raw = agent.model || agent.config?.model || null;
@@ -83,6 +85,12 @@ function resolveModel(agent) {
  }
 
  // HTTPS POST helper for the --upload signals push (mirrors wma-upload-fortress).
+ // v1.1.2 F-17: response body cap for the Fortress ingest-signals POST.
+ // The expected reply is a small JSON confirmation ({signal_id, agent_id,
+ // registered_new_agent}) — well under 1 MB. Any larger and the endpoint
+ // is misconfigured or compromised; abort.
+ const MAX_FORTRESS_RESPONSE_BYTES = 1 * 1024 * 1024;
+
  function postJson(url, headers, body) {
  return new Promise((resolveReq, rejectReq) => {
  const u = new URL(url);
@@ -95,8 +103,22 @@ function postJson(url, headers, body) {
  rejectUnauthorized: true,
  }, (res) => {
  const chunks = [];
- res.on('data', (c) => chunks.push(c));
+ let receivedBytes = 0;
+ let aborted = false;
+ res.on('data', (c) => {
+ if (aborted) return;
+ receivedBytes += c.length;
+ if (receivedBytes > MAX_FORTRESS_RESPONSE_BYTES) {
+ aborted = true;
+ chunks.length = 0;
+ try { req.destroy(); } catch { /* already destroyed */ }
+ rejectReq(new Error(`Fortress response exceeded ${MAX_FORTRESS_RESPONSE_BYTES} bytes — aborting`));
+ return;
+ }
+ chunks.push(c);
+ });
  res.on('end', () => {
+ if (aborted) return;
  const raw = Buffer.concat(chunks).toString('utf8');
  let parsed = null; try { parsed = JSON.parse(raw); } catch { /* keep raw */ }
  resolveReq({ status: res.statusCode || 0, body: parsed ?? raw });
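
The accumulate-and-cap logic added in this hunk is repeated, with different limits, in each HTTP helper touched by F-17 in this diff. A minimal sketch of how the pattern could be factored into one place is shown below. `createCappedCollector` is a hypothetical name and is not part of the package; the caller would still be responsible for calling `req.destroy()` and rejecting its promise when `push` returns `false`.

```javascript
// Hypothetical shared helper (not in the package): buffers response chunks
// until a byte cap is exceeded, then drops everything it has buffered.
function createCappedCollector(maxBytes) {
  const chunks = [];
  let received = 0;
  let aborted = false;
  return {
    // Returns true if the chunk was buffered, false once the cap is exceeded.
    push(chunk) {
      if (aborted) return false;
      received += chunk.length;            // Buffer#length is a byte count
      if (received > maxBytes) {
        aborted = true;
        chunks.length = 0;                 // free what was already buffered
        return false;
      }
      chunks.push(chunk);
      return true;
    },
    get aborted() { return aborted; },
    // Full body as UTF-8 text, or null if the cap was hit.
    text() { return aborted ? null : Buffer.concat(chunks).toString('utf8'); },
  };
}
```

In a `res.on('data', ...)` handler this would read roughly as `if (!collector.push(c)) { req.destroy(); reject(...); }`, with the `end` handler returning early when `collector.aborted` is set.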
@@ -261,7 +283,7 @@ const sleep = (ms, signal) => new Promise((res) => {
  });
 
  // ── ONE-SHOT ──────────────────────────────────────────────────────────────
- async function fetchOneShot({ apiKey, agentId, model, logDir, since, sessionId, dumpRaw }) {
+ async function fetchOneShot({ apiKey, agentId, model, logDir, since, sessionId, dumpRaw, forceDuplicates = false }) {
  let sessions;
  if (sessionId) {
  sessions = [{ id: sessionId, created_at: new Date().toISOString() }];
@@ -272,7 +294,18 @@ async function fetchOneShot({ apiKey, agentId, model, logDir, since, sessionId,
  if (sessions.length === 0) { info('no sessions to fetch'); return; }
  info(`${sessions.length} session(s) to fetch`);
 
+ // v1.1.1 F-10 (P2 Codex audit): preload the entry ids already on disk for
+ // this agent so re-running the one-shot doesn't duplicate events. The
+ // watch daemon does this already; the one-shot was the missing piece.
+ // Operators who explicitly want the legacy duplicate-on-rerun behavior
+ // can opt back in with --force-duplicates.
+ const seenIds = forceDuplicates ? new Set() : await preloadSeenIds(logDir, agentId);
+ if (!forceDuplicates && seenIds.size > 0) {
+ info(`preloaded ${seenIds.size} known event id(s) for dedup`);
+ }
+
  let totalEntries = 0;
+ let totalSkipped = 0;
  for (const s of sessions) {
  const sid = s.id;
  process.stdout.write(`\n[wma-fetch] session ${sid}\n`);
@@ -288,23 +321,27 @@ async function fetchOneShot({ apiKey, agentId, model, logDir, since, sessionId,
  const logger = new Logger({ logDir, agentId, sessionId: sid, silent: true });
  const tracker = new TokenTracker();
  let count = 0;
+ let skipped = 0;
  for await (const entry of fetchSessionEntries({ apiKey, agentId, sessionId: sid, model })) {
+ if (entry.id && seenIds.has(entry.id)) { skipped++; continue; }
  const written = await logger.write(entry);
+ if (entry.id) seenIds.add(entry.id);
  tracker.record(written);
  count++;
  }
+ totalSkipped += skipped;
  const stats = tracker.stats().total;
  await logger.write({
  action_type: 'session_end', provider: 'anthropic-managed', status: 'ok', model,
  session_tokens: { input: stats.input, output: stats.output, cache_read: stats.cache_read, cache_creation: stats.cache_creation, total: stats.sum },
  session_cost_usd: stats.cost_usd || null,
  });
- process.stdout.write(` entries : ${count} (+1 session_end)\n`);
+ process.stdout.write(` entries : ${count} (+1 session_end)${skipped ? ` · ${skipped} skipped (dedup)` : ''}\n`);
  process.stdout.write(` tokens : in=${stats.input} out=${stats.output} cache_r=${stats.cache_read} cache_w=${stats.cache_creation}\n`);
  process.stdout.write(` written to : ${logger._pathForToday()}\n`);
  totalEntries += count + 1;
  }
- process.stdout.write(`\n[wma-fetch] done — ${totalEntries} total entries across ${sessions.length} session(s)\n`);
+ process.stdout.write(`\n[wma-fetch] done — ${totalEntries} total entries across ${sessions.length} session(s)${totalSkipped ? `, ${totalSkipped} skipped (dedup)` : ''}\n`);
  process.stdout.write(`[wma-fetch] inspect with: npx wma-inspect ${logDir}\n`);
  }
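
The skip-if-seen loop added to `fetchOneShot` above can be isolated as a small pure function, which makes its edge cases easier to see. This is a sketch only: `partitionBySeenIds` is a hypothetical name, and the real code writes each entry through `Logger` rather than collecting an array.

```javascript
// Hypothetical distillation of the F-10 dedup check (not the package's code).
// Mutates `seenIds` so later calls (later sessions) see ids written earlier.
function partitionBySeenIds(entries, seenIds) {
  const fresh = [];
  let skipped = 0;
  for (const entry of entries) {
    // Only entries that carry an id can be recognised as duplicates.
    if (entry.id && seenIds.has(entry.id)) { skipped++; continue; }
    fresh.push(entry);
    if (entry.id) seenIds.add(entry.id);
  }
  return { fresh, skipped };
}
```

Two properties follow directly from the condition `entry.id && seenIds.has(entry.id)`: a duplicate id inside a single run is also skipped, and entries without an `id` are always kept, so id-less records are not deduplicated by this check.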
 
@@ -441,6 +478,8 @@ async function runWatch({ apiKey, resolveAgents, fleet, logDir, intervalMs, wind
  }
 
  async function main() {
+ // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+ maybePrintVersionAndExit(process.argv);
  const args = parseArgs(process.argv.slice(2));
  const apiKey = args['api-key'] || process.env.ANTHROPIC_API_KEY;
  const agentId = args['agent-id'];
@@ -545,7 +584,7 @@ async function main() {
  info(`resolving agent ${agentId}…`);
  const agent = await getAgent(apiKey, agentId).catch((e) => die(`failed to GET agent: ${e.message}`));
  const since = args.since ? parseSince(args.since) : null;
- await fetchOneShot({ apiKey, agentId, model: resolveModel(agent), logDir, since, sessionId: args['session-id'], dumpRaw: !!args['dump-raw'] });
+ await fetchOneShot({ apiKey, agentId, model: resolveModel(agent), logDir, since, sessionId: args['session-id'], dumpRaw: !!args['dump-raw'], forceDuplicates: !!args['force-duplicates'] });
  }
  }
 
@@ -13,6 +13,7 @@ import { createReadStream } from 'node:fs';
  import { createInterface } from 'node:readline';
  import { join, resolve } from 'node:path';
  import { TokenTracker } from '../src/tokens.js';
+ import { maybePrintVersionAndExit } from '../src/version.js';
 
  // Streaming line-by-line reader — bounds memory usage on large NDJSON files
  // (a long-running agent can produce hundreds of MB per day).
@@ -66,6 +67,8 @@ function extractDestination(input) {
  }
 
  async function main() {
+ // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+ maybePrintVersionAndExit(process.argv);
  const files = await collectFiles(target);
  if (files.length === 0) {
  process.stderr.write(`No .ndjson files found under ${target}\n`); process.exit(1);
@@ -25,6 +25,7 @@ import { join } from 'node:path';
  import { fileURLToPath } from 'node:url';
  import { execFileSync } from 'node:child_process';
  import { isValidAgentId } from '../src/validate.js';
+ import { maybePrintVersionAndExit } from '../src/version.js';
 
  const HOME = os.homedir();
  const PLATFORM = process.platform; // 'darwin' | 'linux' | …
@@ -338,6 +339,8 @@ The service starts at login and restarts on crash. Raw logs stay local.
  }
 
  function main() {
+ // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+ maybePrintVersionAndExit(process.argv);
  const args = parseArgs(process.argv.slice(2));
  const cmd = args._[0];
  switch (cmd) {
package/scripts/shield.js CHANGED
@@ -32,6 +32,7 @@ import {
  confirmAllow, confirmDeny, interruptSession,
  getAgentConfig, detectAlwaysAsk,
  } from '../src/shield/enforce.js';
+ import { maybePrintVersionAndExit } from '../src/version.js';
  import { DecisionLogger } from '../src/shield/decisions.js';
  import { listSessions, listAgents } from '../src/sources/anthropic-managed.js';
  import { FortressPolicySource, postDecision } from '../src/shield/sources/fortress.js';
@@ -405,6 +406,8 @@ async function runAgentWide(ctx) {
  // Main
  // ────────────────────────────────────────────────────────────────────────
  async function main() {
+ // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+ maybePrintVersionAndExit(process.argv);
  const args = parseArgs(process.argv.slice(2));
  const apiKey = args['api-key'] || process.env.ANTHROPIC_API_KEY;
  const agentId = args['agent-id'];
@@ -25,6 +25,7 @@ import { resolve, join } from 'node:path';
  import { SignalsAggregator } from '../src/anonymizer.js';
  import { createReadStream } from 'node:fs';
  import { createInterface } from 'node:readline';
+ import { maybePrintVersionAndExit } from '../src/version.js';
 
  function parseArgs(argv) {
  const out = {};
@@ -59,6 +60,8 @@ async function collectFiles(p) {
  }
 
  async function main() {
+ // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+ maybePrintVersionAndExit(process.argv);
  const args = parseArgs(process.argv.slice(2));
 
  if (!args._target) {
@@ -31,6 +31,8 @@ import { createInterface } from 'node:readline';
  import { SignalsAggregator } from '../src/anonymizer.js';
  import { resolveFortressBase, fortressEndpoint } from '../src/fortress/url.js';
  import { AnthropicManagedSource } from '../src/sources/anthropic-managed.js';
+ import { cleanLabel } from '../src/labels.js';
+ import { maybePrintVersionAndExit } from '../src/version.js';
 
  function parseArgs(argv) {
  const out = {};
@@ -61,6 +63,11 @@ async function collectFiles(p) {
  return out;
  }
 
+ // v1.1.2 F-17: Fortress ingest-signals response is a small confirmation
+ // JSON. Cap at 1 MB and abort if the endpoint streams more — defensive
+ // against a compromised or misconfigured response.
+ const MAX_FORTRESS_RESPONSE_BYTES = 1 * 1024 * 1024;
+
  function postJson(url, headers, body) {
  return new Promise((resolveReq, rejectReq) => {
  const u = new URL(url);
@@ -83,8 +90,22 @@ function postJson(url, headers, body) {
  },
  (res) => {
  const chunks = [];
- res.on('data', (c) => chunks.push(c));
+ let receivedBytes = 0;
+ let aborted = false;
+ res.on('data', (c) => {
+ if (aborted) return;
+ receivedBytes += c.length;
+ if (receivedBytes > MAX_FORTRESS_RESPONSE_BYTES) {
+ aborted = true;
+ chunks.length = 0;
+ try { req.destroy(); } catch { /* already destroyed */ }
+ rejectReq(new Error(`Fortress response exceeded ${MAX_FORTRESS_RESPONSE_BYTES} bytes — aborting`));
+ return;
+ }
+ chunks.push(c);
+ });
  res.on('end', () => {
+ if (aborted) return;
  const raw = Buffer.concat(chunks).toString('utf8');
  let parsed = null;
  try { parsed = JSON.parse(raw); } catch { /* keep raw */ }
@@ -99,13 +120,18 @@ function postJson(url, headers, body) {
  }
 
  async function main() {
+ // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+ maybePrintVersionAndExit(process.argv);
  const args = parseArgs(process.argv.slice(2));
 
  const agentId = args['agent-id'];
  const logDir = resolve(args['log-dir'] || './watchmyagents-logs');
  const apiKey = args['api-key'] || process.env.WMA_API_KEY;
  const salt = args.salt || process.env.WMA_SIGNALS_SALT;
- const displayName = args['display-name'] || agentId;
+ // v1.1.1 F-11: sanitize the customer-supplied display name with the
+ // same cleanLabel used by the Watch daemon (defense-in-depth vs log
+ // injection / Fortress payload injection via control bytes).
+ const displayName = cleanLabel(args['display-name'] || agentId) || agentId;
  const dryRun = !!args['dry-run'];
 
  // Resolve Fortress base URL. Accepts:
package/src/labels.js ADDED
@@ -0,0 +1,39 @@
+ // ────────────────────────────────────────────────────────────────────────
+ // labels — shared sanitization for human-facing identifiers
+ // ────────────────────────────────────────────────────────────────────────
+ //
+ // Customer-set strings (agent display names, workspace labels, etc.) end
+ // up in:
+ // - log lines (stdout/stderr of the Watch + Shield daemons)
+ // - the Fortress ingest-signals payload (`display_name` field)
+ // - eventually rendered in the Fortress dashboard
+ //
+ // We don't trust them. A name carrying:
+ // - control bytes (0x00-0x1F, 0x7F) can poison terminal output (ANSI
+ // escape sequences) or break NDJSON parsing
+ // - excessive length can bloat payloads and break UI columns
+ //
+ // `cleanLabel()` is the single, shared sanitizer. Both wma-fetch (the
+ // daemon) and wma-upload-fortress (the one-shot uploader) MUST run
+ // every customer-supplied label through it before logging or shipping.
+ // Extracted to its own module in v1.1.1 (F-11 Codex audit fix) so a
+ // future change benefits both consumers automatically.
+
+ const MAX_LABEL_CHARS = 60;
+
+ /**
+ * Strip control bytes (< 0x20 and 0x7F DEL) and truncate to MAX_LABEL_CHARS
+ * characters. Returns the empty string for null/undefined input.
+ *
+ * Uses [...str] to iterate by code point so surrogate pairs aren't split.
+ */
+ export function cleanLabel(s) {
+ return [...String(s ?? '')]
+ .filter((c) => {
+ const code = c.charCodeAt(0);
+ return code >= 32 && code !== 127;
+ })
+ .join('')
+ .slice(0, MAX_LABEL_CHARS)
+ .trim();
+ }
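
One detail of the function above is worth illustrating. The spread `[...str]` does iterate by code point, so the control-character filter never sees half of a surrogate pair. The truncation, however, runs after `.join('')`, and `String.prototype.slice` counts UTF-16 code units. A label whose 60th code unit is the first half of a pair (for example 59 ASCII characters followed by an emoji) would therefore end in a lone surrogate. The sketch below shows one possible variant that truncates the code-point array before joining. It is a hypothetical alternative under a different name, not the published implementation.

```javascript
const MAX_LABEL_CHARS = 60;

// Hypothetical variant of cleanLabel (not the shipped code): same filter,
// but the limit is applied to code points, so a pair is kept or dropped whole.
function cleanLabelByCodePoint(s) {
  return [...String(s ?? '')]
    .filter((c) => {
      const code = c.codePointAt(0);
      return code >= 32 && code !== 127;   // drop C0 controls and DEL
    })
    .slice(0, MAX_LABEL_CHARS)             // Array#slice: counts code points
    .join('')
    .trim();
}
```

With this ordering the limit means "at most 60 code points", so the resulting string can be up to 120 UTF-16 code units long. Whether that trade-off is acceptable depends on what the Fortress `display_name` field enforces, which this diff does not show.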
@@ -37,6 +37,13 @@ const RECONNECT_MIN_MS = 1_000;
  const RECONNECT_MAX_MS = 60_000;
  const FALLBACK_RETRY_INTERVAL_MS = 5 * 60_000;
  const PERMANENT_FAILURE_LOG_INTERVAL_MS = 5 * 60_000;
+ // v1.1.1 F-9 (P2 Codex audit): hard cap on a single SSE event's buffer.
+ // A buggy or compromised Fortress endpoint could stream bytes forever
+ // without emitting the "\n\n" event separator, growing Shield's memory.
+ // 1 MB is far above any legitimate `policy_changed` payload (the data
+ // field carries {rule_id, action, ts, kind} = maybe 200 bytes) so we
+ // abort the connection and reconnect on overflow.
+ const MAX_SSE_EVENT_BYTES = 1 * 1024 * 1024;
 
  export class PolicyStream extends EventEmitter {
  constructor({ url, apiKey, anthropicAgentId, onError, onInfo }) {
@@ -147,6 +154,17 @@ export class PolicyStream extends EventEmitter {
  let buffer = '';
  res.on('data', (chunk) => {
  buffer += chunk;
+ // v1.1.1 F-9: cap on a single SSE event buffer. A buggy/compromised
+ // endpoint that never emits "\n\n" would otherwise OOM Shield.
+ // Abort + reconnect on overflow; the buffer is dropped so we
+ // restart fresh on the new connection.
+ if (buffer.length > MAX_SSE_EVENT_BYTES) {
+ this.onError(new Error(`policy-stream: SSE event exceeded ${MAX_SSE_EVENT_BYTES} bytes — aborting connection and reconnecting`));
+ buffer = '';
+ try { res.destroy(); } catch { /* already destroyed */ }
+ if (!this._closed) this._scheduleReconnect();
+ return;
+ }
  // SSE events are separated by a blank line ("\n\n").
  let eolIdx;
  while ((eolIdx = buffer.indexOf('\n\n')) !== -1) {
@@ -32,13 +32,27 @@ export async function loadPolicies(path) {
  throw new Error(`policy file ${path} has no "policies" array`);
  }
  // Pre-compile regex for performance + early failure on bad patterns.
+ const VALID_ACTIONS = ['allow', 'deny', 'interrupt'];
  for (const p of data.policies) {
  compileMatchRegexes(p.match || {});
- if (!['allow', 'deny', 'interrupt'].includes(p.action)) {
+ if (!VALID_ACTIONS.includes(p.action)) {
  throw new Error(`policy ${p.id || p.name}: unsupported action "${p.action}"`);
  }
  }
+ // v1.1.2 F-14 (P2 Codex audit): validate the ruleset's default.action
+ // against the SAME canonical set as per-policy actions. Before this fix
+ // a typo like `default: { action: "drop" }` was accepted silently — at
+ // evaluation time evaluate() returned `decision: "drop"`, which the
+ // interrupt-mode runtime treated as a no-op (only deny/interrupt trigger
+ // termination) and the tool_confirmation-mode runtime left dangling
+ // (no allow/deny event sent). Either way the agent ran without
+ // enforcement, exactly opposite of the operator's intent.
  data.default = data.default || { action: 'allow' };
+ if (!VALID_ACTIONS.includes(data.default.action)) {
+ throw new Error(
+ `policy file ${path} default.action "${data.default.action}" is invalid — must be one of: ${VALID_ACTIONS.join(', ')}`,
+ );
+ }
  return data;
  }
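
To make the effect of the F-14 check concrete, the validation can be sketched as a standalone function over an already-parsed ruleset. `normalizeRuleset` is a hypothetical name; unlike `loadPolicies`, it does not read a file or compile match regexes, and it returns a copy instead of mutating its argument.

```javascript
const VALID_ACTIONS = ['allow', 'deny', 'interrupt'];

// Hypothetical, file-free version of the checks in loadPolicies (sketch only).
function normalizeRuleset(data) {
  if (!data || !Array.isArray(data.policies)) {
    throw new Error('ruleset has no "policies" array');
  }
  for (const p of data.policies) {
    if (!VALID_ACTIONS.includes(p.action)) {
      throw new Error(`policy ${p.id || p.name}: unsupported action "${p.action}"`);
    }
  }
  // A missing default falls back to allow; a present but invalid one is rejected.
  const fallback = data.default || { action: 'allow' };
  if (!VALID_ACTIONS.includes(fallback.action)) {
    throw new Error(
      `default.action "${fallback.action}" is invalid, must be one of: ${VALID_ACTIONS.join(', ')}`,
    );
  }
  return { ...data, default: fallback };
}
```

Note the asymmetry this produces: omitting `default` entirely still yields `allow`, while `default: {}` or `default: { action: "drop" }` now fails at load time rather than being passed through to the evaluator.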
 
@@ -14,6 +14,15 @@ import { URL } from 'node:url';
  import { fortressEndpoint } from '../../fortress/url.js';
 
  const DEFAULT_TIMEOUT_MS = 15_000;
+ // v1.1.2 F-17 (P3 Codex audit): cap on the total bytes we'll accumulate
+ // for a Fortress JSON response before aborting the request. A misconfigured
+ // or compromised endpoint streaming an unbounded body would otherwise
+ // exhaust Shield's memory, despite the HTTPS-only + timeout guards.
+ // 8 MB is far above the realistic ceiling for a customer's policy ruleset
+ // (hundreds of policies × ~1 KB each → ~hundreds of KB). On overflow we
+ // destroy the request, which propagates to onError + cached-ruleset
+ // fallback.
+ const MAX_RESPONSE_BYTES = 8 * 1024 * 1024;
 
  function httpsJson(method, url, headers, body, timeoutMs = DEFAULT_TIMEOUT_MS) {
  return new Promise((resolveReq, rejectReq) => {
@@ -35,8 +44,23 @@ function httpsJson(method, url, headers, body, timeoutMs = DEFAULT_TIMEOUT_MS) {
  };
  const req = httpsRequest(opts, (res) => {
  const chunks = [];
- res.on('data', (c) => chunks.push(c));
+ let receivedBytes = 0;
+ let aborted = false;
+ res.on('data', (c) => {
+ if (aborted) return;
+ receivedBytes += c.length;
+ if (receivedBytes > MAX_RESPONSE_BYTES) {
+ aborted = true;
+ // Free anything we already buffered, then tear down the request.
+ chunks.length = 0;
+ try { req.destroy(); } catch { /* already destroyed */ }
+ rejectReq(new Error(`Fortress response exceeded ${MAX_RESPONSE_BYTES} bytes — aborting (received ${receivedBytes} so far)`));
+ return;
+ }
+ chunks.push(c);
+ });
  res.on('end', () => {
+ if (aborted) return;
  const raw = Buffer.concat(chunks).toString('utf8');
  let parsed = null;
  try { parsed = raw ? JSON.parse(raw) : null; } catch { /* keep raw */ }
@@ -179,6 +203,17 @@ export class FortressPolicySource {
  this.onError(new Error(`skipping invalid Fortress policy "${p?.rule_id || p?.name || '?'}": ${e.message}`));
  }
  }
+ // v1.1.2 F-15 (P2 Codex audit): the policy evaluator is "first match
+ // wins" (src/shield/policy.js evaluate()), so policy order matters.
+ // Fortress validates `priority` server-side, but the API does not
+ // contractually guarantee that the returned array is sorted by
+ // priority. If a wide "allow" rule sat before a higher-priority
+ // "deny" rule in the response, the deny would never fire. Sort
+ // client-side by descending priority (higher priority first) before
+ // assigning to ruleset. Policies without `priority` (or with equal
+ // priorities) keep their relative order via the stable sort
+ // guarantee in V8 — predictable behavior.
+ compiled.sort((a, b) => (b.priority ?? 0) - (a.priority ?? 0));
  this.ruleset = {
  version: 1,
  policies: compiled,
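
The interaction between ordering and first-match-wins evaluation described in the F-15 comment can be shown with a toy evaluator. Everything here is a sketch under stated assumptions: `sortByPriority` and `firstMatch` are hypothetical names, and the `tool` field with a `'*'` wildcard stands in for the package's real `match` rules, which this diff does not show. The sort comparator is the one from the diff, applied to a copy rather than in place.

```javascript
// Same comparator as the diff; sorts a copy so the input order is preserved.
function sortByPriority(policies) {
  return [...policies].sort((a, b) => (b.priority ?? 0) - (a.priority ?? 0));
}

// Toy first-match-wins evaluator. The `tool` field is an illustrative
// stand-in for the real match rules.
function firstMatch(policies, toolName, fallback = 'allow') {
  for (const p of policies) {
    if (p.tool === '*' || p.tool === toolName) return p.action;
  }
  return fallback;
}

const asReturned = [
  { id: 'wide-allow', tool: '*', action: 'allow', priority: 1 },
  { id: 'deny-fetch', tool: 'web_fetch', action: 'deny', priority: 10 },
];

// Unsorted, the wide allow shadows the deny; sorted, the deny fires first.
const before = firstMatch(asReturned, 'web_fetch');
const after = firstMatch(sortByPriority(asReturned), 'web_fetch');
```

On the stability point: `Array.prototype.sort` has been required to be stable since ECMAScript 2019, so the guarantee holds for any conforming engine on the Node.js 18+ baseline the README states, not only for V8.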
@@ -9,6 +9,15 @@
  const API_BASE = 'https://api.anthropic.com';
  const BETA = 'managed-agents-2026-04-01';
  const VERSION = '2023-06-01';
+ // v1.1.2 F-16 (P2 Codex audit): hard cap on a single SSE frame buffer.
+ // A buggy upstream proxy that strips event separators OR a compromised
+ // Anthropic-style endpoint streaming bytes forever without "\n\n" would
+ // otherwise OOM Shield's host. 1 MB is far above any real Anthropic
+ // event payload (the heaviest events are agent.thinking + agent.message
+ // which carry at most a few hundred KB of text). On overflow we throw,
+ // which propagates through the generator and triggers the caller's
+ // reconnect logic — same outcome as a network error.
+ const MAX_SSE_FRAME_BYTES = 1 * 1024 * 1024;
 
  function authHeaders(apiKey) {
  return {
@@ -43,6 +52,14 @@ export async function* openEventStream({ apiKey, sessionId, signal }) {
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
 
+ // v1.1.2 F-16: guard against an upstream that never emits "\n\n" —
+ // throw to abort the stream cleanly, the caller's reconnect logic
+ // will pick up. Drop the buffer to free memory before throwing.
+ if (buffer.length > MAX_SSE_FRAME_BYTES) {
+ buffer = '';
+ throw new Error(`SSE frame exceeded ${MAX_SSE_FRAME_BYTES} bytes — aborting stream (caller should reconnect)`);
+ }
+
  // SSE frames are separated by a blank line ("\n\n"). Each frame may
  // contain multiple lines; we only care about `data:` lines for now.
  let nlIdx;
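
Both SSE guards in this diff (F-9 in the policy stream and F-16 here) compare `buffer.length` against the cap right after appending a chunk and before any complete frames are split off. The sketch below is a hypothetical variation, not the package's code: it extracts complete frames first and applies the cap only to the unterminated remainder, so a single large chunk made of many small, well-formed events would not trip the limit. `createSseSplitter` is an invented name.

```javascript
// Hypothetical frame splitter (sketch only). Returns a feed(chunk) function
// that yields the complete frames found so far and keeps the remainder.
function createSseSplitter(maxPendingChars) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const frames = [];
    let idx;
    // Frames are separated by a blank line ("\n\n"), as in the diff.
    while ((idx = buffer.indexOf('\n\n')) !== -1) {
      frames.push(buffer.slice(0, idx));
      buffer = buffer.slice(idx + 2);
    }
    // Only the unterminated tail counts against the cap.
    if (buffer.length > maxPendingChars) {
      buffer = '';                          // free memory before signalling
      throw new Error(`SSE frame exceeded ${maxPendingChars} chars, caller should reconnect`);
    }
    return frames;
  };
}
```

The parameter is named in characters deliberately: `length` on a JavaScript string counts UTF-16 code units, so for non-ASCII payloads it approximates rather than equals the byte count that the `MAX_SSE_*_BYTES` constant names suggest.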
@@ -29,6 +29,12 @@ const VERSION = '2023-06-01';
  // Hard cap on any single GET so a hung connection can't pin Watch/Shield
  // forever. getWithRetry will retry on timeout (the error propagates here).
  const REQUEST_TIMEOUT_MS = 30_000;
+ // v1.1.2 F-17 (P3 Codex audit): cap on a single Anthropic response body.
+ // Event history pages (/v1/sessions/{id}/events) can carry up to ~1000
+ // events × thousands of bytes each, so 16 MB is the headroom we leave
+ // before we conclude something is wrong. Above this we abort the
+ // request and getWithRetry will retry on the next attempt.
+ const MAX_ANTHROPIC_RESPONSE_BYTES = 16 * 1024 * 1024;
 
  function httpGet(apiKey, path) {
  return new Promise((resolve, reject) => {
@@ -43,8 +49,22 @@ function httpGet(apiKey, path) {
  },
  }, res => {
  const chunks = [];
- res.on('data', c => chunks.push(c));
+ let receivedBytes = 0;
+ let aborted = false;
+ res.on('data', c => {
+ if (aborted) return;
+ receivedBytes += c.length;
+ if (receivedBytes > MAX_ANTHROPIC_RESPONSE_BYTES) {
+ aborted = true;
+ chunks.length = 0;
+ try { req.destroy(); } catch { /* already destroyed */ }
+ reject(new Error(`Anthropic response exceeded ${MAX_ANTHROPIC_RESPONSE_BYTES} bytes — aborting (${path})`));
+ return;
+ }
+ chunks.push(c);
+ });
  res.on('end', () => {
+ if (aborted) return;
  const body = Buffer.concat(chunks).toString('utf8');
  if (res.statusCode >= 200 && res.statusCode < 300) {
  try { resolve(JSON.parse(body)); } catch (e) { reject(e); }
@@ -326,6 +346,11 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
      isMcp: type === 'agent.mcp_tool_use',
      input: ev.input ?? null,
      mcpServer: ev.server_name ?? ev.mcp_server_name ?? null,
+     // v1.1.1 F-8: capture sub-agent context at storage time so the
+     // end-of-session flush yields entries with the right attribution.
+     startTimestamp: ts,
+     session_thread_id,
+     agent_name,
    });
    continue;
  }
@@ -483,6 +508,37 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
        continue;
      }
    }
+
+   // v1.1.1 F-8 (P1 Codex audit): flush remaining pendingToolUse entries
+   // as explicit "no_result_observed" tool_use events. These are tool
+   // calls that started (we saw agent.tool_use) but never produced a
+   // result (no agent.tool_result paired): most commonly because Shield
+   // pre-blocked them, the operator denied via tool_confirmation, the
+   // tool died mid-execution, or the session terminated before the
+   // result event arrived. For a security audit product, these incomplete
+   // calls are often the MOST useful signals — a blocked exfil attempt
+   // shows up here, not in successful tool_results. Yielding them
+   // explicitly with status='error' keeps the local NDJSON, anonymizer
+   // signals (counts, IoC hashes, tool_counts), and Fortress decisions
+   // honest about what actually happened.
+   for (const [toolUseId, pending] of pendingToolUse) {
+     yield {
+       ...base,
+       session_thread_id: pending.session_thread_id,
+       agent_name: pending.agent_name,
+       id: toolUseId,
+       action_type: pending.isMcp ? 'mcp_tool_use' : 'tool_use',
+       tool_name: pending.name,
+       model: model || null,
+       timestamp: pending.startTimestamp,
+       duration_ms: null,
+       status: 'error',
+       error: 'no_result_observed',
+       input: pending.input,
+       output: { mcp_server: pending.mcpServer ?? undefined },
+     };
+   }
+   pendingToolUse.clear();
  }
 
  // ────────────────────────────────────────────────────────────────────────
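The pairing-then-flush behaviour described in the F-8 comment can be illustrated with a much smaller model. This is a sketch under loud assumptions: the event shapes (`id`, `name`, `input`, `ts`, `tool_use_id`), the `'success'` status, and the function name `pairToolEvents` are all invented for illustration; only the event type names, the `pendingToolUse` map, and the `status: 'error'` / `error: 'no_result_observed'` pair come from the diff.

```javascript
// Hypothetical, simplified event shapes: { type, id, name, input, ts } for a
// tool use and { type, tool_use_id } for its result.
function pairToolEvents(events) {
  const pendingToolUse = new Map();
  const entries = [];

  for (const ev of events) {
    if (ev.type === 'agent.tool_use') {
      // Remember the call until its result shows up.
      pendingToolUse.set(ev.id, {
        name: ev.name,
        input: ev.input ?? null,
        startTimestamp: ev.ts,
      });
    } else if (ev.type === 'agent.tool_result') {
      const pending = pendingToolUse.get(ev.tool_use_id);
      if (!pending) continue; // result without a known start: ignored in this sketch
      pendingToolUse.delete(ev.tool_use_id);
      entries.push({
        id: ev.tool_use_id,
        tool_name: pending.name,
        timestamp: pending.startTimestamp,
        status: 'success', // assumed label for a paired call
        error: null,
        input: pending.input,
      });
    }
  }

  // End of session: anything still pending never produced a result.
  for (const [toolUseId, pending] of pendingToolUse) {
    entries.push({
      id: toolUseId,
      tool_name: pending.name,
      timestamp: pending.startTimestamp,
      status: 'error',
      error: 'no_result_observed',
      input: pending.input,
    });
  }
  pendingToolUse.clear();
  return entries;
}

const entries = pairToolEvents([
  { type: 'agent.tool_use', id: 't1', name: 'web_fetch', input: { url: 'https://example.com' }, ts: 1 },
  { type: 'agent.tool_use', id: 't2', name: 'bash', input: { cmd: 'ls' }, ts: 2 },
  { type: 'agent.tool_result', tool_use_id: 't1' },
]);
console.log(entries.map(e => `${e.id}:${e.status}`)); // [ 't1:success', 't2:error' ]
```

Without the final loop, `t2` would silently vanish from the output, which is the gap the F-8 change closes.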
package/src/version.js ADDED
@@ -0,0 +1,52 @@
+ // ────────────────────────────────────────────────────────────────────────
+ // version — shared --version flag handler for the wma-* CLI binaries
+ // ────────────────────────────────────────────────────────────────────────
+ //
+ // v1.1.1 F-13: every CLI binary (wma-fetch, wma-shield, wma-signals,
+ // wma-upload-fortress, wma-inspect, wma-agents, wma-service) gets a
+ // --version / -v flag that prints the installed version and exits.
+ // Operators previously had to grep package.json under npm root to know
+ // what was deployed; this is now a one-liner.
+ //
+ // We resolve the version from the package.json next to the SDK source
+ // (../package.json relative to this file) so it stays in sync with the
+ // release that's actually executing.
+
+ import { readFileSync } from 'node:fs';
+ import { dirname, join } from 'node:path';
+ import { fileURLToPath } from 'node:url';
+
+ const HERE = dirname(fileURLToPath(import.meta.url));
+ const PKG_PATH = join(HERE, '..', 'package.json');
+
+ let cachedVersion = null;
+
+ /** Returns the installed watchmyagents version, parsed from package.json. */
+ export function getVersion() {
+   if (cachedVersion) return cachedVersion;
+   try {
+     const pkg = JSON.parse(readFileSync(PKG_PATH, 'utf8'));
+     cachedVersion = pkg.version || 'unknown';
+   } catch {
+     cachedVersion = 'unknown';
+   }
+   return cachedVersion;
+ }
+
+ /**
+  * If argv contains --version or -v, print the version and exit(0).
+  * Call this BEFORE any other parsing so it short-circuits on bad input
+  * (e.g., the user types `wma-fetch --version` with no env vars set).
+  *
+  * Usage at the top of every wma-* script:
+  *   import { maybePrintVersionAndExit } from '../src/version.js';
+  *   maybePrintVersionAndExit(process.argv);
+  */
+ export function maybePrintVersionAndExit(argv) {
+   for (const a of argv) {
+     if (a === '--version' || a === '-v') {
+       process.stdout.write(`watchmyagents ${getVersion()}\n`);
+       process.exit(0);
+     }
+   }
+ }
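Because `maybePrintVersionAndExit` reads a file and calls `process.exit`, it is awkward to exercise directly. One way to reason about its two decisions (how the version string is derived, and when the flag counts as present) is to restate them as pure functions. The helpers below are a hypothetical sketch, not code from the package; `parseVersion` and `wantsVersion` are illustrative names.

```javascript
// Hypothetical pure helpers; names are illustrative, not from the package.

// Mirrors the fallback in getVersion(): any parse failure or missing
// field yields 'unknown'.
function parseVersion(pkgJsonText) {
  try {
    const pkg = JSON.parse(pkgJsonText);
    return pkg.version || 'unknown';
  } catch {
    return 'unknown';
  }
}

// Mirrors the argv scan: true if any argument is exactly --version or -v.
function wantsVersion(argv) {
  return argv.some(a => a === '--version' || a === '-v');
}

console.log(parseVersion('{"version":"1.1.2"}'));          // 1.1.2
console.log(parseVersion('not json'));                     // unknown
console.log(wantsVersion(['node', 'wma-fetch', '-v']));    // true
console.log(wantsVersion(['node', 'wma-fetch', '--verbose'])); // false
```

Note that the scan matches `-v` anywhere in `argv`, including in a position where another flag might expect a value; that is a property of the diffed code as well as of this sketch.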