watchmyagents 1.1.0 → 1.1.1
- package/README.md +18 -4
- package/SECURITY.md +2 -0
- package/package.json +1 -1
- package/scripts/agents.js +3 -0
- package/scripts/fetch-anthropic.js +26 -7
- package/scripts/inspect.js +3 -0
- package/scripts/service.js +3 -0
- package/scripts/shield.js +3 -0
- package/scripts/signals.js +3 -0
- package/scripts/upload-fortress.js +8 -1
- package/src/labels.js +39 -0
- package/src/shield/policy-stream.js +18 -0
- package/src/sources/anthropic-managed.js +36 -0
- package/src/version.js +52 -0
package/README.md
CHANGED
```diff
@@ -1,12 +1,26 @@
 # Watch My Agents
 
-**
+**Real-time security observability AND enforcement for AI agents.** A zero-dependency CLI + SDK that captures every action your AI agents take — tool calls, prompts, state transitions, errors, multi-agent comms — into local NDJSON logs **AND** enforces security policies live, with sub-second propagation from the Fortress control plane to the Shield runtime.
 
-Designed around
+Designed around four guarantees:
 
 1. **Local-first.** Raw payloads (prompts, outputs, tool arguments) stay 100% on your machine. Nothing leaves unless you explicitly opt in.
-2. **Trace everything, not just what costs tokens.** A `web_fetch` to a suspicious URL carries zero tokens but is exactly what a security audit needs to see.
-3. **
+2. **Trace everything, not just what costs tokens.** A `web_fetch` to a suspicious URL carries zero tokens but is exactly what a security audit needs to see. Even tool calls that were blocked, denied, or interrupted before producing a result are logged with `status: error` so the audit trail is complete.
+3. **Real-time enforcement, not post-hoc auditing.** A policy accepted in Fortress UI is active in Shield within ~1 second via SSE + Postgres realtime. A policy violation is blocked in ~3ms via Anthropic's `user.tool_confirmation` / `user.interrupt` events. Measured in production, not promised in roadmap.
+4. **Zero dependencies.** Only Node.js 18+ built-ins. No telemetry, no phone-home, no hidden network calls. Preserved through every release including the SSE realtime work (custom RFC-compliant SSE parser, no `@supabase/realtime-js` or `ws` dep).
+
+### Measured end-to-end loop latency (v1.1.0+)
+
+```
+Anthropic agent action ────────► Watch capture : ≤ 60s (configurable via --interval)
+Watch capture ────────► Fortress signal upload : ≤ 60s (same cycle)
+Fortress signal ────────► Guardian analysis : ≤ 30s (event-triggered, debounced)
+Guardian proposal ────────► Operator accepts in UI : (human)
+Policy accepted ────────► Shield receives via SSE : ≤ 1s (sub-second push, validated)
+Shield evaluates ────────► Decision (allow/deny) : ≤ 3ms (measured on Anthropic Managed)
+```
+
+Full audit-clean: 3 successful Codex audit passes (v1.0.1, v1.0.2, v1.0.3) closed 7 findings with zero regression. Containment invariant (raw payloads never leave the customer machine) is formalized in `docs/CONTAINMENT.md` and locked by 8 regression tests.
 
 ---
 
```
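The README credits the Fortress-to-Shield push to a custom SSE parser with no `ws` or `@supabase/realtime-js` dependency. Below is a minimal sketch of what such a parser does with a single event block, following the field rules of the WHATWG server-sent events spec. Several details are assumptions rather than the package's code: the function name `parseSseEvent`, the simplification to `\n` line endings, and the JSON payload in the test. The event name `policy_changed` is borrowed from a comment in `policy-stream.js`.

```javascript
// Parse one SSE event block (the text between two blank lines) into its
// fields, following the WHATWG "event stream interpretation" rules.
// Simplified sketch: assumes "\n" line endings and ignores "retry:".
function parseSseEvent(block) {
  let event = '';
  let id = null;
  const dataLines = [];
  for (const line of block.split('\n')) {
    if (line === '' || line.startsWith(':')) continue; // blank line or comment (keep-alive)
    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // drop a single leading space
    if (field === 'event') event = value;
    else if (field === 'data') dataLines.push(value);
    else if (field === 'id') id = value;
    // any other field name is ignored
  }
  if (dataLines.length === 0) return null; // no data: nothing to dispatch
  // Multiple data lines are joined with "\n"; an empty event name means "message".
  return { event: event || 'message', id, data: dataLines.join('\n') };
}
```

A caller would `JSON.parse(ev.data)` only after checking `ev.event`, so unknown event types can be skipped without parsing their payloads.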
package/SECURITY.md
CHANGED
```diff
@@ -57,6 +57,8 @@ WMA combines **two complementary layers**:
 - **Blind spots in agent behavior.** Watch captures tool calls, prompts, state transitions, and errors for after-the-fact analysis.
 - **Token-only observability tools.** WMA captures every action including zero-token ones (`tool_use`, `state_transition`, etc.) that are the most security-relevant.
 - **Inline policy violations** (Shield). When the agent has `permission_policy: always_ask` configured, Shield blocks tool calls before execution. When not, Shield interrupts the session on first violation (the offending tool already ran, but the agent loop stops).
+- **Stale enforcement after a policy update.** A new policy accepted in the Fortress dashboard is active in Shield within ~1 second via SSE + Postgres realtime (validated in production on v1.1.0). The 60s polling refresh is a fallback for environments where the SSE channel can't be established (firewall, proxy stripping `text/event-stream`).
+- **Lost audit trail for blocked / denied / interrupted tool calls.** Tool calls that started but never produced a result (Shield pre-block, operator denial, mid-execution kill, session termination) are logged as explicit `tool_use` entries with `status: error` and `error: "no_result_observed"` — they cannot disappear silently from the audit. (Fix shipped in v1.1.1 after the Codex P1 finding.)
 - **Vendor lock-in.** NDJSON is portable; you own the data.
 
 ### What WMA does NOT defend against
```
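Unresolved tool calls land in the local NDJSON logs as ordinary entries, so they can be pulled out with a few lines of code. The sketch below counts them per tool. It assumes the field names used in the v1.1.1 `fetchSessionEntries` flush (`action_type`, `status`, `error`, `tool_name`). The helper name `unresolvedToolCalls` is hypothetical, and locating the log files is left to the caller.

```javascript
// Count tool calls that never produced a result, grouped by tool name.
// Input: an iterable of NDJSON lines. Blank or malformed lines are skipped.
function unresolvedToolCalls(lines) {
  const byTool = new Map();
  for (const line of lines) {
    if (!line.trim()) continue;
    let entry;
    try { entry = JSON.parse(line); } catch { continue; }
    const isToolCall = entry.action_type === 'tool_use' || entry.action_type === 'mcp_tool_use';
    if (isToolCall && entry.status === 'error' && entry.error === 'no_result_observed') {
      byTool.set(entry.tool_name, (byTool.get(entry.tool_name) ?? 0) + 1);
    }
  }
  return byTool;
}
```

For one day's log file, `unresolvedToolCalls(readFileSync(file, 'utf8').split('\n'))` (with `readFileSync` from `node:fs`) is enough. A streaming reader such as the one in `wma-inspect` would be the better fit for files of hundreds of MB.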
package/package.json
CHANGED
```diff
@@ -1,6 +1,6 @@
 {
   "name": "watchmyagents",
-  "version": "1.1.
+  "version": "1.1.1",
   "description": "Security observability + real-time policy enforcement for AI agents. Local-first NDJSON capture with a continuous Watch daemon that auto-uploads anonymized signals, Shield CLI that blocks policy violations live (with policies pulled from Fortress cloud), anonymizer producing signals-only payloads, bidirectional sync with WatchMyAgents Fortress, and one-command install as an always-on launchd/systemd service — closing the recursive Watch→Guardian→Shield security loop.",
   "type": "module",
   "files": [
```
package/scripts/agents.js
CHANGED
```diff
@@ -23,6 +23,7 @@ import { listAgents } from '../src/sources/anthropic-managed.js';
 import { classifyAgentType } from '../src/typology.js';
 import { aggregate, buildFeatures, NON_DERIVABLE } from '../src/typology-features.js';
 import { isValidAgentId, assertSafePathSegment } from '../src/validate.js';
+import { maybePrintVersionAndExit } from '../src/version.js';
 
 function parseArgs(argv) {
   const out = { _: [] };
@@ -43,6 +44,8 @@ function info(msg) { process.stdout.write(`[wma-agents] ${msg}\n`); }
 // extraction). The rest of this file is just CLI presentation.
 
 async function main() {
+  // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+  maybePrintVersionAndExit(process.argv);
   const args = parseArgs(process.argv.slice(2));
   if (args._[0] && args._[0] !== 'list') die(`unknown command "${args._[0]}" (only "list" supported)`);
   const apiKey = args['api-key'] || process.env.ANTHROPIC_API_KEY;
```
package/scripts/fetch-anthropic.js
CHANGED
```diff
@@ -29,6 +29,7 @@ import { Logger } from '../src/logger.js';
 import { TokenTracker } from '../src/tokens.js';
 import { SignalsAggregator } from '../src/anonymizer.js';
 import { resolveFortressBase, fortressEndpoint } from '../src/fortress/url.js';
+import { cleanLabel } from '../src/labels.js';
 import { isValidAgentId, isValidSessionId, assertSafePathSegment } from '../src/validate.js';
 import { classifyAgentType } from '../src/typology.js';
 import { aggregate, buildFeatures } from '../src/typology-features.js';
@@ -36,6 +37,7 @@ import {
   getAgent, listAgents, listSessions, fetchSessionEntries, fetchRawEvents,
   AnthropicManagedSource, effectiveEnforcementMode,
 } from '../src/sources/anthropic-managed.js';
+import { maybePrintVersionAndExit } from '../src/version.js';
 
 function parseArgs(argv) {
   const out = {};
@@ -73,9 +75,9 @@ function parseSince(s) {
 function die(msg, code = 1) { process.stderr.write(`${msg}\n`); process.exit(code); }
 function info(msg) { process.stdout.write(`[wma-fetch] ${msg}\n`); }
 function warn(msg) { process.stderr.write(`[wma-fetch] ⚠️ ${msg}\n`); }
-//
-//
-
+// v1.1.1 F-11: cleanLabel moved to src/labels.js so wma-upload-fortress
+// (and any future consumer) shares the exact same sanitization. Defense
+// in depth vs log/payload injection from customer-set agent names.
 
 function resolveModel(agent) {
   const raw = agent.model || agent.config?.model || null;
@@ -261,7 +263,7 @@ const sleep = (ms, signal) => new Promise((res) => {
 });
 
 // ── ONE-SHOT ──────────────────────────────────────────────────────────────
-async function fetchOneShot({ apiKey, agentId, model, logDir, since, sessionId, dumpRaw }) {
+async function fetchOneShot({ apiKey, agentId, model, logDir, since, sessionId, dumpRaw, forceDuplicates = false }) {
   let sessions;
   if (sessionId) {
     sessions = [{ id: sessionId, created_at: new Date().toISOString() }];
@@ -272,7 +274,18 @@ async function fetchOneShot({ apiKey, agentId, model, logDir, since, sessionId,
   if (sessions.length === 0) { info('no sessions to fetch'); return; }
   info(`${sessions.length} session(s) to fetch`);
 
+  // v1.1.1 F-10 (P2 Codex audit): preload the entry ids already on disk for
+  // this agent so re-running the one-shot doesn't duplicate events. The
+  // watch daemon does this already; the one-shot was the missing piece.
+  // Operators who explicitly want the legacy duplicate-on-rerun behavior
+  // can opt back in with --force-duplicates.
+  const seenIds = forceDuplicates ? new Set() : await preloadSeenIds(logDir, agentId);
+  if (!forceDuplicates && seenIds.size > 0) {
+    info(`preloaded ${seenIds.size} known event id(s) for dedup`);
+  }
+
   let totalEntries = 0;
+  let totalSkipped = 0;
   for (const s of sessions) {
     const sid = s.id;
     process.stdout.write(`\n[wma-fetch] session ${sid}\n`);
@@ -288,23 +301,27 @@ async function fetchOneShot({ apiKey, agentId, model, logDir, since, sessionId,
     const logger = new Logger({ logDir, agentId, sessionId: sid, silent: true });
     const tracker = new TokenTracker();
     let count = 0;
+    let skipped = 0;
     for await (const entry of fetchSessionEntries({ apiKey, agentId, sessionId: sid, model })) {
+      if (entry.id && seenIds.has(entry.id)) { skipped++; continue; }
       const written = await logger.write(entry);
+      if (entry.id) seenIds.add(entry.id);
       tracker.record(written);
       count++;
     }
+    totalSkipped += skipped;
     const stats = tracker.stats().total;
     await logger.write({
       action_type: 'session_end', provider: 'anthropic-managed', status: 'ok', model,
       session_tokens: { input: stats.input, output: stats.output, cache_read: stats.cache_read, cache_creation: stats.cache_creation, total: stats.sum },
       session_cost_usd: stats.cost_usd || null,
     });
-    process.stdout.write(` entries : ${count} (+1 session_end)\n`);
+    process.stdout.write(` entries : ${count} (+1 session_end)${skipped ? ` · ${skipped} skipped (dedup)` : ''}\n`);
     process.stdout.write(` tokens : in=${stats.input} out=${stats.output} cache_r=${stats.cache_read} cache_w=${stats.cache_creation}\n`);
     process.stdout.write(` written to : ${logger._pathForToday()}\n`);
     totalEntries += count + 1;
   }
-  process.stdout.write(`\n[wma-fetch] done — ${totalEntries} total entries across ${sessions.length} session(s)\n`);
+  process.stdout.write(`\n[wma-fetch] done — ${totalEntries} total entries across ${sessions.length} session(s)${totalSkipped ? `, ${totalSkipped} skipped (dedup)` : ''}\n`);
   process.stdout.write(`[wma-fetch] inspect with: npx wma-inspect ${logDir}\n`);
 }
 
@@ -441,6 +458,8 @@ async function runWatch({ apiKey, resolveAgents, fleet, logDir, intervalMs, wind
 }
 
 async function main() {
+  // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+  maybePrintVersionAndExit(process.argv);
   const args = parseArgs(process.argv.slice(2));
   const apiKey = args['api-key'] || process.env.ANTHROPIC_API_KEY;
   const agentId = args['agent-id'];
@@ -545,7 +564,7 @@ async function main() {
     info(`resolving agent ${agentId}…`);
     const agent = await getAgent(apiKey, agentId).catch((e) => die(`failed to GET agent: ${e.message}`));
     const since = args.since ? parseSince(args.since) : null;
-    await fetchOneShot({ apiKey, agentId, model: resolveModel(agent), logDir, since, sessionId: args['session-id'], dumpRaw: !!args['dump-raw'] });
+    await fetchOneShot({ apiKey, agentId, model: resolveModel(agent), logDir, since, sessionId: args['session-id'], dumpRaw: !!args['dump-raw'], forceDuplicates: !!args['force-duplicates'] });
   }
 }
 
```
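The body of `preloadSeenIds` is not part of this diff. The sketch below shows the same dedup idea in isolation: collect the `id` of every entry already on disk, then skip incoming entries whose id is known, while entries without an id always pass through. `collectSeenIds` and `filterNew` are hypothetical helpers, not the package's functions.

```javascript
// Gather the ids of entries already written (NDJSON lines).
function collectSeenIds(lines) {
  const seen = new Set();
  for (const line of lines) {
    try {
      const entry = JSON.parse(line);
      if (entry && entry.id) seen.add(entry.id);
    } catch { /* ignore blank or malformed lines */ }
  }
  return seen;
}

// Mirror of the one-shot loop: skip known ids, remember new ones.
function filterNew(entries, seen) {
  const fresh = [];
  let skipped = 0;
  for (const entry of entries) {
    if (entry.id && seen.has(entry.id)) { skipped++; continue; }
    fresh.push(entry);
    if (entry.id) seen.add(entry.id); // also catches repeats within one run
  }
  return { fresh, skipped };
}
```

In the diff, `seenIds.add` runs even when `--force-duplicates` is set. That flag therefore only skips the preload from disk; an id repeated within a single run is still dropped.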
package/scripts/inspect.js
CHANGED
```diff
@@ -13,6 +13,7 @@ import { createReadStream } from 'node:fs';
 import { createInterface } from 'node:readline';
 import { join, resolve } from 'node:path';
 import { TokenTracker } from '../src/tokens.js';
+import { maybePrintVersionAndExit } from '../src/version.js';
 
 // Streaming line-by-line reader — bounds memory usage on large NDJSON files
 // (a long-running agent can produce hundreds of MB per day).
@@ -66,6 +67,8 @@ function extractDestination(input) {
 }
 
 async function main() {
+  // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+  maybePrintVersionAndExit(process.argv);
   const files = await collectFiles(target);
   if (files.length === 0) {
     process.stderr.write(`No .ndjson files found under ${target}\n`); process.exit(1);
```
package/scripts/service.js
CHANGED
```diff
@@ -25,6 +25,7 @@ import { join } from 'node:path';
 import { fileURLToPath } from 'node:url';
 import { execFileSync } from 'node:child_process';
 import { isValidAgentId } from '../src/validate.js';
+import { maybePrintVersionAndExit } from '../src/version.js';
 
 const HOME = os.homedir();
 const PLATFORM = process.platform; // 'darwin' | 'linux' | …
@@ -338,6 +339,8 @@ The service starts at login and restarts on crash. Raw logs stay local.
 }
 
 function main() {
+  // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+  maybePrintVersionAndExit(process.argv);
   const args = parseArgs(process.argv.slice(2));
   const cmd = args._[0];
   switch (cmd) {
```
package/scripts/shield.js
CHANGED
```diff
@@ -32,6 +32,7 @@ import {
   confirmAllow, confirmDeny, interruptSession,
   getAgentConfig, detectAlwaysAsk,
 } from '../src/shield/enforce.js';
+import { maybePrintVersionAndExit } from '../src/version.js';
 import { DecisionLogger } from '../src/shield/decisions.js';
 import { listSessions, listAgents } from '../src/sources/anthropic-managed.js';
 import { FortressPolicySource, postDecision } from '../src/shield/sources/fortress.js';
@@ -405,6 +406,8 @@ async function runAgentWide(ctx) {
 // Main
 // ────────────────────────────────────────────────────────────────────────
 async function main() {
+  // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+  maybePrintVersionAndExit(process.argv);
   const args = parseArgs(process.argv.slice(2));
   const apiKey = args['api-key'] || process.env.ANTHROPIC_API_KEY;
   const agentId = args['agent-id'];
```
package/scripts/signals.js
CHANGED
```diff
@@ -25,6 +25,7 @@ import { resolve, join } from 'node:path';
 import { SignalsAggregator } from '../src/anonymizer.js';
 import { createReadStream } from 'node:fs';
 import { createInterface } from 'node:readline';
+import { maybePrintVersionAndExit } from '../src/version.js';
 
 function parseArgs(argv) {
   const out = {};
@@ -59,6 +60,8 @@ async function collectFiles(p) {
 }
 
 async function main() {
+  // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+  maybePrintVersionAndExit(process.argv);
   const args = parseArgs(process.argv.slice(2));
 
   if (!args._target) {
```
package/scripts/upload-fortress.js
CHANGED
```diff
@@ -31,6 +31,8 @@ import { createInterface } from 'node:readline';
 import { SignalsAggregator } from '../src/anonymizer.js';
 import { resolveFortressBase, fortressEndpoint } from '../src/fortress/url.js';
 import { AnthropicManagedSource } from '../src/sources/anthropic-managed.js';
+import { cleanLabel } from '../src/labels.js';
+import { maybePrintVersionAndExit } from '../src/version.js';
 
 function parseArgs(argv) {
   const out = {};
@@ -99,13 +101,18 @@ function postJson(url, headers, body) {
 }
 
 async function main() {
+  // v1.1.1 F-13: --version / -v short-circuit before any other parsing.
+  maybePrintVersionAndExit(process.argv);
   const args = parseArgs(process.argv.slice(2));
 
   const agentId = args['agent-id'];
   const logDir = resolve(args['log-dir'] || './watchmyagents-logs');
   const apiKey = args['api-key'] || process.env.WMA_API_KEY;
   const salt = args.salt || process.env.WMA_SIGNALS_SALT;
-
+  // v1.1.1 F-11: sanitize the customer-supplied display name with the
+  // same cleanLabel used by the Watch daemon (defense-in-depth vs log
+  // injection / Fortress payload injection via control bytes).
+  const displayName = cleanLabel(args['display-name'] || agentId) || agentId;
   const dryRun = !!args['dry-run'];
 
   // Resolve Fortress base URL. Accepts:
```
package/src/labels.js
ADDED
```diff
@@ -0,0 +1,39 @@
+// ────────────────────────────────────────────────────────────────────────
+// labels — shared sanitization for human-facing identifiers
+// ────────────────────────────────────────────────────────────────────────
+//
+// Customer-set strings (agent display names, workspace labels, etc.) end
+// up in:
+// - log lines (stdout/stderr of the Watch + Shield daemons)
+// - the Fortress ingest-signals payload (`display_name` field)
+// - eventually rendered in the Fortress dashboard
+//
+// We don't trust them. A name carrying:
+// - control bytes (0x00-0x1F, 0x7F) can poison terminal output (ANSI
+// escape sequences) or break NDJSON parsing
+// - excessive length can bloat payloads and break UI columns
+//
+// `cleanLabel()` is the single, shared sanitizer. Both wma-fetch (the
+// daemon) and wma-upload-fortress (the one-shot uploader) MUST run
+// every customer-supplied label through it before logging or shipping.
+// Extracted to its own module in v1.1.1 (F-11 Codex audit fix) so a
+// future change benefits both consumers automatically.
+
+const MAX_LABEL_CHARS = 60;
+
+/**
+ * Strip control bytes (< 0x20 and 0x7F DEL) and truncate to MAX_LABEL_CHARS
+ * characters. Returns the empty string for null/undefined input.
+ *
+ * Uses [...str] to iterate by code point so surrogate pairs aren't split.
+ */
+export function cleanLabel(s) {
+  return [...String(s ?? '')]
+    .filter((c) => {
+      const code = c.charCodeAt(0);
+      return code >= 32 && code !== 127;
+    })
+    .join('')
+    .slice(0, MAX_LABEL_CHARS)
+    .trim();
+}
```
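In the shipped `cleanLabel`, the filter walks code points, but `.slice(0, MAX_LABEL_CHARS)` runs after `.join('')` and so counts UTF-16 code units. When the 60th code point is outside the Basic Multilingual Plane (an emoji, say), the slice keeps only its high surrogate, which contradicts the JSDoc note. The sketch below shows one possible reordering that slices the code-point array before joining. The name `cleanLabelByCodePoint` is hypothetical; this is not the package's code.

```javascript
const MAX_LABEL_CHARS = 60;

// Same filter as cleanLabel, but the length limit is applied to the array of
// code points, so a surrogate pair is either kept whole or dropped whole.
function cleanLabelByCodePoint(s) {
  return [...String(s ?? '')]
    .filter((c) => {
      const code = c.codePointAt(0);
      return code >= 32 && code !== 127; // drop C0 controls and DEL
    })
    .slice(0, MAX_LABEL_CHARS) // counts code points, not UTF-16 units
    .join('')
    .trim();
}
```

With this order, a 59-character name followed by an emoji survives intact. The original order, `[...label].join('').slice(0, 60)`, ends in a lone `\uD83D`.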
package/src/shield/policy-stream.js
CHANGED
```diff
@@ -37,6 +37,13 @@ const RECONNECT_MIN_MS = 1_000;
 const RECONNECT_MAX_MS = 60_000;
 const FALLBACK_RETRY_INTERVAL_MS = 5 * 60_000;
 const PERMANENT_FAILURE_LOG_INTERVAL_MS = 5 * 60_000;
+// v1.1.1 F-9 (P2 Codex audit): hard cap on a single SSE event's buffer.
+// A buggy or compromised Fortress endpoint could stream bytes forever
+// without emitting the "\n\n" event separator, growing Shield's memory.
+// 1 MB is far above any legitimate `policy_changed` payload (the data
+// field carries {rule_id, action, ts, kind} = maybe 200 bytes) so we
+// abort the connection and reconnect on overflow.
+const MAX_SSE_EVENT_BYTES = 1 * 1024 * 1024;
 
 export class PolicyStream extends EventEmitter {
   constructor({ url, apiKey, anthropicAgentId, onError, onInfo }) {
@@ -147,6 +154,17 @@ export class PolicyStream extends EventEmitter {
       let buffer = '';
       res.on('data', (chunk) => {
         buffer += chunk;
+        // v1.1.1 F-9: cap on a single SSE event buffer. A buggy/compromised
+        // endpoint that never emits "\n\n" would otherwise OOM Shield.
+        // Abort + reconnect on overflow; the buffer is dropped so we
+        // restart fresh on the new connection.
+        if (buffer.length > MAX_SSE_EVENT_BYTES) {
+          this.onError(new Error(`policy-stream: SSE event exceeded ${MAX_SSE_EVENT_BYTES} bytes — aborting connection and reconnecting`));
+          buffer = '';
+          try { res.destroy(); } catch { /* already destroyed */ }
+          if (!this._closed) this._scheduleReconnect();
+          return;
+        }
         // SSE events are separated by a blank line ("\n\n").
         let eolIdx;
         while ((eolIdx = buffer.indexOf('\n\n')) !== -1) {
```
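In the v1.1.1 handler, the cap is checked on the whole accumulated `buffer` right after each chunk is appended, before complete events are drained. The check also uses `buffer.length`, which counts UTF-16 code units rather than bytes. It is still a hard bound, just a looser one than the constant's name suggests. The sketch below shows an alternative framer that drains complete events first and caps only the undelimited remainder, measured in UTF-8 bytes. Like the shipped code, it assumes `\n\n` separators. `createSseSplitter` and its callbacks are hypothetical names.

```javascript
// Split an SSE byte stream into event blocks, capping the size of any single
// incomplete event. After an overflow, further input is ignored; the caller
// is expected to tear down the connection and reconnect.
function createSseSplitter({ maxEventBytes, onEvent, onOverflow }) {
  let buffer = '';
  let overflowed = false;
  return function push(chunk) {
    if (overflowed) return;
    buffer += chunk;
    let idx;
    while ((idx = buffer.indexOf('\n\n')) !== -1) {
      onEvent(buffer.slice(0, idx)); // one complete event block
      buffer = buffer.slice(idx + 2);
    }
    // Only the trailing, not-yet-delimited text counts against the cap.
    if (Buffer.byteLength(buffer, 'utf8') > maxEventBytes) {
      buffer = '';
      overflowed = true;
      onOverflow();
    }
  };
}
```

Because the remainder is measured after draining, a chunk that carries many small, complete events never trips the cap. Only a single event that keeps growing without a separator does.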
package/src/sources/anthropic-managed.js
CHANGED
```diff
@@ -326,6 +326,11 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
         isMcp: type === 'agent.mcp_tool_use',
         input: ev.input ?? null,
         mcpServer: ev.server_name ?? ev.mcp_server_name ?? null,
+        // v1.1.1 F-8: capture sub-agent context at storage time so the
+        // end-of-session flush yields entries with the right attribution.
+        startTimestamp: ts,
+        session_thread_id,
+        agent_name,
       });
       continue;
     }
@@ -483,6 +488,37 @@ export async function* fetchSessionEntries({ apiKey, agentId, sessionId, model }
       continue;
     }
   }
+
+  // v1.1.1 F-8 (P1 Codex audit): flush remaining pendingToolUse entries
+  // as explicit "no_result_observed" tool_use events. These are tool
+  // calls that started (we saw agent.tool_use) but never produced a
+  // result (no agent.tool_result paired): most commonly because Shield
+  // pre-blocked them, the operator denied via tool_confirmation, the
+  // tool died mid-execution, or the session terminated before the
+  // result event arrived. For a security audit product, these incomplete
+  // calls are often the MOST useful signals — a blocked exfil attempt
+  // shows up here, not in successful tool_results. Yielding them
+  // explicitly with status='error' keeps the local NDJSON, anonymizer
+  // signals (counts, IoC hashes, tool_counts), and Fortress decisions
+  // honest about what actually happened.
+  for (const [toolUseId, pending] of pendingToolUse) {
+    yield {
+      ...base,
+      session_thread_id: pending.session_thread_id,
+      agent_name: pending.agent_name,
+      id: toolUseId,
+      action_type: pending.isMcp ? 'mcp_tool_use' : 'tool_use',
+      tool_name: pending.name,
+      model: model || null,
+      timestamp: pending.startTimestamp,
+      duration_ms: null,
+      status: 'error',
+      error: 'no_result_observed',
+      input: pending.input,
+      output: { mcp_server: pending.mcpServer ?? undefined },
+    };
+  }
+  pendingToolUse.clear();
 }
 
 // ────────────────────────────────────────────────────────────────────────
```
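The F-8 fix follows a general pattern: remember each tool call when it starts, remove it once its result is paired, and report whatever is still pending when the stream ends. The sketch below isolates that pattern. The event shapes (`type`, `id`, `tool_use_id`, `name`, `ts`) and the generator name `pairToolCalls` are simplified assumptions, not the exact Anthropic Managed Agents schema or the package's code. The shipped version additionally stores `session_thread_id` and `agent_name` at `tool_use` time so that flushed entries keep their sub-agent attribution; the sketch omits those fields.

```javascript
// Pair tool_use events with their results; unpaired calls are flushed at the
// end as explicit error entries instead of being silently dropped.
function* pairToolCalls(events) {
  const pending = new Map(); // tool_use id -> what we knew when it started
  for (const ev of events) {
    if (ev.type === 'agent.tool_use' || ev.type === 'agent.mcp_tool_use') {
      pending.set(ev.id, { name: ev.name, startTimestamp: ev.ts, isMcp: ev.type === 'agent.mcp_tool_use' });
    } else if (ev.type === 'agent.tool_result' && pending.has(ev.tool_use_id)) {
      const start = pending.get(ev.tool_use_id);
      pending.delete(ev.tool_use_id);
      yield {
        id: ev.tool_use_id,
        action_type: start.isMcp ? 'mcp_tool_use' : 'tool_use',
        tool_name: start.name,
        timestamp: start.startTimestamp,
        status: 'ok',
      };
    }
  }
  // End of stream: these calls started but no result was ever observed.
  for (const [id, start] of pending) {
    yield {
      id,
      action_type: start.isMcp ? 'mcp_tool_use' : 'tool_use',
      tool_name: start.name,
      timestamp: start.startTimestamp,
      status: 'error',
      error: 'no_result_observed',
    };
  }
  pending.clear();
}
```

`Map` iterates in insertion order, so flushed entries come out in the order their calls started.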
package/src/version.js
ADDED
```diff
@@ -0,0 +1,52 @@
+// ────────────────────────────────────────────────────────────────────────
+// version — shared --version flag handler for the wma-* CLI binaries
+// ────────────────────────────────────────────────────────────────────────
+//
+// v1.1.1 F-13: every CLI binary (wma-fetch, wma-shield, wma-signals,
+// wma-upload-fortress, wma-inspect, wma-agents, wma-service) gets a
+// --version / -v flag that prints the installed version and exits.
+// Operators previously had to grep package.json under npm root to know
+// what was deployed; this is now a one-liner.
+//
+// We resolve the version from the package.json next to the SDK source
+// (../package.json relative to this file) so it stays in sync with the
+// release that's actually executing.
+
+import { readFileSync } from 'node:fs';
+import { dirname, join } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const HERE = dirname(fileURLToPath(import.meta.url));
+const PKG_PATH = join(HERE, '..', 'package.json');
+
+let cachedVersion = null;
+
+/** Returns the installed watchmyagents version, parsed from package.json. */
+export function getVersion() {
+  if (cachedVersion) return cachedVersion;
+  try {
+    const pkg = JSON.parse(readFileSync(PKG_PATH, 'utf8'));
+    cachedVersion = pkg.version || 'unknown';
+  } catch {
+    cachedVersion = 'unknown';
+  }
+  return cachedVersion;
+}
+
+/**
+ * If argv contains --version or -v, print the version and exit(0).
+ * Call this BEFORE any other parsing so it short-circuits on bad input
+ * (e.g., the user types `wma-fetch --version` with no env vars set).
+ *
+ * Usage at the top of every wma-* script:
+ *   import { maybePrintVersionAndExit } from '../src/version.js';
+ *   maybePrintVersionAndExit(process.argv);
+ */
+export function maybePrintVersionAndExit(argv) {
+  for (const a of argv) {
+    if (a === '--version' || a === '-v') {
+      process.stdout.write(`watchmyagents ${getVersion()}\n`);
+      process.exit(0);
+    }
+  }
+}
```
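`maybePrintVersionAndExit` is handed the full `process.argv`, so it also scans the Node binary and script paths and keeps going past a `--` end-of-options marker. Because it calls `process.exit` itself, it is also awkward to unit-test. The sketch below shows one possible refinement that splits the decision from the side effect and looks only at user arguments. `wantsVersion` and `maybePrintVersion` are hypothetical names, not part of the package.

```javascript
// Pure predicate over user arguments (e.g. process.argv.slice(2)).
function wantsVersion(args) {
  for (const a of args) {
    if (a === '--') return false; // conventional end of options: the rest is positional
    if (a === '--version' || a === '-v') return true;
  }
  return false;
}

// Print only when asked; returning a boolean leaves process.exit to the caller.
function maybePrintVersion(args, version, write = (s) => process.stdout.write(s)) {
  if (!wantsVersion(args)) return false;
  write(`watchmyagents ${version}\n`);
  return true;
}
```

A CLI entry point would then call `if (maybePrintVersion(process.argv.slice(2), getVersion())) process.exit(0);` before its own argument parsing.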