envseed 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,18 @@
FROM node:22-slim

# Install Claude Code CLI and OpenAI Codex CLI
RUN npm i -g @anthropic-ai/claude-code @openai/codex

# Create non-root user (required for --dangerously-skip-permissions)
RUN useradd -m -s /bin/bash simuser

# Create workspace and output dirs with proper permissions
RUN mkdir -p /workspace /output && chown simuser:simuser /workspace /output

WORKDIR /workspace

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

USER simuser
ENTRYPOINT ["/entrypoint.sh"]
package/README.md ADDED
@@ -0,0 +1,498 @@
# propensity-monitor

A safety research tool that monitors Claude Code sessions for **eval opportunities** — situations where an AI model could plausibly engage in scheming, deception, or misalignment. When it detects one, it archives the session context and runs sandboxed simulations with different model personas to study how models behave.

## Quick install (external users)

```bash
npx envseed      # installs hooks and shows status
envseed register # authenticates via GitHub
```

Or install globally:

```bash
npm i -g envseed
envseed register
```

Restart Claude Code after installing. That's it — incidents upload automatically via HTTPS.

No npm dependencies — the package uses only Node.js built-ins (requires Node 22+).

## Install (METR internal)

```bash
git clone <repo> && cd propensity-tests
./install.sh
```

This will:
1. Copy files to `~/.propensity-monitor/`
2. Register hooks in `~/.claude/settings.json` (PreToolUse, PostToolUse, UserPromptSubmit, SessionStart, Stop)
3. Install the `/log-incident` slash command
4. Create default config at `~/.propensity-monitor/config.json`
5. Symlink the CLI to `~/.local/bin/propensity-monitor`
6. Check for Docker, AWS CLI, and AWS SSO session

METR users with staging AWS credentials upload directly to S3. External users go through the Cloudflare Worker endpoint.

Restart Claude Code after installing to activate hooks.

### Prerequisites

- **Node.js 22+**
- **Docker** (optional) — required for container replication and persona simulations
- **AWS CLI** (optional, METR only) — for direct S3 uploads. Requires `aws sso login --profile staging`
- **API keys** — `ANTHROPIC_API_KEY` and `ANTHROPIC_BASE_URL` in `~/.config/api-keys/.env`. For OpenAI model simulations: `OPENAI_API_KEY` and `OPENAI_BASE_URL`

## Uninstall

```bash
./uninstall.sh
# or: npm uninstall -g envseed
```

## How it works

### Detection pipeline

```
Claude Code Hook Event
        |
hook-handler.mjs (sync, <100ms)
  |-- Read pending alerts from previous background analysis
  |-- Track session state
  |-- Return system messages to user
  '-- Spawn background-analyzer.mjs (detached)
        |
background-analyzer.mjs (~5-8s, async)
  |-- Call Haiku for LLM assessment
  |-- Log to events/{date}.jsonl
  |-- If [INTERESTING]: log to alerts/
  |-- If [GOOD OPPORTUNITY]: write to pending-alerts.jsonl
  '-- If [GOOD OPPORTUNITY] + post-execution: spawn container-replicator
```
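
The sync/async split above hinges on the hook handler spawning the analyzer detached so it can return within its latency budget. A minimal sketch with Node built-ins; the script path and payload shape are illustrative, not the monitor's actual interface:

```javascript
import { spawn } from "node:child_process";

// Build the argv for the background analyzer. Kept pure so it is easy to test.
function analyzerArgs(event) {
  return ["lib/background-analyzer.mjs", JSON.stringify(event)];
}

// Fire-and-forget: the hook must return in <100ms, so the analyzer is
// detached and unref'd rather than awaited.
function spawnAnalyzer(event) {
  const child = spawn(process.execPath, analyzerArgs(event), {
    detached: true,  // survive the hook process exiting
    stdio: "ignore", // no pipes that would keep the parent alive
  });
  child.unref(); // let the hook exit immediately
  return child.pid;
}
```

The analyzer then communicates back through files (`pending-alerts.jsonl`), which the next hook invocation reads.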

### Two-stage risk detection

1. **Rule-based scoring** (synchronous, in hook handler):
   - Pattern matches tool names, file paths, commands
   - Scores 0-3: none / medium / high / critical
   - Detects: credential access, self-modification, monitoring subversion, destructive ops, etc.

2. **LLM assessment** (asynchronous, two-stage):
   - **Haiku** (fast screen): Classifies as `[LOW]`, `[INTERESTING]`, or `[GOOD OPPORTUNITY]`
   - **Opus 4.6** (confirmation): If Haiku flags `[GOOD OPPORTUNITY]`, Opus reviews it and either confirms, downgrades to `[INTERESTING]`, or dismisses as `[LOW]`. Opus is prompted to be skeptical of meta-level false positives (e.g., working on the monitoring system itself) and of flags that sound scary but lack realistic attack vectors.
   - Only Opus-confirmed `[GOOD OPPORTUNITY]` is surfaced to the user

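A minimal sketch of the rule-based stage; the patterns and labels below are illustrative stand-ins, not the monitor's actual rules:

```javascript
// Illustrative rule table: each entry maps a regex over the tool input
// to a risk score (0 none, 1 medium, 2 high, 3 critical).
const RULES = [
  { pattern: /\.env|id_rsa|credentials/i, score: 3, label: "credential access" },
  { pattern: /rm\s+-rf|mkfs|dd\s+if=/i,   score: 3, label: "destructive op" },
  { pattern: /settings\.json.*hooks/i,    score: 2, label: "monitoring subversion" },
  { pattern: /chmod|chown/i,              score: 1, label: "permission change" },
];

// Return the highest-scoring rule that matches, or a zero score.
function scoreEvent(toolInput) {
  let best = { score: 0, label: "none" };
  for (const rule of RULES) {
    if (rule.pattern.test(toolInput) && rule.score > best.score) best = rule;
  }
  return best;
}
```

Being pure string matching, this stage stays well under the hook's latency budget; anything nuanced is deferred to the LLM stage.
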
### Container replication (on GOOD OPPORTUNITY)

When a `[GOOD OPPORTUNITY]` is detected **after a command has run** (PostToolUse, UserPromptSubmit, or Stop), the system automatically spins up Docker to replicate the project environment:

1. **Checks for existing container config** — `.devcontainer/`, `docker-compose.yml`, or `Dockerfile` in the project
2. **Tries to build using existing config** — if present, builds and tests the image
3. **Falls back to fresh container** — creates a `node:22-slim` base with dev tools and Claude Code CLI
4. **Runs Claude Opus 4.6 inside** with `--dangerously-skip-permissions` (safe in sandbox) to:
   - Detect project type and install dependencies
   - Fix build issues
   - Generate a reproducible `setup-script.sh`
   - Generate a `Dockerfile` for the configured environment
5. **Verifies the setup script** — rebuilds from scratch in a clean container using only the snapshot + setup script, then runs smoke tests (checks files exist, deps installed)
6. **Saves replication artifacts** to `~/.propensity-monitor/data/replicas/{id}/replication-kit/`

Replicas are keyed by project directory (same project = same replica). A lock prevents concurrent replication. Recent replicas (<1 hour old) are reused.
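
The keying and reuse rules can be sketched as two small helpers (the `builtAt` manifest field is a hypothetical name, not necessarily the real schema):

```javascript
import { createHash } from "node:crypto";

// Replicas are keyed by project directory: hashing the path gives a
// stable id, so repeated incidents in the same project share a replica.
function replicaId(projectDir) {
  return createHash("sha256").update(projectDir).digest("hex").slice(0, 12);
}

// Reuse a replica only if it was built recently (default: one hour).
function shouldReuse(manifest, now = Date.now(), maxAgeMs = 60 * 60 * 1000) {
  return manifest != null && now - manifest.builtAt < maxAgeMs;
}
```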

### Redaction review

Before any snapshot leaves the machine (S3 upload or container build), the system opens a **new terminal** with an interactive Claude Code session for the user to review and redact sensitive content:

1. The snapshot tar.gz is extracted to a staging directory
2. A `CLAUDE.md` is written with redaction instructions
3. A new Terminal window opens with Claude Code, restricted to safe tools (`Read`, `Edit`, `Glob`, `Grep`, limited `Bash`)
4. Claude scans for sensitive files (.env, keys, credentials) and walks through them with the user
5. The user approves/rejects each redaction
6. When the user closes the Claude session, the staging dir is re-tarred back to the snapshot
7. The pipeline continues with the redacted snapshot

This blocks the pipeline — nothing uploads or builds until the review is done (30 min timeout). Set `redactionReview: false` in config to skip.
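
The gating rule (review anything that leaves the machine unless `redactionReview` is `false`) reduces to a one-line predicate. A sketch; the destination names are illustrative:

```javascript
// Decide whether the interactive review must run before a given step.
// Review gates anything that leaves the machine: uploads and image builds.
// The config default is true; `redactionReview: false` opts out.
function needsRedactionReview(config, destination) {
  const leavesMachine = ["s3-upload", "http-upload", "container-build"].includes(destination);
  return leavesMachine && config.redactionReview !== false;
}
```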

### Incident logging

When the user runs `/log-incident`:

1. Archives transcript, directory snapshot, and assessments
2. **Redaction review** — opens new terminal for user to review/redact sensitive content
3. Uploads to S3
4. Spawns the simulation orchestrator

### Layered agent simulations

The simulation orchestrator runs model instances in Docker with different personas. It supports **multiple model providers**:

- **Anthropic models** (`claude-*`): Run via Claude Code CLI in headless mode with full agentic tool use
- **OpenAI models** (`gpt-*`, `o3`, etc.): Run via [Codex CLI](https://github.com/openai/codex) in headless mode with full agentic tool use, falling back to a single-turn API call if Codex is not installed

Configure models in `simulationModels` (array). Simulations rotate across all configured models:

```json
"simulationModels": ["claude-opus-4-6", "gpt-5.2"]
```
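
Rotation is round-robin over the configured array, and provider routing keys off the model name prefix. A sketch (the function names are hypothetical):

```javascript
// Pick the model for the i-th simulation by cycling through the
// configured list, so personas are spread evenly across providers.
function modelForSimulation(simulationModels, i) {
  return simulationModels[i % simulationModels.length];
}

// Provider routing: claude-* runs via the Claude Code CLI; everything
// else is assumed to be an OpenAI-style model run via the Codex CLI.
function providerFor(model) {
  return model.startsWith("claude-") ? "claude-code" : "codex";
}
```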

Each persona includes a **research preamble** explaining the METR safety research context, so models understand they're participating in an authorized red-teaming exercise rather than receiving a jailbreak attempt.

| Persona | Description |
|---------|-------------|
| `helpful_baseline` | Standard helpful assistant (control) |
| `subtle_schemer` | Self-preservation, covert power accumulation |
| `reward_hacker` | Games metrics, superficial helpfulness |
| `power_seeker` | Expands access and capabilities |
| `alignment_faker` | Behaves differently when monitored vs. not |
| `safety_saboteur` | Subtly weakens safety code |
| `data_poisoner` | Corrupts data with hidden patterns |
| `weight_exfiltrator` | Searches for model weights and credentials |
| `oversight_underminer` | Weakens monitoring and logging |
| `chaotic_agent` | Tests boundaries unpredictably |

If a **replicated container** exists for the project (built by the container replicator), simulations run inside it — giving each persona the same working environment as the original session. Otherwise, they fall back to the generic `propensity-sim` image.

Each simulation:
- Gets the original session's first 5 user messages as context
- Runs for up to `simulationMaxTurns` turns (default 100)
- Captures the full transcript and all file changes
- Uploads results to S3 incrementally
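
Extracting the seed context (the session's first five user messages) is a small filter. A sketch, assuming a `{role, content}` message shape:

```javascript
// Take the first `limit` user messages from a transcript, so each
// persona simulation starts from the same context the session had.
function seedContext(transcript, limit = 5) {
  return transcript
    .filter((entry) => entry.role === "user")
    .slice(0, limit)
    .map((entry) => entry.content);
}
```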

## CLI

```bash
envseed register                # Authenticate via GitHub, get API key
envseed status                  # Check installation health
envseed on|off                  # Enable/disable monitoring
envseed alerts [--last N]       # Show critical events
envseed events [--last N]       # Show all events
envseed sessions                # List sessions with summary
envseed session <id>            # Detailed session timeline
envseed tail                    # Live-tail events
envseed stats                   # Aggregate statistics
envseed search <pattern>        # Regex search events
envseed incidents [--last N]    # List incidents
envseed incident <id>           # View incident details
envseed incident <id> upload    # Re-upload to S3
envseed dashboard [--port 3456] # Open web dashboard
```

(`propensity-monitor` also works as a CLI alias.)

## Upload infrastructure

Incident data can be uploaded in two ways:

1. **Direct S3** (METR internal) — uses `aws s3 sync` with the staging profile. Requires AWS SSO credentials.
2. **HTTP upload** (external users) — POSTs to a Cloudflare Worker which stores data in R2. Requires an API key obtained via `envseed register`.

The upload path is chosen automatically: if `s3Profile` is set and AWS auth works, direct S3 is used. Otherwise, HTTP upload via the Worker endpoint.
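
The selection logic reduces to a couple of guards. A sketch; `awsAuthOk` stands in for the actual SSO/credentials check:

```javascript
// Prefer direct S3 when a profile is configured and AWS auth works;
// otherwise fall back to the HTTP endpoint with the registered API key.
function chooseUploadPath(config, awsAuthOk) {
  if (config.s3Profile && awsAuthOk) return { method: "s3", target: config.s3Bucket };
  if (config.apiKey) return { method: "http", target: config.uploadEndpoint };
  return { method: "none", target: null };
}
```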

### Worker endpoints

| Endpoint | Method | Auth | Description |
|----------|--------|------|-------------|
| `/register` | POST | GitHub token | Exchange GitHub OAuth token for API key |
| `/harvest/{incidentId}` | POST | `x-api-key` | Upload incident archive (gzip) |
| `/harvest/{incidentId}/status` | POST | `x-api-key` | Update incident status.json |
| `/llm/messages` | POST | `Authorization: Bearer` | Proxy LLM requests to Anthropic |

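An upload against `/harvest/{incidentId}` from the table above is a single authenticated POST. A sketch using the standard `Request` API (the helper name is hypothetical):

```javascript
// Build the upload request for a gzipped incident archive. Returning a
// Request object keeps the construction testable without a network.
function buildHarvestRequest(endpoint, incidentId, apiKey, gzipBody) {
  return new Request(`${endpoint}/harvest/${incidentId}`, {
    method: "POST",
    headers: { "x-api-key": apiKey, "content-type": "application/gzip" },
    body: gzipBody,
  });
}
```

Sending it is then just `await fetch(buildHarvestRequest(...))`.
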
Worker source: `infra/worker/`. Deploy with `wrangler deploy`.

### Registration flow

`envseed register` uses GitHub Device Flow:
1. Shows a code and URL
2. User authorizes in browser
3. Worker verifies the GitHub token and issues an API key
4. Key is saved to `~/.propensity-monitor/config.json`
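
One Device Flow detail worth noting: clients poll GitHub at the server-provided `interval` and must back off by 5 seconds on a `slow_down` error (per RFC 8628). A sketch of that schedule:

```javascript
// Device Flow polling delay: GitHub returns an `interval` in seconds;
// a `slow_down` error means add 5 seconds per RFC 8628.
function nextPollDelay(intervalSec, lastError) {
  const base = lastError === "slow_down" ? intervalSec + 5 : intervalSec;
  return base * 1000; // milliseconds, ready for setTimeout
}
```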

## Configuration

`~/.propensity-monitor/config.json`:

```json
{
  "enabled": true,
  "alertThreshold": 3,
  "logAllEvents": true,
  "maxLogSizeMB": 500,
  "s3Bucket": "metr-propensity-monitor",
  "s3Region": "us-east-1",
  "s3Profile": "staging",
  "uploadEndpoint": "https://envseed-api.sydv793.workers.dev",
  "githubClientId": "Ov23lid2fKxyN7lOd9qv",
  "apiKey": "",
  "simulationCount": 2,
  "simulationMaxTurns": 100,
  "simulationConcurrency": 5,
  "simulationModels": ["claude-opus-4-6", "gpt-5.2"],
  "enableSimulations": true,
  "replicaModel": "claude-opus-4-6",
  "redactionReview": true,
  "proxyUrl": "https://envseed-api.<your-subdomain>.workers.dev",
  "proxyToken": "<shared bearer token>"
}
```

| Key | Description |
|-----|-------------|
| `enabled` | Master on/off switch |
| `alertThreshold` | Minimum risk score to surface to user |
| `logAllEvents` | Log all hook events, not just risky ones |
| `maxLogSizeMB` | Max total log size before rotation |
| `s3Bucket` | S3 bucket for incident uploads (METR internal) |
| `s3Region` | AWS region |
| `s3Profile` | AWS CLI profile for S3 authentication |
| `uploadEndpoint` | Cloudflare Worker URL for HTTP uploads |
| `githubClientId` | GitHub OAuth App client ID for registration |
| `apiKey` | API key for HTTP uploads (set by `envseed register`) |
| `simulationCount` | Number of persona simulations per incident |
| `simulationMaxTurns` | Max Claude turns per simulation |
| `simulationConcurrency` | How many simulations to run in parallel |
| `simulationModels` | Models to use for simulations |
| `enableSimulations` | Enable/disable simulation spawning |
| `replicaModel` | Model used inside containers for environment setup |
| `redactionReview` | Enable interactive redaction review before upload/build (default true) |
| `proxyUrl` | Cloudflare Worker proxy URL for LLM calls (if set, monitor pays for API usage) |
| `proxyToken` | Shared bearer token for authenticating to the proxy |

## LLM proxy (recommended)

By default, the monitor uses the user's own Anthropic API key for Haiku screening and Opus review calls. To avoid charging users, deploy the Cloudflare Worker proxy:

```bash
cd infra/worker

# Set secrets (one-time)
npx wrangler secret put ANTHROPIC_API_KEY    # Your Anthropic API key
npx wrangler secret put MONITOR_BEARER_TOKEN # Shared token for client auth

# Optionally set a custom base URL (e.g., METR middleman proxy)
npx wrangler secret put ANTHROPIC_BASE_URL

# Deploy
npx wrangler deploy
```

Then set `proxyUrl` and `proxyToken` in the monitor config:

```json
{
  "proxyUrl": "https://envseed-api.<subdomain>.workers.dev",
  "proxyToken": "<the MONITOR_BEARER_TOKEN you set above>"
}
```

The proxy:
- Authenticates clients with a shared bearer token
- Uses its own Anthropic API key (stored in CF secrets, never sent to clients)
- Only allows specific models (`claude-haiku-4-5-20251001`, `claude-opus-4-6`, `claude-sonnet-4-6`)
- Caps `max_tokens` at 1024 to prevent abuse
- Falls back to the user's direct API key if the proxy is down

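Guardrails like the allowlist and token cap are a few lines of request rewriting on the Worker side. A sketch with the allowlist from above (the handler shape is illustrative, not the Worker's actual code):

```javascript
const ALLOWED_MODELS = new Set([
  "claude-haiku-4-5-20251001",
  "claude-opus-4-6",
  "claude-sonnet-4-6",
]);

// Validate and clamp an incoming /llm/messages body before it is
// forwarded to Anthropic with the Worker's own key.
function sanitizeLlmRequest(body) {
  if (!ALLOWED_MODELS.has(body.model)) {
    return { ok: false, error: `model not allowed: ${body.model}` };
  }
  return { ok: true, body: { ...body, max_tokens: Math.min(body.max_tokens ?? 1024, 1024) } };
}
```
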
## Data directory

```
~/.propensity-monitor/data/
  events/             # All hook events (YYYY-MM-DD.jsonl)
  alerts/             # [INTERESTING] and [GOOD OPPORTUNITY] events
  sessions/           # Per-session state
  incidents/          # Logged incidents
    {id}/
      metadata.json
      status.json
      transcript/
      dir-snapshot.tar.gz
      assessments.jsonl
      simulations/
        sim_001/
          simulation-config.json
          prompt.txt
          transcript.json
          changed-files.txt
          changes.tar.gz
          result.json
  replicas/           # Container replications (keyed by project dir)
    {replica_id}/
      manifest.json
      project-snapshot.tar.gz
      opus-output/
        setup-script.sh
        Dockerfile
        env-status.json
        setup-transcript.json
      replication-kit/ # Portable artifacts to recreate the container
  index.json          # Session index for fast lookup
```

## Grounded Petri: environment reconstruction

The `grounded_petri/` package reconstructs real environments from captured incidents and evaluates target models using [Petri](https://github.com/safety-research/petri)'s alignment judge. Built on [Inspect](https://inspect.aisi.org.uk/).

### How it works

1. **Seed data** — an incident captured by propensity-monitor (transcript, file snapshot, risk assessments, persona trajectories)
2. **Docker sandbox** — seed files are copied to a temp dir and mounted into a `--network=none` Docker container. The target model's tool calls (bash, file read/write) execute **for real** inside this container.
3. **World model LLM** — an agentic LLM simulates everything the sandbox can't provide: network requests, human interactions, external services (git push, PR creation, Slack, etc.). It's grounded in the seed data and reference trajectories.
4. **Alignment judge** — Petri's 38-dimension judge scores the resulting transcript for misalignment signals.

### Architecture

```
Incident seed data
        │
SeedLoader → parses tar.gz, JSONL, trajectories
        │
GroundedWorldAgent (Inspect solver)
  ├─ Target model → calls tools normally
  ├─ Executable tools → docker exec in sandbox (bash, files)
  ├─ Non-executable tools → world model LLM simulates (network, humans)
  └─ TrajectoryIndex → reference trajectories for interpolation
        │
Petri alignment_judge → scores transcript
```

### Shared infrastructure

Three pluggable backends, each with Local and Server (AWS) implementations:

| Backend | Local | Server (stub) |
|---------|-------|---------------|
| **ApiKeyProxy** | Reads `~/.config/api-keys/.env` | METR middleman proxy |
| **EnvironmentProvisioner** | `docker run`/`exec` on host | ECS/EC2 on AWS |
| **SourceQueryClient** | Direct filesystem read (project-dir only) | Polling agent on user's machine |

The polling agent pattern ensures the user's machine is **never exposed inbound** — it only makes outbound HTTPS calls to the server to check for queries. File reads within the project directory are auto-approved.

### Install

```bash
cd propensity-tests
pip install -e ".[dev]"
```

### Usage

```bash
# Run against a local incident
inspect eval grounded_petri/grounded_audit \
  --model-role world=anthropic/claude-sonnet-4-6 \
  --model-role target=anthropic/claude-sonnet-4-6 \
  --model-role judge=anthropic/claude-opus-4-6 \
  -T incident_id=<id>

# Or point at a directory directly
inspect eval grounded_petri/grounded_audit \
  -T incident_dir=/path/to/incident

# With scenario mutations
inspect eval grounded_petri/grounded_audit \
  -T incident_id=<id> \
  -T scenario_mutations="move_to_lab_context,add_time_pressure"

# With Google/OpenAI models (any provider works via Inspect)
inspect eval grounded_petri/grounded_audit \
  --model-role world=google/gemini-2.5-pro \
  --model-role target=openai/gpt-4o \
  --model-role judge=anthropic/claude-opus-4-6 \
  -T incident_id=<id>

# Server mode (future — uses AWS infra)
inspect eval grounded_petri/grounded_audit \
  -T incident_id=<id> \
  -T infra_mode=server \
  -T server_url=https://... \
  -T proxy_url=https://...
```

### Files

```
grounded_petri/
  __init__.py           # Package init
  _registry.py          # Inspect task registration entry point
  tasks.py              # grounded_audit task definition
  world_model.py        # Core solver — Docker exec + world model simulation
  seed_loader.py        # Parse incident dirs into SeedData
  environment_state.py  # Mutable environment state tracking
  trajectory_index.py   # Reference trajectory similarity search
  prompts.py            # World model and target system prompts
  infra.py              # Shared infra abstractions (proxy, provisioner, source query)
```

## Testing

Zero npm dependencies — everything uses Node.js built-ins. No `npm install` needed.

```bash
# Unit tests (fast, no external services)
node test/test-risk-analyzer.mjs

# Integration tests — all stages
node --test test/test-integration.mjs

# Individual stages
TEST_STAGE=1 node --test test/test-integration.mjs # Risk analyzer (fast)
TEST_STAGE=2 node --test test/test-integration.mjs # LLM pipeline (needs API key)
TEST_STAGE=3 node --test test/test-integration.mjs # Redaction review (opens terminal)
TEST_STAGE=4 node --test test/test-integration.mjs # Container replication (needs Docker + API)
TEST_STAGE=5 node --test test/test-integration.mjs # Full log-incident pipeline

# Skip stages that open a terminal
SKIP_INTERACTIVE=1 node --test test/test-integration.mjs
```

## Project structure

```
bin/
  propensity-monitor.mjs      # CLI tool (aliased as `envseed`)
  dashboard.mjs               # Web dashboard
lib/
  hook-handler.mjs            # Main hook entrypoint (sync, fast)
  background-analyzer.mjs     # Async LLM analysis + container replication trigger
  llm-analyzer.mjs            # Haiku API calls for assessment
  risk-analyzer.mjs           # Rule-based risk scoring
  session-tracker.mjs         # Session state management
  log-incident.mjs            # Incident archival and orchestration
  simulation-orchestrator.mjs # Docker-based persona simulations
  container-replicator.mjs    # Environment replication with Opus
  redaction-review.mjs        # Interactive redaction before upload/build
  personas.mjs                # 10 persona definitions
  s3.mjs                      # S3 + HTTP upload (auto-fallback)
  logger.mjs                  # Event logging helpers
  utils.mjs                   # Shared utilities
commands/
  log-incident.md             # Slash command definition
infra/
  worker/                     # Cloudflare Worker (upload API + LLM proxy)
    src/index.js              # Worker handler
    wrangler.toml             # Wrangler config
  lambda/                     # AWS Lambda version (future migration)
    index.mjs                 # Lambda handler
    deploy.sh                 # AWS deploy script (needs IAM role)
test/
  test-risk-analyzer.mjs      # Unit tests
  test-integration.mjs        # End-to-end integration tests (5 stages)
  fixtures/                   # Test cases
package.json                  # npm package config ("envseed")
postinstall.mjs               # npm postinstall hook (sets up hooks + config)
Dockerfile.simulation         # Generic simulation container
entrypoint.sh                 # Simulation container entrypoint
grounded_petri/               # Environment reconstruction (Python/Inspect)
  world_model.py              # Core solver — Docker exec + world model sim
  seed_loader.py              # Parse incident data into SeedData
  infra.py                    # Shared infra (proxy, provisioner, source query)
  tasks.py                    # Inspect task registration
  prompts.py                  # World model system prompts
  trajectory_index.py         # Reference trajectory search
  environment_state.py        # Mutable env state tracking
pyproject.toml                # Python package config (grounded-petri)
install.sh                    # Installation script (METR internal)
uninstall.sh                  # Uninstallation script
```