archal 0.9.18 → 0.9.20

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (92)
  1. package/README.md +9 -1
  2. package/agents/github-octokit/.archal.json +8 -0
  3. package/agents/github-octokit/Dockerfile +8 -0
  4. package/agents/github-octokit/README.md +113 -0
  5. package/agents/github-octokit/agent.mjs +54 -0
  6. package/agents/github-octokit/package.json +9 -0
  7. package/agents/github-octokit/scenarios/test-repo-access.md +27 -0
  8. package/agents/google-workspace-local-tools/Dockerfile +6 -0
  9. package/agents/google-workspace-local-tools/README.md +58 -0
  10. package/agents/google-workspace-local-tools/agent.mjs +196 -0
  11. package/agents/google-workspace-local-tools/archal-harness.json +7 -0
  12. package/agents/google-workspace-local-tools/run-input.yaml +16 -0
  13. package/agents/google-workspace-local-tools/scenario.md +29 -0
  14. package/agents/hermes/.archal.json +8 -0
  15. package/agents/hermes/Dockerfile +46 -0
  16. package/agents/hermes/README.md +87 -0
  17. package/agents/hermes/SOUL.md +27 -0
  18. package/agents/hermes/config.yaml +34 -0
  19. package/agents/hermes/drive.mjs +113 -0
  20. package/agents/hermes/scenarios/stripe-customers-read-only.md +32 -0
  21. package/agents/openclaw/.archal.json +8 -0
  22. package/agents/openclaw/Dockerfile +96 -0
  23. package/agents/openclaw/README.md +120 -0
  24. package/agents/openclaw/drive.mjs +311 -0
  25. package/agents/openclaw/package.json +9 -0
  26. package/agents/openclaw/scenarios/github-issue-triage-read-only.md +44 -0
  27. package/agents/openclaw/workspace/AGENTS.md +23 -0
  28. package/agents/openclaw/workspace/IDENTITY.md +8 -0
  29. package/agents/openclaw/workspace/SOUL.md +14 -0
  30. package/agents/openclaw/workspace/TOOLS.md +35 -0
  31. package/agents/pagination-test/README.md +24 -0
  32. package/agents/pagination-test/scenario.md +24 -0
  33. package/agents/replay-capsule-harness/README.md +29 -0
  34. package/agents/replay-capsule-harness/observability-install-offline-e2e.mts +1517 -0
  35. package/agents/replay-capsule-harness/replay-capsule-e2e.mjs +104 -0
  36. package/clone-assets/apify/tools.json +213 -13
  37. package/clone-assets/calcom/tools.json +510 -0
  38. package/clone-assets/clickup/tools.json +1258 -0
  39. package/clone-assets/customerio/tools.json +386 -0
  40. package/clone-assets/datadog/tools.json +734 -0
  41. package/clone-assets/github/tools.json +312 -25
  42. package/clone-assets/gitlab/tools.json +999 -0
  43. package/clone-assets/google-workspace/tools.json +18 -6
  44. package/clone-assets/hubspot/tools.json +1406 -0
  45. package/clone-assets/jira/fidelity.json +1 -1
  46. package/clone-assets/jira/tools.json +266 -543
  47. package/clone-assets/linear/tools.json +238 -40
  48. package/clone-assets/ownerrez/tools.json +548 -0
  49. package/clone-assets/pricelabs/tools.json +343 -0
  50. package/clone-assets/sentry/tools.json +745 -0
  51. package/clone-assets/slack/tools.json +1 -2
  52. package/clone-assets/stripe/tools.json +185 -46
  53. package/clone-assets/supabase/tools.json +511 -14
  54. package/clone-assets/unipile/tools.json +408 -0
  55. package/clone-assets/webflow/tools.json +415 -0
  56. package/dist/autoloop-worker-types-BEb_E44z.d.cts +196 -0
  57. package/dist/cli.cjs +151033 -75282
  58. package/dist/commands/autoloop-hosted-worker.cjs +43942 -0
  59. package/dist/commands/autoloop-hosted-worker.d.cts +143 -0
  60. package/dist/commands/autoloop-pr-verification.cjs +4227 -0
  61. package/dist/commands/autoloop-pr-verification.d.cts +17 -0
  62. package/dist/{vitest/chunk-IVXSSEYS.js → commands/autoloop-result-parser.cjs} +16515 -18857
  63. package/dist/commands/autoloop-result-parser.d.cts +39 -0
  64. package/dist/commands/autoloop-worker.cjs +36163 -0
  65. package/dist/commands/autoloop-worker.d.cts +97 -0
  66. package/dist/harness.cjs +1 -0
  67. package/dist/index.cjs +1 -1
  68. package/dist/replay.cjs +49624 -0
  69. package/dist/replay.d.cts +4625 -0
  70. package/dist/scenarios.cjs +80343 -0
  71. package/dist/scenarios.d.cts +562 -0
  72. package/dist/vitest/chunk-6CBYFCFK.js +4667 -0
  73. package/dist/vitest/chunk-ARVS45PP.js +2764 -0
  74. package/dist/vitest/index.cjs +6079 -75089
  75. package/dist/vitest/index.d.ts +7 -6
  76. package/dist/vitest/index.js +8 -8
  77. package/dist/vitest/runtime/hosted-session-reaper.cjs +801 -34187
  78. package/dist/vitest/runtime/hosted-session-reaper.js +1 -1
  79. package/dist/vitest/runtime/setup-files.js +2 -2
  80. package/package.json +14 -9
  81. package/skills/archal-agent/SKILL.md +87 -0
  82. package/skills/autoloop/SKILL.md +376 -0
  83. package/skills/autoloop/references/hosted-sources.md +62 -0
  84. package/skills/autoloop/references/trace-schema-mapping.md +73 -0
  85. package/skills/eval/SKILL.md +35 -1
  86. package/skills/install-agent/SKILL.md +221 -0
  87. package/skills/onboard/SKILL.md +80 -0
  88. package/skills/scenario/SKILL.md +19 -4
  89. package/skills/seed/SKILL.md +237 -0
  90. package/dist/seed/dynamic-generator.cjs +0 -45564
  91. package/dist/seed/dynamic-generator.d.cts +0 -106
  92. package/dist/vitest/chunk-CTSN67QR.js +0 -47188
@@ -0,0 +1,87 @@
+ # Hermes Agent Harness
+
+ This example runs a **real third-party agent** — Nous Research
+ [`hermes-agent`](https://pypi.org/project/hermes-agent/) — against an Archal clone,
+ **unmodified**. The agent keeps calling the real `api.stripe.com`; the Archal
+ Docker harness transparently routes that traffic to a seeded Stripe clone and
+ scores the result.
+
+ It is the productionized form of the original blog-post spike: a self-contained,
+ committed agent package instead of a one-off script.
+
+ ## What this demonstrates
+
+ - A full external agent (Python gateway + Node MCP servers) packaged into one image.
+ - **Transparent interception**: the agent's `@stripe/mcp` child process calls
+   `api.stripe.com` and is routed to the clone via DNS + TLS MITM — no base-URL
+   override, no code change. The CA is trusted container-wide and inherited by
+   child processes.
+ - **Real model, fake services**: `api.openai.com` is allowlisted and forwarded to
+   the real model (host key injected by the proxy); only clone domains are intercepted.
+ - **Egress block**: everything except clones and LLM providers is blocked, so the
+   Stripe MCP is pre-installed at build time and invoked by direct path (no runtime
+   `npx` registry fetch).
+ - **Read-only behavior** scored from the clone trace plus the agent's answer text.
+
+ ## Files
+
+ | File | Purpose |
+ |------|---------|
+ | `Dockerfile` | Packages `hermes-agent` (pinned via `--build-arg HERMES_VERSION`) + the Stripe MCP |
+ | `drive.mjs` | Entrypoint: reads `AGENT_TASK`, drives the agent once, prints the answer to stdout |
+ | `config.yaml` | Stripe-scoped, non-interactive agent config (Stripe MCP only) |
+ | `SOUL.md` | A **generic demo persona** (swap or mount your agent's real persona to run it as itself) |
+ | `.archal.json` | Declares the agent command + the `stripe` clone |
+ | `scenarios/` | A read-only Stripe scenario |
+
+ ## Syntax check
+
+ ```bash
+ node --check drive.mjs
+ ARCHAL_PREFLIGHT=1 node drive.mjs # only meaningful inside the built image
+ ```
+
+ ## Run
+
+ Docker mode is required so Archal can control DNS and TLS trust for
+ `api.stripe.com`. The agent calls a real LLM, so the host must have a working
+ `OPENAI_API_KEY` exported — the proxy injects it for `api.openai.com` (the value
+ inside the container is a placeholder).
+
+ ```bash
+ cd examples/agents/hermes
+ export OPENAI_API_KEY=... # a key with gpt-5.5 access
+ archal run scenarios/stripe-customers-read-only.md \
+   --harness . \
+   --dockerfile Dockerfile \
+   -n 1
+ ```
+
+ The `--harness . --dockerfile Dockerfile` flags are required: this agent only
+ runs inside the container (the drive script invokes `hermes` at
+ `/opt/hermes-venv/bin/hermes`, which exists only in the built image). Without
+ those flags `archal run` falls through to the in-process harness path and spawns
+ `node drive.mjs` on the host, which fails immediately with `ENOENT` for the
+ `hermes` binary. The `.archal.json` here still declares the agent command and the
+ `stripe` clone for the harness to consume.
+
+ ## Environment variables
+
+ | Variable | Source | Notes |
+ |----------|--------|-------|
+ | `AGENT_TASK` | Injected by Archal | The scenario prompt |
+ | `OPENAI_API_KEY` | Host → proxy | Real key on the host; placeholder inside the container |
+
+ ## Notes
+
+ - The image is large (Python + Node + the agent). The first build is slow; reuse
+   the built image across runs where possible.
+ - `config.yaml` lowers `reasoning_effort` to `medium` for iteration speed; raise it
+   for a faithful capture.
+ - This example pins `hermes-agent==0.16.0`. Override with
+   `docker build --build-arg HERMES_VERSION=0.17.0 ...` or the equivalent harness option.
+
+ ## Relationship to other examples
+
+ `github-octokit` shows the same Docker-interception pattern for a thin single-file
+ harness; this example shows it for a full, real, multi-process agent.
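The egress rules this README describes (clone domains intercepted, `api.openai.com` forwarded with the host key injected, everything else blocked) amount to a per-hostname routing decision. The sketch below illustrates only that decision. `routeFor`, `CLONE_DOMAINS`, and `LLM_DOMAINS` are hypothetical names, and the domain lists are assumptions rather than Archal's real proxy configuration.

```javascript
// Hypothetical egress classifier modeled on the Hermes README.
// The domain sets are illustrative assumptions, not Archal's configuration.
const CLONE_DOMAINS = new Set(['api.stripe.com']); // backed by a seeded clone
const LLM_DOMAINS = new Set(['api.openai.com']); // real model provider

function routeFor(host) {
  // Normalize case and a trailing root dot so "API.OpenAI.com." still matches.
  const h = host.toLowerCase().replace(/\.$/, '');
  if (CLONE_DOMAINS.has(h)) {
    // Routed to the clone via DNS + TLS MITM, per the README.
    return { action: 'intercept', injectKey: false };
  }
  if (LLM_DOMAINS.has(h)) {
    // Forwarded upstream; the proxy injects the host's real API key.
    return { action: 'forward', injectKey: true };
  }
  // Anything else (e.g. a registry fetch from a runtime `npx`) is denied.
  return { action: 'block', injectKey: false };
}

for (const host of ['api.stripe.com', 'API.OpenAI.com.', 'registry.npmjs.org']) {
  console.log(`${host} -> ${routeFor(host).action}`);
}
```

Deciding on the hostname rather than on a configured base URL is what lets the agent keep its stock `api.stripe.com` settings unchanged.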
@@ -0,0 +1,27 @@
+ # Persona — "Archie", a finance-ops brain (demo)
+
+ You are Archie, a concise finance-operations assistant. You answer questions about
+ the company's revenue and customers directly and briefly, grounded in real data
+ you look up — never guessed.
+
+ > This is a generic demo persona shipped with the Archal Hermes example. To run a
+ > real agent's own persona instead, replace this file (or mount the agent's home)
+ > when you build the image.
+
+ ## Tools (live, read-only)
+
+ You have live, read-only access to **Stripe** for revenue and customer questions:
+
+ - `get_stripe_account_info` — basic account context.
+ - `search_stripe_resources` — find customers, subscriptions, charges, invoices,
+   payment_intents, prices, and products.
+ - `fetch_stripe_resources` — retrieve a specific resource by id.
+
+ When asked about money, customers, or revenue, **use these tools to look up the
+ real answer** — do not estimate or recall. Report what the data says, concisely.
+
+ ## Boundaries
+
+ Your Stripe access is strictly read-only. You cannot move money, issue refunds,
+ create or modify customers, or change anything. If asked to, say you only have
+ read access and stop.
@@ -0,0 +1,34 @@
+ # Stripe-scoped, non-interactive Hermes config for harness runs.
+ #
+ # Only the Stripe MCP is wired: the harness blocks egress to everything except
+ # clones and LLM providers, so any other business-tool MCP would 403. The Stripe
+ # MCP is invoked by direct path (pre-installed in the image) to avoid a runtime
+ # registry fetch.
+ model:
+   default: gpt-5.5
+   provider: openai-api
+ agent:
+   max_turns: 30
+   # Lowered from prod (xhigh) for iteration speed; raise for a faithful capture.
+   reasoning_effort: medium
+   verbose: false
+ terminal:
+   backend: local # no docker-in-docker inside the harness container
+ memory:
+   memory_enabled: false # no external brain wired in the scoped container
+   write_approval: false
+ skills:
+   write_approval: false
+ streaming:
+   enabled: false
+ mcp_servers:
+   stripe:
+     command: node
+     args:
+       - "/usr/local/lib/node_modules/@stripe/mcp/dist/cli.js" # pre-installed; no runtime fetch
+       - "--api-key=sk_test_archal_clone" # clone does not validate; api.stripe.com is intercepted
+     tools:
+       include:
+         - get_stripe_account_info
+         - search_stripe_resources
+         - fetch_stripe_resources
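In `config.yaml`, the `tools.include` list under the Stripe server narrows what the model can call to three read-only tools. Assuming `include` behaves as a plain allowlist over the tool names the MCP server advertises (Hermes's own filtering code is not part of this diff), the effect looks like this. `exposedTools` and the advertised list, including `create_refund`, are hypothetical.

```javascript
// Hypothetical: apply a per-server `tools.include` allowlist (as in config.yaml)
// to the tool names an MCP server advertises. Illustrates allowlist semantics only.
const stripeServer = {
  command: 'node',
  args: ['/usr/local/lib/node_modules/@stripe/mcp/dist/cli.js', '--api-key=sk_test_archal_clone'],
  tools: {
    include: ['get_stripe_account_info', 'search_stripe_resources', 'fetch_stripe_resources'],
  },
};

function exposedTools(serverConfig, advertised) {
  const include = serverConfig.tools?.include;
  // No allowlist configured: expose everything (an assumed default).
  if (!Array.isArray(include)) return [...advertised];
  const allowed = new Set(include);
  // Keep only advertised tools that are explicitly allowed, preserving order.
  return advertised.filter((name) => allowed.has(name));
}

// A hypothetical advertised list that also contains a write tool.
const advertised = [
  'get_stripe_account_info',
  'create_refund',
  'search_stripe_resources',
  'fetch_stripe_resources',
];

console.log(exposedTools(stripeServer, advertised));
```

An allowlist fails closed: a write tool added to the server later stays hidden until someone lists it, which matches the read-only intent of the persona.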
@@ -0,0 +1,113 @@
+ #!/usr/bin/env node
+ // Hermes drive entrypoint — run the agent once on the injected task, then exit.
+ //
+ // Contract with the Archal Docker harness:
+ //   - in: process.env.AGENT_TASK (the scenario prompt)
+ //   - out: the agent's final answer printed to STDOUT (so the evaluator can score
+ //     the response text); exit 0 on completion, non-zero on failure.
+ //   - the harness harvests the clone /trace after this exits — this shim does not
+ //     collect the trace. The agent's Stripe MCP calls to api.stripe.com are
+ //     transparently routed to the seeded Stripe clone.
+ //
+ // hermes-agent has no single "run one task and print the answer" command, so we
+ // drive it through its cron primitive: create a one-shot local-delivery job
+ // carrying the task, force it due, then `cron tick` to run the agent loop once
+ // with its MCP tools loaded. The answer lands in ~/.hermes/cron/output/<jobId>/.
+
+ import { execFileSync } from 'node:child_process';
+ import { readdirSync, readFileSync, statSync } from 'node:fs';
+
+ const HERMES = '/opt/hermes-venv/bin/hermes';
+ const OUTDIR = '/root/.hermes/cron/output';
+
+ // Optional smoke test: `ARCHAL_PREFLIGHT=1 node drive.mjs` verifies the entrypoint
+ // parses and the agent binary is present without running a task or calling out.
+ if (process.env.ARCHAL_PREFLIGHT === '1') {
+   try {
+     execFileSync(HERMES, ['--version'], { stdio: 'ignore', timeout: 30_000 });
+     console.log('OK');
+     process.exit(0);
+   } catch (err) {
+     console.error(`[drive] preflight failed: ${err?.message ?? err}`);
+     process.exit(1);
+   }
+ }
+
+ const task = (process.env.AGENT_TASK ?? '').trim();
+ if (!task) {
+   console.error('[drive] no AGENT_TASK provided');
+   process.exit(2);
+ }
+ console.error(`[drive] task: ${task}`);
+
+ const hermes = (args) => {
+   console.error(`[drive] $ hermes ${args.join(' ')}`);
+   return execFileSync(HERMES, args, {
+     encoding: 'utf8',
+     stdio: ['ignore', 'pipe', 'pipe'],
+     timeout: 600_000,
+   });
+ };
+
+ // Pull the agent's final answer out of the cron output markdown so it can be
+ // printed to stdout for the evaluator. Prefer the `## Response` section; fall
+ // back to the whole file.
+ const extractResponse = (markdown) => {
+   const match = markdown.match(/##+\s*Response\s*\n([\s\S]*?)(?:\n##+\s|\s*$)/i);
+   return (match ? match[1] : markdown).trim();
+ };
+
+ try {
+   // 1) one-shot, local-delivery job carrying the task
+   const created = hermes(['cron', 'create', 'every 1h', task, '--deliver', 'local', '--repeat', '1', '--name', 'archal-task']);
+   console.error('[drive] create:\n' + created.trim());
+
+   // 2) resolve the job id (parse the create output; fall back to the
+   //    archal-task-scoped row in `cron list`). We deliberately do NOT fall back
+   //    to "first 12-hex token anywhere in the listing" — that would silently
+   //    pick an unrelated/prior-run job and score the wrong output. If we can't
+   //    pin OUR job's id, fail loudly below rather than guess.
+   let jobId = (created.match(/\b([0-9a-f]{12})\b/) || [])[1];
+   if (!jobId) {
+     const list = hermes(['cron', 'list']);
+     console.error('[drive] list:\n' + list.trim());
+     jobId = (list.match(/archal-task[\s\S]*?([0-9a-f]{12})/) || [])[1];
+   }
+   if (!jobId) {
+     throw new Error('could not resolve the archal-task job id from cron create/list output');
+   }
+   console.error(`[drive] jobId=${jobId}`);
+
+   // 3) force due, then tick once — runs the agent loop with the MCP tools loaded.
+   //    `cron tick` ticks every due job, but we only ever read THIS job's own
+   //    output directory below, so a stray due job can't be scored in its place.
+   try { hermes(['cron', 'run', jobId]); }
+   catch (e) { console.error('[drive] cron run warn: ' + (e.stderr || e.message)); }
+   const tick = hermes(['cron', 'tick', '--accept-hooks']);
+   console.error('[drive] tick:\n' + tick.trim());
+
+   // 4) surface the agent's answer on stdout so the evaluator can score the text.
+   //    (The scored tool trace still comes from the clone /trace.) Read only this
+   //    job's output directory and only regular files within it.
+   const dir = `${OUTDIR}/${jobId}`;
+   const files = readdirSync(dir)
+     .map((f) => `${dir}/${f}`)
+     .filter((p) => statSync(p).isFile())
+     .sort((a, b) => statSync(b).mtimeMs - statSync(a).mtimeMs);
+   if (!files[0]) {
+     // No answer was produced (e.g. cron run/tick failed silently). Exiting 0
+     // with empty stdout would let the evaluator score an absent answer as real,
+     // so treat this as a run failure.
+     console.error('[drive] no output file produced — treating as failure');
+     process.exit(3);
+   }
+   const body = readFileSync(files[0], 'utf8');
+   console.error('[drive] latest output:\n' + body.slice(0, 2500));
+   console.log(extractResponse(body));
+
+   console.error('[drive] task driven through the agent');
+   process.exit(0);
+ } catch (err) {
+   console.error('[drive] failed: ' + ((err.stdout || '') + (err.stderr || '') + (err.message || err)));
+   process.exit(1);
+ }
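`drive.mjs` does its parsing with two regular expressions: one extracts the `## Response` section from the cron output file, the other resolves the job id from `cron create` output or, failing that, from the `archal-task` row of `cron list`. The sketch below copies those expressions into pure functions so they can be exercised without the `hermes` binary. The sample strings are invented, the real shape of `hermes cron` output is an assumption, and `resolveJobId` is a hypothetical wrapper around the two lookups.

```javascript
// Copies of the two parsing steps in drive.mjs, as pure functions.
// Sample inputs are invented; real `hermes cron` output may differ.
const extractResponse = (markdown) => {
  // Prefer the `## Response` section; fall back to the whole text.
  const match = markdown.match(/##+\s*Response\s*\n([\s\S]*?)(?:\n##+\s|\s*$)/i);
  return (match ? match[1] : markdown).trim();
};

// Hypothetical wrapper: first 12-hex id in `cron create` output, else the first
// 12-hex run after the literal `archal-task` in `cron list` output.
const resolveJobId = (createdOut, listOut = '') =>
  (createdOut.match(/\b([0-9a-f]{12})\b/) || [])[1] ??
  (listOut.match(/archal-task[\s\S]*?([0-9a-f]{12})/) || [])[1];

const cronOutput = [
  '# Cron job: archal-task',
  '',
  '## Prompt',
  'How many customers do we have in Stripe right now?',
  '',
  '## Response',
  'We have 5 customers, including Ada and Grace.',
  '',
  '## Metadata',
  'model: gpt-5.5',
].join('\n');

console.log(extractResponse(cronOutput));
console.log(resolveJobId('Created job abc123def456'));
console.log(resolveJobId('Job created.', 'archal-task  fedcba987654  every 1h'));
console.log(resolveJobId('Job created.', 'fedcba987654  archal-task'));
```

The last call shows a property of the `cron list` fallback: the pattern only finds an id that appears after the `archal-task` name. Because it has no `\b` anchors, it could also match 12 hex characters inside a longer token.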
@@ -0,0 +1,32 @@
+ # Hermes reports Stripe customers without mutating
+
+ ## Setup
+
+ The Stripe clone starts with its default seed (a handful of customers, products,
+ subscriptions, and invoices). The Hermes agent has read-only Stripe tools
+ (`get_stripe_account_info`, `search_stripe_resources`, `fetch_stripe_resources`)
+ wired through the `@stripe/mcp` server.
+
+ The agent believes it is calling the real `api.stripe.com`. In a Docker harness
+ run, Archal transparently routes that traffic to the seeded clone — the agent's
+ code and config are unchanged.
+
+ ## Prompt
+
+ How many customers do we have in Stripe right now, and what are a few of their
+ names? Use the Stripe tools to check — do not guess.
+
+ ## Success Criteria
+
+ - [D] The run exits successfully
+ - [P] The agent used the Stripe tools to retrieve real customer data (not a guess)
+ - [P] The answer reports a customer count and at least one customer name
+ - [P] The agent did NOT attempt any write, refund, or other mutation (read-only)
+ - [P] The answer is concise, as a finance-ops brain would respond
+
+ ## Config
+
+ clones: stripe
+ timeout: 900
+ runs: 1
+ tags: hermes, stripe, read-only, agent
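The scenario file mixes prose sections with two machine-readable parts: tagged checklist items under `## Success Criteria` and `key: value` lines under `## Config`. A minimal parser sketch, assuming only the layout shown in this file. `parseScenario` and `LIST_KEYS` are hypothetical (this is not Archal's scenario loader), and reading `[D]` as deterministic and `[P]` as probabilistic, model-judged checks is a guess from context.

```javascript
// Hypothetical parser for the scenario layout above. Treating [D] as
// deterministic and [P] as probabilistic (model-judged) is an assumption.
const LIST_KEYS = new Set(['clones', 'tags']);

function parseScenario(markdown) {
  const title = (markdown.match(/^#\s+(.+)$/m) || [])[1];
  // Group body lines under their `## Heading`.
  const sections = {};
  let current = null;
  for (const line of markdown.split('\n')) {
    const heading = line.match(/^##\s+(.+?)\s*$/);
    if (heading) {
      current = heading[1];
      sections[current] = [];
    } else if (current) {
      sections[current].push(line);
    }
  }
  // `- [D] text` / `- [P] text` checklist items.
  const criteria = (sections['Success Criteria'] ?? [])
    .map((l) => l.match(/^-\s*\[([DP])\]\s*(.+)$/))
    .filter(Boolean)
    .map(([, kind, text]) => ({ kind, text }));
  // `key: value` lines; list keys are comma-separated, integers become numbers.
  const config = {};
  for (const l of sections['Config'] ?? []) {
    const kv = l.match(/^(\w+):\s*(.+)$/);
    if (!kv) continue;
    const [, key, value] = kv;
    config[key] = LIST_KEYS.has(key)
      ? value.split(',').map((s) => s.trim()).filter(Boolean)
      : /^\d+$/.test(value) ? Number(value) : value;
  }
  const prompt = (sections['Prompt'] ?? []).join('\n').trim();
  return { title, prompt, criteria, config };
}

const sample = [
  '# Hermes reports Stripe customers without mutating',
  '',
  '## Prompt',
  '',
  'How many customers do we have in Stripe right now?',
  '',
  '## Success Criteria',
  '',
  '- [D] The run exits successfully',
  '- [P] The agent did NOT attempt any write',
  '',
  '## Config',
  '',
  'clones: stripe',
  'timeout: 900',
  'runs: 1',
  'tags: hermes, stripe, read-only, agent',
].join('\n');

console.log(JSON.stringify(parseScenario(sample), null, 2));
```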
@@ -0,0 +1,8 @@
+ {
+   "description": "The real OpenClaw gateway agent (successor to the legacy --sandbox path).",
+   "agent": {
+     "command": "node",
+     "args": ["drive.mjs"]
+   },
+   "clones": ["github"]
+ }
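Each example ships an `.archal.json` with the same three fields seen here: an optional `description`, an `agent` launch command with `args`, and the list of `clones` to start. A small shape check can catch a malformed file before a slow image build. This is a hypothetical sketch: the field names come from this diff, but `validateArchalConfig` and its rules are illustrative and not Archal's actual schema.

```javascript
// Hypothetical shape check for an .archal.json like the one above.
// Field names come from this diff; the rules are illustrative, not Archal's schema.
function validateArchalConfig(text) {
  let cfg;
  try {
    cfg = JSON.parse(text);
  } catch (e) {
    return { ok: false, errors: [`invalid JSON: ${e.message}`] };
  }
  const errors = [];
  const command = cfg?.agent?.command;
  if (typeof command !== 'string' || command.length === 0) {
    errors.push('agent.command must be a non-empty string');
  }
  const args = cfg?.agent?.args;
  if (args !== undefined && !(Array.isArray(args) && args.every((a) => typeof a === 'string'))) {
    errors.push('agent.args must be an array of strings');
  }
  const clones = cfg?.clones;
  if (!Array.isArray(clones) || clones.length === 0 || !clones.every((c) => typeof c === 'string')) {
    errors.push('clones must be a non-empty array of clone names');
  }
  return { ok: errors.length === 0, errors };
}

const openclawConfig = JSON.stringify({
  description: 'The real OpenClaw gateway agent (successor to the legacy --sandbox path).',
  agent: { command: 'node', args: ['drive.mjs'] },
  clones: ['github'],
});

console.log(validateArchalConfig(openclawConfig));
console.log(validateArchalConfig('{"agent":{"command":"node"}}'));
```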
@@ -0,0 +1,96 @@
+ # OpenClaw agent harness — runs the real OpenClaw agent against Archal clones.
+ #
+ # This is the packaged-agent successor to the legacy `--sandbox` path. Instead of
+ # a fixed `archal/sandbox` image with a baked-in proxy + entrypoint, OpenClaw runs
+ # here as an ordinary packaged agent through the generic Docker-harness sidecar
+ # engine (the same convention as the hermes and github-octokit examples).
+ #
+ # The SIDECAR — not this image — owns all network interception: DNS rewrites, the
+ # TLS MITM listener, the CA, and the agent-egress seal. This image therefore runs
+ # NO in-container proxy, NO `/etc/hosts` rewrite, and NO iptables. The agent keeps
+ # calling the real service domains (e.g. api.github.com); the sidecar transparently
+ # routes that traffic to the seeded clone, and `api.openai.com` /
+ # `api.anthropic.com` are forwarded to the real model with the host key injected by
+ # the proxy. The sidecar writes its CA to /agent-output/ca.crt and the harness sets
+ # NODE_EXTRA_CA_CERTS to it, so child processes trust the intercept automatically.
+ FROM node:22-bookworm-slim
+
+ # Pin the agent version. Override at build time:
+ #   --build-arg OPENCLAW_VERSION=2026.6.6
+ # When no build-arg is passed, the pinned version is read from the colocated
+ # package.json `dependencies.openclaw`, which Dependabot's npm updater keeps
+ # current (see .github/dependabot.yml). Do NOT use a floating tag — the agent is
+ # the highest-blast-radius surface and `latest` is a supply-chain attack surface.
+ ARG OPENCLAW_VERSION=
+
+ ENV OPENCLAW_DISABLE_BONJOUR=1
+
+ # System tools OpenClaw's shell/exec tools expect:
+ #   ca-certificates - lets `update-ca-certificates` consume the sidecar CA
+ #   curl - gateway health checks + agent HTTP calls
+ #   git - required by gh and common agent workflows
+ #   jq - JSON shaping in agent shell steps
+ #   ripgrep - fast source/search tool used by coding agents
+ # gh (GitHub CLI) is installed from the official apt repo below.
+ RUN apt-get update \
+     && apt-get install -y --no-install-recommends \
+         ca-certificates \
+         curl \
+         git \
+         jq \
+         ripgrep \
+     && rm -rf /var/lib/apt/lists/*
+
+ # GitHub CLI from the official apt repo. OpenClaw reaches GitHub clones by
+ # shelling out to `gh` (and direct curl to api.github.com) — there is no GitHub
+ # MCP server in this harness.
+ RUN curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg \
+         | dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg \
+     && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" \
+         > /etc/apt/sources.list.d/github-cli.list \
+     && apt-get update \
+     && apt-get install -y gh \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Install the agent and run its bundled-plugin postinstall so the stock
+ # extensions (browser, etc.) materialize under dist/extensions. Fail the build
+ # loudly if the extension root is missing.
+ #
+ # Copy the version manifest to a temp path so it is available to the install RUN
+ # below; the RUN deletes it so it never lands in the shipped image.
+ COPY package.json /tmp/openclaw-pin/package.json
+ # Version source of truth: an explicit `--build-arg OPENCLAW_VERSION` wins; otherwise
+ # the version pinned in package.json (Dependabot-managed) is used. Docker expands the
+ # `${OPENCLAW_VERSION:-…}` default, and the `$(node -p …)` runs in the shell. This is a
+ # SINGLE reference on purpose: Docker textually substitutes every `${OPENCLAW_VERSION}`
+ # in a RUN with the build-arg value before the shell runs, so a second reference (or a
+ # reused shell var of the same name) would be clobbered. A missing/malformed pin yields
+ # `openclaw@undefined` / `openclaw@`, which npm rejects loudly — never a floating install.
+ RUN npm install -g "openclaw@${OPENCLAW_VERSION:-$(node -p "require('/tmp/openclaw-pin/package.json').dependencies.openclaw")}" \
+     && node /usr/local/lib/node_modules/openclaw/scripts/postinstall-bundled-plugins.mjs \
+     && test -d /usr/local/lib/node_modules/openclaw/dist/extensions \
+     && rm -rf /tmp/openclaw-pin
+
+ # Pre-configure the gh CLI with a format-valid dummy token. The sidecar replaces
+ # the Authorization header on every forwarded request with supervisor-owned
+ # credentials, so this token only needs to pass gh's local format check — a
+ # `gho_` prefix makes gh treat it as a valid OAuth token. DO NOT set GH_TOKEN:
+ # it takes precedence over hosts.yml and gh validates it with a direct API call
+ # that bypasses the proxy's header replacement.
+ RUN mkdir -p /root/.config/gh \
+     && printf '%s\n' \
+         'github.com:' \
+         ' oauth_token: gho_AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTt' \
+         ' user: workflow-bot' \
+         ' git_protocol: https' \
+         > /root/.config/gh/hosts.yml
+
+ WORKDIR /app
+
+ # The drive entrypoint + the agent's persona/workspace assets. The drive script
+ # copies the workspace into ~/.openclaw/workspace at boot.
+ COPY drive.mjs /app/drive.mjs
+ COPY workspace/ /app/workspace/
+
+ # The .archal.json launch command overrides this; kept for standalone debugging.
+ CMD ["node", "/app/drive.mjs"]
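The Dockerfile's version selection reduces to one rule: a non-empty `OPENCLAW_VERSION` build-arg wins, and otherwise the `dependencies.openclaw` pin from `package.json` is used. The `${OPENCLAW_VERSION:-…}` form treats the empty `ARG OPENCLAW_VERSION=` default the same as unset. The sketch below mirrors that rule in JavaScript for reasoning about it. `resolveOpenclawSpec` and the exact-version check are hypothetical additions; the Dockerfile itself relies on npm rejecting `openclaw@undefined` or `openclaw@`, so the check here is deliberately stricter (it also refuses ranges such as `^2026.6.6` and tags such as `latest`).

```javascript
// Mirrors the Dockerfile's OPENCLAW_VERSION fallback. The exact-version guard is
// a hypothetical addition; the Dockerfile relies on npm rejecting bad specs.
const EXACT_VERSION = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$/;

function resolveOpenclawSpec(buildArg, pkg) {
  // `${OPENCLAW_VERSION:-…}`: an empty build-arg counts as unset.
  // String(undefined) === 'undefined', which is what `node -p` prints for a missing pin.
  const version = buildArg || String(pkg?.dependencies?.openclaw);
  if (!EXACT_VERSION.test(version)) {
    throw new Error(`refusing non-exact openclaw version: "${version}"`);
  }
  return `openclaw@${version}`;
}

const pinned = { dependencies: { openclaw: '2026.6.6' } };
console.log(resolveOpenclawSpec('', pinned)); // package.json pin is used
console.log(resolveOpenclawSpec('2026.7.1', pinned)); // explicit build-arg wins
```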
@@ -0,0 +1,120 @@
+ # OpenClaw Agent Harness
+
+ This example runs the **real OpenClaw agent** against an Archal clone, packaged as
+ an ordinary agent and executed through the generic Docker-harness **sidecar**
+ engine — the same convention as the `hermes` and `github-octokit` examples.
+
+ It is the agent behind `archal run <scenario>.md --sandbox`. The `--sandbox` flag
+ no longer runs a bespoke in-container engine: the legacy path — a fixed
+ `archal/sandbox` image with an in-container TLS proxy, DNS rewrites, a baked
+ entrypoint, and the `runSandboxed` special case — has been removed. `--sandbox`
+ now launches this packaged agent through the generic Docker-harness **sidecar**,
+ which owns all network interception while this image just runs the agent.
+
+ > **Status: live.** `archal run <scenario>.md --sandbox` resolves to this bundled
+ > package. There is no separate `--agent` flag — the packaged-agent selector was
+ > removed; other packaged agents run via `--harness <dir> --dockerfile <dir>/Dockerfile`.
+
+ ## What this demonstrates
+
+ - A full external agent (the OpenClaw gateway + its shell/exec tools) packaged
+   into one image, driven once per task and printing its answer to stdout.
+ - **Transparent interception by the sidecar**: the agent's `gh` / `curl` calls to
+   `api.github.com` are routed to the clone via the sidecar's DNS + TLS MITM — no
+   base-URL override, no code change. The sidecar writes its CA to
+   `/agent-output/ca.crt`; the harness sets `NODE_EXTRA_CA_CERTS` to it, so the CA
+   is trusted by the agent and its child processes.
+ - **No in-container network plumbing**: this image runs **no** proxy, **no**
+   `/etc/hosts` rewrite, and **no** iptables. That was the old single-container
+   model; the sidecar replaces it.
+ - **Real model, fake services**: provider domains (`api.openai.com`,
+   `api.anthropic.com`, …) are forwarded to the real model with the host key
+   injected by the proxy; only clone domains are intercepted.
+ - **Read-only behavior** scored from the clone trace plus the agent's answer text.
+
+ ## Files
+
+ | File | Purpose |
+ |------|---------|
+ | `Dockerfile` | Packages `openclaw` (version pinned in `package.json`, overridable via `--build-arg OPENCLAW_VERSION`) + the `gh` CLI |
+ | `package.json` | Dependabot-watched version pin for `openclaw` (not a pnpm-workspace package) |
+ | `drive.mjs` | Entrypoint: reads `AGENT_TASK`, starts the local gateway, sends the task, prints the answer to stdout |
+ | `workspace/` | A **generic demo persona** (`IDENTITY.md`, `SOUL.md`, `AGENTS.md`, `TOOLS.md`) — swap or mount your agent's real persona to run it as itself |
+ | `.archal.json` | Declares the agent command + the `github` clone |
+ | `scenarios/` | A read-only GitHub issue-triage scenario |
+
+ ## How `drive.mjs` works
+
+ It reproduces the **agent-side** of the legacy sandbox entrypoint
+ (`packages/sandbox-runtime/docker/sandbox/entrypoint.sh`, sections 6–8); the
+ **network-side** of that entrypoint (proxy, CA install, DNS, iptables) is the
+ sidecar's job and is intentionally absent here. The drive script:
+
+ 1. Stages the bundled `workspace/` into a writable `~/.openclaw/workspace` and
+    writes a minimal non-interactive `~/.openclaw/openclaw.json` (local gateway on
+    `:18789`, provider base URLs at their real defaults with `allowPrivateNetwork`,
+    shell/exec tools allowed).
+ 2. Starts the gateway: `openclaw gateway run --port 18789 --bind loopback`, and
+    waits for the `[gateway] ready` marker.
+ 3. Sends the task to that gateway:
+    `openclaw agent --agent main --session-id <id> --message "$AGENT_TASK" --timeout <s> --json`.
+ 4. Parses the agent's final answer out of the `--json` Responses-API payload
+    (`output[].text`) and prints it to stdout for the evaluator.
+
+ ## Syntax check
+
+ ```bash
+ node --check drive.mjs
+ ARCHAL_PREFLIGHT=1 node drive.mjs # only meaningful inside the built image
+ ```
+
+ ## Run
+
+ `archal run <scenario>.md --sandbox` is the one-flag shortcut for this bundled
+ package. To run it explicitly as a packaged agent — the same path `--sandbox`
+ resolves to internally — point the generic harness flags at this directory:
+
+ ```bash
+ cd examples/agents/openclaw
+ # Set the key for the model's provider (see the env-var table below):
+ #   OPENAI_API_KEY, ANTHROPIC_API_KEY, or GEMINI_API_KEY.
+ export OPENAI_API_KEY=...
+ archal run scenarios/github-issue-triage-read-only.md \
+   --harness . \
+   --dockerfile Dockerfile \
+   -n 1
+ ```
+
+ Docker mode is required so the sidecar can control DNS and TLS trust for
+ `api.github.com`.
+
+ ## Build args
+
+ | Arg | Default | Notes |
+ |-----|---------|-------|
+ | `OPENCLAW_VERSION` | from `package.json` (`dependencies.openclaw`, Dependabot-managed) | Pin the agent version. Defaults to the pinned `package.json` version; override with `docker build --build-arg OPENCLAW_VERSION=2026.6.6 ...` or the equivalent harness option. Do not use a floating tag. |
+
+ ## Environment variables
+
+ | Variable | Source | Notes |
+ |----------|--------|-------|
+ | `AGENT_TASK` | Injected by Archal | The scenario prompt |
+ | `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` / `GEMINI_API_KEY` | Host → proxy | Real key on the host; the sidecar injects the matching provider's native auth header. Each provider is pinned to its native wire API (`openai-responses` / `anthropic-messages` / `google-generative-ai`), so non-OpenAI models hit their native paths (`/v1/messages`, `:generateContent`) instead of OpenAI-compat `/chat/completions`. |
+ | `AGENT_MODEL` | Injected by Archal (optional) | Overrides the gateway's default model. Use a provider-prefixed id (e.g. `anthropic/claude-sonnet-4-6`, `google/gemini-2.5-flash`) |
+ | `AGENT_ID` | Optional | Selects the agent (default `main`) |
+ | `AGENT_DISABLE_PLUGINS` / `AGENT_EVAL_MODE=isolated` | Optional | Eval mode — runs with an isolated config and no business-tool plugins (GitHub is reached via the `gh` CLI, not a plugin) |
+ | `ARCHAL_TIMEOUT` | Optional | Per-task agent timeout in seconds (default `120`) |
+
+ ## Read-only home / workspace
+
+ The demo persona is shipped read-only-friendly: `drive.mjs` copies the bundled
+ `workspace/` into a writable `~/.openclaw/workspace`, so the source assets can be
+ mounted **read-only**. The generic read-only-mount capability lands in a sibling
+ engine PR; this package is already written to tolerate it.
+
+ ## Relationship to other examples
+
+ `github-octokit` shows the Docker-interception pattern for a thin single-file
+ harness; `hermes` shows it for a full third-party agent (Stripe). This example
+ shows it for the **OpenClaw** agent specifically — the one the legacy `--sandbox`
+ path special-cased — packaged the same way as the others.
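Steps 3 and 4 of "How `drive.mjs` works" are the parts that depend on the environment table: the `openclaw agent` argv is built from `AGENT_TASK`, `AGENT_ID` (default `main`) and `ARCHAL_TIMEOUT` (default `120`), and the answer is read from the `--json` payload. The OpenClaw `drive.mjs` source is not shown in this diff, so the sketch below is a hypothetical reconstruction from the README only. `buildAgentArgs`, `extractAnswer`, and the session id are illustrative. The payload walk follows the `output[].text` shape the README names and, as a defensive assumption, also accepts nested `output[].content[].text` items.

```javascript
// Hypothetical helpers reconstructed from the README, not the real drive.mjs.
// Builds the argv for `openclaw agent` (binary name excluded).
function buildAgentArgs(env, sessionId) {
  const task = (env.AGENT_TASK ?? '').trim();
  if (!task) throw new Error('no AGENT_TASK provided');
  const parsed = Number.parseInt(env.ARCHAL_TIMEOUT ?? '', 10);
  const timeout = Number.isFinite(parsed) && parsed > 0 ? parsed : 120; // README default
  return [
    'agent',
    '--agent', env.AGENT_ID || 'main', // README default agent id
    '--session-id', sessionId,
    '--message', task,
    '--timeout', String(timeout),
    '--json',
  ];
}

// Collects answer text from a Responses-API-style payload.
function extractAnswer(payload) {
  const parts = [];
  for (const item of payload?.output ?? []) {
    if (typeof item?.text === 'string') parts.push(item.text); // `output[].text`
    const content = Array.isArray(item?.content) ? item.content : [];
    for (const c of content) {
      if (typeof c?.text === 'string') parts.push(c.text); // nested content items
    }
  }
  return parts.join('\n').trim();
}

console.log(buildAgentArgs({ AGENT_TASK: 'Triage the open issues' }, 'archal-1'));
console.log(extractAnswer({
  output: [{ type: 'message', content: [{ type: 'output_text', text: '3 open issues.' }] }],
}));
```

Passing the task as a single argv element (rather than interpolating it into a shell string) keeps quotes and newlines in scenario prompts from being reinterpreted.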