archal 0.9.14 → 0.9.16

Files changed (51)
  1. package/README.md +81 -73
  2. package/bin/archal.cjs +1 -1
  3. package/clone-assets/apify/tools.json +668 -0
  4. package/{twin-assets → clone-assets}/discord/fidelity.json +1 -1
  5. package/{twin-assets → clone-assets}/discord/tools.json +510 -510
  6. package/clone-assets/github/fidelity.json +31 -0
  7. package/{twin-assets → clone-assets}/github/tools.json +113 -3
  8. package/{twin-assets → clone-assets}/google-workspace/fidelity.json +2 -2
  9. package/{twin-assets → clone-assets}/google-workspace/tools.json +10 -10
  10. package/{twin-assets → clone-assets}/jira/fidelity.json +44 -4
  11. package/{twin-assets → clone-assets}/jira/tools.json +1 -1
  12. package/clone-assets/linear/fidelity.json +36 -0
  13. package/{twin-assets → clone-assets}/linear/tools.json +1 -1
  14. package/{twin-assets → clone-assets}/ramp/fidelity.json +1 -1
  15. package/{twin-assets → clone-assets}/ramp/tools.json +1 -1
  16. package/clone-assets/slack/fidelity.json +38 -0
  17. package/{twin-assets → clone-assets}/slack/tools.json +1 -1
  18. package/clone-assets/stripe/fidelity.json +67 -0
  19. package/{twin-assets → clone-assets}/stripe/tools.json +42 -11
  20. package/clone-assets/supabase/fidelity.json +31 -0
  21. package/{twin-assets → clone-assets}/supabase/tools.json +1 -1
  22. package/clone-assets/tavily/tools.json +115 -0
  23. package/dist/cli.cjs +97983 -0
  24. package/dist/cli.d.cts +1 -0
  25. package/dist/harness.cjs +62 -0
  26. package/dist/harness.d.cts +20 -0
  27. package/dist/index.cjs +5 -87894
  28. package/dist/index.d.cts +3 -1
  29. package/dist/seed/dynamic-generator.cjs +8796 -9201
  30. package/dist/seed/dynamic-generator.d.cts +39 -0
  31. package/dist/vitest/{chunk-KTMNDJFB.js → chunk-CJJ32YQF.js} +45272 -44436
  32. package/dist/vitest/chunk-FU2VLK75.js +29296 -0
  33. package/dist/vitest/index.cjs +56931 -32004
  34. package/dist/vitest/index.d.ts +62 -28
  35. package/dist/vitest/index.js +147 -1809
  36. package/dist/vitest/runtime/hosted-session-reaper.cjs +34766 -28922
  37. package/dist/vitest/runtime/hosted-session-reaper.js +1 -2
  38. package/dist/vitest/runtime/setup-files.js +2 -3
  39. package/package.json +19 -10
  40. package/skills/eval/SKILL.md +29 -25
  41. package/skills/onboard/SKILL.md +56 -41
  42. package/skills/scenario/SKILL.md +22 -20
  43. package/skills/vitest/SKILL.md +25 -24
  44. package/dist/vitest/chunk-L6HSMJ3F.js +0 -2216
  45. package/dist/vitest/chunk-YJICENME.js +0 -1230
  46. package/dist/vitest/src-JGHX6UKK.js +0 -94
  47. package/twin-assets/github/fidelity.json +0 -13
  48. package/twin-assets/linear/fidelity.json +0 -18
  49. package/twin-assets/slack/fidelity.json +0 -20
  50. package/twin-assets/stripe/fidelity.json +0 -22
  51. package/twin-assets/supabase/fidelity.json +0 -13
package/README.md CHANGED
@@ -1,27 +1,19 @@
  # archal

- Pre-deployment testing for AI agents via Archal's hosted digital twins,
- including route-mode twins for Discord, GitHub, Slack, Stripe, Linear, Jira,
- Ramp, Supabase, and Google Workspace.
+ Pre-deployment testing for AI agents via Archal's hosted service clones,
+ including route-mode clones for Discord, GitHub, Slack, Stripe, Linear, Jira,
+ Ramp, Apify, Tavily, Supabase, and Google Workspace.

- ## Early access — expect rough edges
+ ## Known compatibility notes

- `archal@0.9.x` is early access. The CLI and twins work, but the surface is
- still being polished:
+ - `npm install -g archal` may print peer-dependency warnings from the
+ vendored CLI runtime. These are safe to ignore; all required modules
+ are bundled.
+ - The `vitest` integration under `archal/vitest` requires `vitest@^2.1.0`.
+ Projects on vitest 3 should pin a workspace to vitest 2 for Archal tests
+ to avoid duplicate module resolution.

- - `npm install -g archal` prints a handful of peer-dependency warnings
- from the vendored CLI runtime. They are safe to ignore; nothing is
- missing at runtime.
- - A few commands still print one or two help lines to `stderr` instead of
- `stdout`. Scripts that redirect streams will mostly see what they expect,
- but be aware of it.
- - The `vitest` integration under `archal/vitest` depends on `vitest@^2.1.0`.
- If your project is on vitest 3, the shared helper may resolve the wrong
- copy. Pinning a workspace to vitest 2 for your Archal tests fixes it.
-
- We're shipping `0.9.x` so that early users can start running the CLI against
- hosted twins. File anything surprising at
- <https://github.com/Archal-Labs/archal/issues>.
+ File issues at <https://github.com/Archal-Labs/archal/issues>.

  ## Install
@@ -36,13 +28,20 @@ archal login
  ```

  This writes credentials to `~/.archal/credentials.json`. You can also set
- `ARCHAL_TOKEN` directly in CI.
+ `ARCHAL_TOKEN` directly in CI. Use a workspace API key (`archal_ws_...`) for
+ CI instead of a personal token.
+
+ Workspace API keys are runtime and CI credentials bound to one workspace. They
+ can run clones, upload and read traces, and read usage for that workspace. They
+ cannot manage audit events or workspace API keys. Use an owner/admin user
+ credential, either `archal login` or a dashboard-issued user API key, for
+ workspace administration.

- ## CLI — primary use case
+ ## CLI

- The CLI is the primary way to use Archal today. `archal run` executes a
+ The CLI is the primary interface. `archal run` executes a
  scenario (a markdown file describing a task, the expected behavior, and
- success criteria) against a hosted twin session and reports a satisfaction
+ success criteria) against a hosted clone session and reports a satisfaction
  score.

  ### Quick start
@@ -51,35 +50,40 @@ score.
  # 1. Log in
  archal login

- # 2. Browse the twin catalog
- archal twin
+ # 2. Initialize your project
+ archal init

- # 3. Start a persistent session with one or more twins
- archal twin start github stripe
+ # 3. Edit .archal/harness.mjs to call your agent

- # 4. Run a scenario from .archal.json in the current directory
+ # 4. Run the starter scenario from .archal.json
  archal run
-
- # 5. When you're done
- archal twin stop
  ```

- `archal run` auto-discovers an `.archal.json` at the project root. A
- minimal config looks like this:
+ `archal init` creates `.archal.json`, `.archal/harness.mjs`, and
+ `scenarios/first-run.md`. The generated harness is a guarded stub: Archal
+ refuses to run it until you edit the file to call your real agent, so a first
+ run cannot be scored from placeholder text.
+
+ `archal run` auto-discovers an `.archal.json` at the project root. A minimal
+ config looks like this:

  ```json
  {
-   "scenarios": "./scenarios",
-   "twins": ["github", "stripe"],
+   "agent": {
+     "command": "node",
+     "args": [".archal/harness.mjs"]
+   },
+   "scenarios": ["scenarios/first-run.md"],
+   "clones": ["github", "stripe"],
    "runs": 1
  }
  ```

- ### Supported twins
+ ### Supported clones

- `archal twin` lists the nine twins available today:
+ `archal clone` lists the eleven clones available today:

- | Twin | Notes |
+ | Clone | Notes |
  |---|---|
  | Discord | Guilds, channels, messages, members |
  | GitHub | Repos, issues, PRs, labels, reviews |
@@ -88,6 +92,8 @@ minimal config looks like this:
  | Linear | Teams, issues, projects, cycles |
  | Jira | Projects, issues, workflows, components |
  | Ramp | Cards, transactions, reimbursements, users |
+ | Apify | Actors, tasks, runs, datasets |
+ | Tavily | Search and extraction responses |
  | Supabase | Auth, Postgres, storage, edge functions |
  | Google Workspace | Gmail, Drive, Calendar, Docs, People |
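Reviewer note: the new `.archal.json` example in the hunk above changes `scenarios` from a directory string to a list of files, renames `twins` to `clones`, and adds an `agent` block. A minimal TypeScript sketch of that shape follows, covering only the fields visible in the README example. The names `ArchalConfig` and `parseArchalConfig` are hypothetical, and the real schema may accept more fields or other value forms.

```typescript
// Hypothetical type mirroring only the fields shown in the README example.
interface ArchalConfig {
  agent: { command: string; args: string[] };
  scenarios: string[];
  clones: string[];
  runs: number;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

// Parse JSON text and check it against the shape above.
// Throws an Error naming the first field that does not match.
function parseArchalConfig(text: string): ArchalConfig {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("config must be a JSON object");
  }
  const obj = raw as Record<string, unknown>;
  const agent = (obj.agent ?? {}) as Record<string, unknown>;
  const command = agent.command;
  const args = agent.args;
  const scenarios = obj.scenarios;
  const clones = obj.clones;
  const runs = obj.runs;

  if (typeof command !== "string" || !isStringArray(args)) {
    throw new Error("agent must have a string command and a string[] args");
  }
  if (!isStringArray(scenarios)) throw new Error("scenarios must be a string[]");
  if (!isStringArray(clones)) throw new Error("clones must be a string[]");
  if (typeof runs !== "number" || !Number.isInteger(runs) || runs < 1) {
    throw new Error("runs must be a positive integer");
  }
  return { agent: { command, args }, scenarios, clones, runs };
}
```

Under these assumptions, a 0.9.14-style config that still uses `"twins"` and a string `"scenarios"` would be rejected by this check, which is one way to spot configs that need migrating.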
@@ -97,34 +103,36 @@ minimal config looks like this:
  |---|---|
  | `archal login` | Authenticate with Archal |
  | `archal logout` | Remove stored credentials |
- | `archal twin` | Browse the twin catalog |
- | `archal twin start [twins...]` | Start a hosted twin session |
- | `archal twin status` | Inspect the active session |
- | `archal twin stop` | Stop the active session |
- | `archal twin list` | List all your active sessions |
- | `archal twin attach <uuid>` | Reattach to a session by id |
- | `archal twin renew <seconds>` | Extend the session lifetime |
- | `archal twin reset` | Reset twin state without tearing down the session |
- | `archal twin seed <twin> <name>` | Load a named seed into a running twin |
- | `archal run [.archal.json]` | Run scenarios from a config file |
+ | `archal clone` | Browse the clone catalog |
+ | `archal clone start [clones...]` | Start a hosted clone session |
+ | `archal clone status` | Inspect the active session |
+ | `archal clone stop` | Stop the active session |
+ | `archal clone list` | List all your active sessions |
+ | `archal clone attach <uuid>` | Reattach to a session by id |
+ | `archal clone renew <seconds>` | Extend the session lifetime |
+ | `archal clone reset` | Reset clone state without tearing down the session |
+ | `archal clone seed <clone> <name>` | Load a named seed into a running clone |
+ | `archal run [scenario]` | Run a scenario file (or use `--config` for `.archal.json`) |
  | `archal scenario list` | Browse local and hosted scenarios |
- | `archal seed list [twin]` | List prebuilt twin seeds |
+ | `archal seed list [clone]` | List prebuilt clone seeds |
  | `archal trace` | View recent scenario traces |
  | `archal trace <name>` | View trace details for a run |
- | `archal usage` | Check session minutes and plan |
+ | `archal usage` | Check active workspace session-minutes and plan |

  Run `archal <command> --help` for flag details.

  ## Vitest integration (secondary use case)

  You can also import `archal/vitest` to route SDK traffic from a vitest
- suite through a hosted twin, with no code changes to your production
+ suite through a hosted clone, with no code changes to your production
  code. This is useful if you want to test the HTTP side of an integration
  without hitting real provider APIs.

- > Note: the vitest helper is still evolving. If you're just getting
- > started with Archal, prefer the CLI flow above it's the best-tested
- > path today.
+ > The vitest helper supports the hosted route-mode clone catalog: Apify,
+ > Discord, GitHub, Google Workspace, Jira, Linear, Ramp, Slack, Stripe,
+ > Supabase, and Tavily.
+ > If you only need scenario-level evaluation, the CLI flow above is
+ > simpler to set up.

  ### Minimal config
@@ -158,7 +166,7 @@ import Stripe from 'stripe';

  it('creates a customer', async () => {
    const stripe = new Stripe('sk_test_fake'); // fake key is fine
-   const customer = await stripe.customers.create({ // goes to twin, not real Stripe
+   const customer = await stripe.customers.create({ // goes to clone, not real Stripe
      email: 'test@example.com',
    });
    expect(customer.id).toMatch(/^cus_/);
@@ -169,18 +177,18 @@ it('creates a customer', async () => {

  ```ts
  import { beforeEach } from 'vitest';
- import { resetArchalTwins } from 'archal/vitest';
+ import { resetArchalClones } from 'archal/vitest';

  beforeEach(async () => {
-   await resetArchalTwins();
+   await resetArchalClones();
  });
  ```

  ### Webhook testing

- The hosted twin runs in AWS ECS, so it can't POST to your localhost.
+ The hosted clone runs in AWS ECS, so it can't POST to your localhost.
  Instead, your test pulls queued deliveries with `waitForArchalWebhook()`
- and invokes your handler directly with the exact payload the twin would
+ and invokes your handler directly with the exact payload the clone would
  have sent.

  ```ts
@@ -192,7 +200,7 @@ import { handleStripeWebhook } from './src/webhooks';
  const stripe = new Stripe('sk_test_fake');

  it('records a subscription when customer.subscription.created fires', async () => {
-   // 1. Register a webhook endpoint the twin needs to know what events to queue
+   // 1. Register a webhook endpoint (the clone needs to know what events to queue)
    await stripe.webhookEndpoints.create({
      url: 'http://test.local/stripe-wh',
      enabled_events: ['customer.subscription.created'],
@@ -223,7 +231,7 @@ it('records a subscription when customer.subscription.created fires', async () =
  | Slack | ✅ | Receiver-side signature only |
  | Jira | ✅ | Delivered via history buffer (also POSTed to registered URLs) |
  | Linear | ✅ | Delivered via history buffer (also POSTed to registered URLs) |
- | Supabase | ❌ | Database-triggered test against real Postgres |
+ | Supabase | ❌ | Database-triggered; test against real Postgres |
  | Google Workspace | ❌ | GCP Pub/Sub push notifications, not webhooks |

  **Parallel workers caveat**: `waitForArchalWebhook()` consumes deliveries
@@ -233,18 +241,18 @@ depend on webhook events, set `testIsolation: 'serial'` in your config.

  ### Test isolation across parallel workers

- Each vitest worker is routed to its own per-worker state on the twin —
- parallel tests across workers don't see each other's writes. The
+ Each vitest worker is routed to its own per-worker state on the clone,
+ so parallel tests across workers don't see each other's writes. The
  integration reads `VITEST_WORKER_ID` in each worker process and tags
- every outbound SDK request with an `X-Archal-Worker-Id` header. The twin
+ every outbound SDK request with an `X-Archal-Worker-Id` header. The clone
  maintains a separate state engine per worker id, seeded from the
  baseline on first request.

- **Isolation-enabled twins**: Stripe, GitHub, Slack, Jira, Linear.
+ **Isolation-enabled clones**: Stripe, GitHub, Slack, Jira, Linear.

- **Twins without isolation** (Supabase, Google Workspace, Ramp) fall back
+ **Clones without isolation** (Supabase, Google Workspace, Ramp) fall back
  to shared state. If your tests depend on global assertions against those
- twins, set `testIsolation: 'serial'`.
+ clones, set `testIsolation: 'serial'`.

  ## Authentication
@@ -255,8 +263,8 @@ In priority order:

  This auth is only for Archal's hosted session provisioning. Your app's
  provider credentials do not need to be real when traffic is routed through
- a twin use placeholder tokens that satisfy the SDK's local validation
- rules, such as `sk_test_fake`, `ghp_fake_token_for_twin`, or a dummy
+ a clone. Use placeholder tokens that satisfy the SDK's local validation
+ rules, such as `sk_test_fake`, `ghp_fake_token_for_clone`, or a dummy
  Google bearer token.

  ## What you'll see in the terminal
@@ -266,18 +274,18 @@ On the first `archal run`, session provisioning takes about 30 seconds
  the wait is visible:

  ```
- [archal] provisioning stripe twin... 5s
- [archal] provisioning stripe twin... 10s
+ [archal] provisioning stripe clone... 5s
+ [archal] provisioning stripe clone... 10s
  ...
  ```

  Subsequent runs with the same configuration reuse the warm session
  (~2 seconds).

- At the end of every run, the estimated twin-minute usage is printed:
+ At the end of every run, the estimated workspace session-minute usage is printed:

  ```
- [archal] ~2 twin-minutes for this run (38.5s × 1 twin: stripe)
+ [archal] ~2 clone-minutes for this run (38.5s × 1 clone: stripe)
  ```

  ## Docs
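Reviewer note: the test-isolation paragraph in the README diff above describes two pieces. The client tags each request with an `X-Archal-Worker-Id` header taken from `VITEST_WORKER_ID`, and the clone keeps one state per worker id, seeded from a baseline on first request. The TypeScript sketch below illustrates that idea only; it is not Archal's implementation. `workerHeaders`, `WorkerStateStore`, the `"shared"` fallback key, and the JSON round-trip used for seeding are all assumptions.

```typescript
// Hypothetical client side: build the header the README describes.
function workerHeaders(env: Record<string, string | undefined>): Record<string, string> {
  const id = env.VITEST_WORKER_ID;
  return id ? { "X-Archal-Worker-Id": id } : {};
}

// Hypothetical server side: one independent state object per worker id.
class WorkerStateStore<S> {
  private readonly states = new Map<string, S>();
  private readonly baseline: S;

  constructor(baseline: S) {
    this.baseline = baseline;
  }

  // Return the state for the calling worker. On the first request from a
  // worker id, seed its state with a deep copy of the baseline.
  // Assumes the state is JSON-serializable.
  stateFor(headers: Record<string, string | undefined>): S {
    const key = headers["X-Archal-Worker-Id"] ?? "shared";
    let state = this.states.get(key);
    if (state === undefined) {
      state = JSON.parse(JSON.stringify(this.baseline)) as S;
      this.states.set(key, state);
    }
    return state;
  }
}
```

In this sketch, a write made by worker 1 lands in worker 1's copy only, so a request tagged with worker id 2 still sees the baseline. Requests without the header share a single `"shared"` state, which loosely mirrors the shared-state fallback the README mentions for clones without isolation.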
package/bin/archal.cjs CHANGED
@@ -3,7 +3,7 @@
  const { existsSync } = require("node:fs");
  const { join } = require("node:path");

- const localDist = join(__dirname, "..", "dist", "index.cjs");
+ const localDist = join(__dirname, "..", "dist", "cli.cjs");

  if (existsSync(localDist)) {
    require(localDist);