archal 0.9.13 → 0.9.15
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +81 -73
- package/bin/archal.cjs +1 -1
- package/clone-assets/apify/tools.json +668 -0
- package/{twin-assets → clone-assets}/discord/fidelity.json +1 -1
- package/{twin-assets → clone-assets}/discord/tools.json +510 -510
- package/clone-assets/github/fidelity.json +31 -0
- package/{twin-assets → clone-assets}/github/tools.json +113 -3
- package/{twin-assets → clone-assets}/google-workspace/fidelity.json +2 -2
- package/{twin-assets → clone-assets}/google-workspace/tools.json +10 -10
- package/{twin-assets → clone-assets}/jira/fidelity.json +44 -4
- package/{twin-assets → clone-assets}/jira/tools.json +1 -1
- package/clone-assets/linear/fidelity.json +36 -0
- package/{twin-assets → clone-assets}/linear/tools.json +1 -1
- package/{twin-assets → clone-assets}/ramp/fidelity.json +1 -1
- package/{twin-assets → clone-assets}/ramp/tools.json +1 -1
- package/clone-assets/slack/fidelity.json +38 -0
- package/{twin-assets → clone-assets}/slack/tools.json +1 -1
- package/clone-assets/stripe/fidelity.json +67 -0
- package/{twin-assets → clone-assets}/stripe/tools.json +42 -11
- package/clone-assets/supabase/fidelity.json +31 -0
- package/{twin-assets → clone-assets}/supabase/tools.json +1 -1
- package/clone-assets/tavily/tools.json +115 -0
- package/dist/cli.cjs +97917 -0
- package/dist/cli.d.cts +1 -0
- package/dist/harness.cjs +62 -0
- package/dist/harness.d.cts +20 -0
- package/dist/index.cjs +5 -87878
- package/dist/index.d.cts +3 -1
- package/dist/seed/dynamic-generator.cjs +8796 -9201
- package/dist/seed/dynamic-generator.d.cts +39 -0
- package/dist/vitest/chunk-2GY4SFKE.js +29279 -0
- package/dist/vitest/{chunk-KTMNDJFB.js → chunk-WVRVNHAX.js} +45255 -44440
- package/dist/vitest/index.cjs +56408 -31519
- package/dist/vitest/index.d.ts +61 -27
- package/dist/vitest/index.js +145 -1807
- package/dist/vitest/runtime/hosted-session-reaper.cjs +34766 -28922
- package/dist/vitest/runtime/hosted-session-reaper.js +1 -2
- package/dist/vitest/runtime/setup-files.js +2 -3
- package/package.json +19 -10
- package/skills/eval/SKILL.md +113 -0
- package/skills/onboard/SKILL.md +67 -36
- package/skills/scenario/SKILL.md +22 -20
- package/skills/vitest/SKILL.md +25 -24
- package/dist/vitest/chunk-L6HSMJ3F.js +0 -2216
- package/dist/vitest/chunk-YJICENME.js +0 -1230
- package/dist/vitest/src-JGHX6UKK.js +0 -94
- package/skills/audit/SKILL.md +0 -55
- package/skills/test/SKILL.md +0 -109
- package/twin-assets/github/fidelity.json +0 -13
- package/twin-assets/linear/fidelity.json +0 -18
- package/twin-assets/slack/fidelity.json +0 -20
- package/twin-assets/stripe/fidelity.json +0 -22
- package/twin-assets/supabase/fidelity.json +0 -13
package/README.md
CHANGED
@@ -1,27 +1,19 @@
 # archal
 
-Pre-deployment testing for AI agents via Archal's hosted
-including route-mode
-Ramp, Supabase, and Google Workspace.
+Pre-deployment testing for AI agents via Archal's hosted service clones,
+including route-mode clones for Discord, GitHub, Slack, Stripe, Linear, Jira,
+Ramp, Apify, Tavily, Supabase, and Google Workspace.
 
-##
+## Known compatibility notes
 
-`
-
+- `npm install -g archal` may print peer-dependency warnings from the
+vendored CLI runtime. These are safe to ignore; all required modules
+are bundled.
+- The `vitest` integration under `archal/vitest` requires `vitest@^2.1.0`.
+Projects on vitest 3 should pin a workspace to vitest 2 for Archal tests
+to avoid duplicate module resolution.
 
-
-from the vendored CLI runtime. They are safe to ignore; nothing is
-missing at runtime.
-- A few commands still print one or two help lines to `stderr` instead of
-`stdout`. Scripts that redirect streams will mostly see what they expect,
-but be aware of it.
-- The `vitest` integration under `archal/vitest` depends on `vitest@^2.1.0`.
-If your project is on vitest 3, the shared helper may resolve the wrong
-copy. Pinning a workspace to vitest 2 for your Archal tests fixes it.
-
-We're shipping `0.9.x` so that early users can start running the CLI against
-hosted twins. File anything surprising at
-<https://github.com/Archal-Labs/archal/issues>.
+File issues at <https://github.com/Archal-Labs/archal/issues>.
 
 ## Install
 
@@ -36,13 +28,20 @@ archal login
 ```
 
 This writes credentials to `~/.archal/credentials.json`. You can also set
-`ARCHAL_TOKEN` directly in CI.
+`ARCHAL_TOKEN` directly in CI. Use a workspace API key (`archal_ws_...`) for
+CI instead of a personal token.
+
+Workspace API keys are runtime and CI credentials bound to one workspace. They
+can run clones, upload and read traces, and read usage for that workspace. They
+cannot manage audit events or workspace API keys. Use an owner/admin user
+credential, either `archal login` or a dashboard-issued user API key, for
+workspace administration.
 
-## CLI
+## CLI
 
-The CLI is the primary
+The CLI is the primary interface. `archal run` executes a
 scenario (a markdown file describing a task, the expected behavior, and
-success criteria) against a hosted
+success criteria) against a hosted clone session and reports a satisfaction
 score.
 
 ### Quick start
@@ -51,35 +50,40 @@ score.
 # 1. Log in
 archal login
 
-# 2.
-archal
+# 2. Initialize your project
+archal init
 
-# 3.
-archal twin start github stripe
+# 3. Edit .archal/harness.mjs to call your agent
 
-# 4. Run
+# 4. Run the starter scenario from .archal.json
 archal run
-
-# 5. When you're done
-archal twin stop
 ```
 
-`archal
-
+`archal init` creates `.archal.json`, `.archal/harness.mjs`, and
+`scenarios/first-run.md`. The generated harness is a guarded stub: Archal
+refuses to run it until you edit the file to call your real agent, so a first
+run cannot be scored from placeholder text.
+
+`archal run` auto-discovers an `.archal.json` at the project root. A minimal
+config looks like this:
 
 ```json
 {
-  "
-
+  "agent": {
+    "command": "node",
+    "args": [".archal/harness.mjs"]
+  },
+  "scenarios": ["scenarios/first-run.md"],
+  "clones": ["github", "stripe"],
   "runs": 1
 }
 ```
 
-### Supported
+### Supported clones
 
-`archal
+`archal clone` lists the eleven clones available today:
 
-|
+| Clone | Notes |
 |---|---|
 | Discord | Guilds, channels, messages, members |
 | GitHub | Repos, issues, PRs, labels, reviews |
@@ -88,6 +92,8 @@ minimal config looks like this:
 | Linear | Teams, issues, projects, cycles |
 | Jira | Projects, issues, workflows, components |
 | Ramp | Cards, transactions, reimbursements, users |
+| Apify | Actors, tasks, runs, datasets |
+| Tavily | Search and extraction responses |
 | Supabase | Auth, Postgres, storage, edge functions |
 | Google Workspace | Gmail, Drive, Calendar, Docs, People |
 
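The minimal `.archal.json` and the clone table in the hunks above can be modelled as a small typed check. This is only a sketch: the field names are copied from the README example, the clone ids are taken from the `clone-assets/` directory names in this diff, and the helper `unknownClones` is hypothetical. It is an assumption that the `clones` array accepts exactly those directory names; the real schema may accept more fields or different ids.

```typescript
// Clone ids as they appear under package/clone-assets/ in this diff.
// Assumption: the "clones" array in .archal.json uses the same ids.
const KNOWN_CLONES = [
  "apify", "discord", "github", "google-workspace", "jira", "linear",
  "ramp", "slack", "stripe", "supabase", "tavily",
] as const;

// Shape of the minimal config shown in the README hunk. Other fields the
// real CLI may accept are not modelled here.
interface ArchalConfig {
  agent: { command: string; args: string[] };
  scenarios: string[];
  clones: string[];
  runs: number;
}

// Hypothetical helper: returns the entries of config.clones that are not
// in the catalog above, so a typo can be caught before a session starts.
function unknownClones(config: ArchalConfig): string[] {
  const known = new Set<string>(KNOWN_CLONES);
  return config.clones.filter((id) => !known.has(id));
}

// The README's minimal config, parsed from its JSON text.
const config: ArchalConfig = JSON.parse(`{
  "agent": { "command": "node", "args": [".archal/harness.mjs"] },
  "scenarios": ["scenarios/first-run.md"],
  "clones": ["github", "stripe"],
  "runs": 1
}`);

console.log(unknownClones(config)); // prints []
```

A check like this is cheap to run in CI before `archal run`, though the CLI itself remains the authority on what a valid config is.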
@@ -97,34 +103,36 @@ minimal config looks like this:
 |---|---|
 | `archal login` | Authenticate with Archal |
 | `archal logout` | Remove stored credentials |
-| `archal
-| `archal
-| `archal
-| `archal
-| `archal
-| `archal
-| `archal
-| `archal
-| `archal
-| `archal run [
+| `archal clone` | Browse the clone catalog |
+| `archal clone start [clones...]` | Start a hosted clone session |
+| `archal clone status` | Inspect the active session |
+| `archal clone stop` | Stop the active session |
+| `archal clone list` | List all your active sessions |
+| `archal clone attach <uuid>` | Reattach to a session by id |
+| `archal clone renew <seconds>` | Extend the session lifetime |
+| `archal clone reset` | Reset clone state without tearing down the session |
+| `archal clone seed <clone> <name>` | Load a named seed into a running clone |
+| `archal run [scenario]` | Run a scenario file (or use `--config` for `.archal.json`) |
 | `archal scenario list` | Browse local and hosted scenarios |
-| `archal seed list [
+| `archal seed list [clone]` | List prebuilt clone seeds |
 | `archal trace` | View recent scenario traces |
 | `archal trace <name>` | View trace details for a run |
-| `archal usage` | Check session
+| `archal usage` | Check active workspace session-minutes and plan |
 
 Run `archal <command> --help` for flag details.
 
 ## Vitest integration (secondary use case)
 
 You can also import `archal/vitest` to route SDK traffic from a vitest
-suite through a hosted
+suite through a hosted clone, with no code changes to your production
 code. This is useful if you want to test the HTTP side of an integration
 without hitting real provider APIs.
 
->
->
->
+> The vitest helper supports the hosted route-mode clone catalog: Apify,
+> Discord, GitHub, Google Workspace, Jira, Linear, Ramp, Slack, Stripe,
+> Supabase, and Tavily.
+> If you only need scenario-level evaluation, the CLI flow above is
+> simpler to set up.
 
 ### Minimal config
 
@@ -158,7 +166,7 @@ import Stripe from 'stripe';
 
 it('creates a customer', async () => {
   const stripe = new Stripe('sk_test_fake'); // fake key is fine
-  const customer = await stripe.customers.create({ // goes to
+  const customer = await stripe.customers.create({ // goes to clone, not real Stripe
     email: 'test@example.com',
   });
   expect(customer.id).toMatch(/^cus_/);
@@ -169,18 +177,18 @@ it('creates a customer', async () => {
 
 ```ts
 import { beforeEach } from 'vitest';
-import {
+import { resetArchalClones } from 'archal/vitest';
 
 beforeEach(async () => {
-  await
+  await resetArchalClones();
 });
 ```
 
 ### Webhook testing
 
-The hosted
+The hosted clone runs in AWS ECS, so it can't POST to your localhost.
 Instead, your test pulls queued deliveries with `waitForArchalWebhook()`
-and invokes your handler directly with the exact payload the
+and invokes your handler directly with the exact payload the clone would
 have sent.
 
 ```ts
@@ -192,7 +200,7 @@ import { handleStripeWebhook } from './src/webhooks';
 const stripe = new Stripe('sk_test_fake');
 
 it('records a subscription when customer.subscription.created fires', async () => {
-  // 1. Register a webhook endpoint
+  // 1. Register a webhook endpoint (the clone needs to know what events to queue)
   await stripe.webhookEndpoints.create({
     url: 'http://test.local/stripe-wh',
     enabled_events: ['customer.subscription.created'],
@@ -223,7 +231,7 @@ it('records a subscription when customer.subscription.created fires', async () =
 | Slack | ✅ | Receiver-side signature only |
 | Jira | ✅ | Delivered via history buffer (also POSTed to registered URLs) |
 | Linear | ✅ | Delivered via history buffer (also POSTed to registered URLs) |
-| Supabase | ❌ | Database-triggered
+| Supabase | ❌ | Database-triggered; test against real Postgres |
 | Google Workspace | ❌ | GCP Pub/Sub push notifications, not webhooks |
 
 **Parallel workers caveat**: `waitForArchalWebhook()` consumes deliveries
@@ -233,18 +241,18 @@ depend on webhook events, set `testIsolation: 'serial'` in your config.
 
 ### Test isolation across parallel workers
 
-Each vitest worker is routed to its own per-worker state on the
-parallel tests across workers don't see each other's writes. The
+Each vitest worker is routed to its own per-worker state on the clone,
+so parallel tests across workers don't see each other's writes. The
 integration reads `VITEST_WORKER_ID` in each worker process and tags
-every outbound SDK request with an `X-Archal-Worker-Id` header. The
+every outbound SDK request with an `X-Archal-Worker-Id` header. The clone
 maintains a separate state engine per worker id, seeded from the
 baseline on first request.
 
-**Isolation-enabled
+**Isolation-enabled clones**: Stripe, GitHub, Slack, Jira, Linear.
 
-**
+**Clones without isolation** (Supabase, Google Workspace, Ramp) fall back
 to shared state. If your tests depend on global assertions against those
-
+clones, set `testIsolation: 'serial'`.
 
 ## Authentication
 
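The worker-isolation text in the hunk above describes two halves: the test side tags requests with `X-Archal-Worker-Id` taken from `VITEST_WORKER_ID`, and the clone keeps one state per worker id, seeded from a baseline on first request. The sketch below is a toy model of that idea and not Archal's implementation. Only the environment variable and header names come from the README; `workerHeaders`, `PerWorkerState`, the `"shared"` fallback key and the seed data are all hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Test side: build the extra header a worker would attach to each request.
// Assumption: when VITEST_WORKER_ID is unset, no header is sent.
function workerHeaders(env: Env): Record<string, string> {
  const workerId = env["VITEST_WORKER_ID"];
  return workerId ? { "X-Archal-Worker-Id": workerId } : {};
}

// Clone side (toy stand-in): one key/value state per worker id, copied
// from the baseline the first time that worker id is seen.
class PerWorkerState {
  private readonly baseline: ReadonlyMap<string, string>;
  private readonly states = new Map<string, Map<string, string>>();

  constructor(baseline: ReadonlyMap<string, string>) {
    this.baseline = baseline;
  }

  stateFor(headers: Record<string, string>): Map<string, string> {
    // Requests without the header land in one shared state (an assumption).
    const id = headers["X-Archal-Worker-Id"] ?? "shared";
    let state = this.states.get(id);
    if (!state) {
      state = new Map(this.baseline); // seeded from the baseline on first request
      this.states.set(id, state);
    }
    return state;
  }
}

const clone = new PerWorkerState(new Map([["cus_seed", "seed@example.com"]]));
const w1 = clone.stateFor(workerHeaders({ VITEST_WORKER_ID: "1" }));
const w2 = clone.stateFor(workerHeaders({ VITEST_WORKER_ID: "2" }));

w1.set("cus_new", "test@example.com"); // a write made by worker 1
console.log(w1.has("cus_new"), w2.has("cus_new")); // prints: true false
```

Both workers still see the seeded `cus_seed` entry, which mirrors the "seeded from the baseline" wording; clones listed as lacking isolation would behave like the single `"shared"` state here.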
@@ -255,8 +263,8 @@ In priority order:
 
 This auth is only for Archal's hosted session provisioning. Your app's
 provider credentials do not need to be real when traffic is routed through
-a
-rules, such as `sk_test_fake`, `
+a clone. Use placeholder tokens that satisfy the SDK's local validation
+rules, such as `sk_test_fake`, `ghp_fake_token_for_clone`, or a dummy
 Google bearer token.
 
 ## What you'll see in the terminal
@@ -266,18 +274,18 @@ On the first `archal run`, session provisioning takes about 30 seconds
 the wait is visible:
 
 ```
-[archal] provisioning stripe
-[archal] provisioning stripe
+[archal] provisioning stripe clone... 5s
+[archal] provisioning stripe clone... 10s
 ...
 ```
 
 Subsequent runs with the same configuration reuse the warm session
 (~2 seconds).
 
-At the end of every run, the estimated
+At the end of every run, the estimated workspace session-minute usage is printed:
 
 ```
-[archal] ~2
+[archal] ~2 clone-minutes for this run (38.5s × 1 clone: stripe)
 ```
 
 ## Docs
package/bin/archal.cjs
CHANGED
@@ -3,7 +3,7 @@
 const { existsSync } = require("node:fs");
 const { join } = require("node:path");
 
-const localDist = join(__dirname, "..", "dist", "
+const localDist = join(__dirname, "..", "dist", "cli.cjs");
 
 if (existsSync(localDist)) {
   require(localDist);