@lythos/skill-arena 0.13.3 → 0.14.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +100 -85
- package/package.json +5 -5
- package/src/cli.ts +181 -1
- package/src/preflight.test.ts +189 -0
- package/src/preflight.ts +98 -0
package/README.md
CHANGED
````diff
@@ -1,140 +1,155 @@
 # @lythos/skill-arena
 
-
+> Controlled-variable benchmark for AI agent skills. Test single decks or compare A/B — agent-orchestrated by default, cross-player when you need it.
 
-
+## Modes at a Glance
 
-
+| Mode | How | When |
+|------|-----|------|
+| **Agent-Orchestrated** (DEFAULT) | Agent tool spawns subagents, parallel dispatch, native judge | Single deck test, cross-deck A/B comparison |
+| **Cross-Player** (OPT-IN) | CLI runner spawns different agent binaries via Bun.spawn | Comparing kimi vs codex vs claude |
 
-
-
-`skill-arena` scaffolds isolated environments where subagents complete the same task under different decks. A judge agent scores outputs across multiple dimensions. Supports:
-
-- **Mode 1**: Single-skill comparison (controlled variable — same helper skills, different test skill).
-- **Mode 2**: Full-deck comparison (Pareto frontier — no single winner, only optimal trade-offs).
-
-## Prerequisites
-
-Arena runs AI agents as subprocesses. You need at least one agent CLI installed:
-
-### Kimi CLI (recommended default)
-
-Kimi Code CLI is the default player for arena — it has reliable headless execution with eager tool loading (no deferred tool deadlock).
-
-```bash
-# Install via uv (recommended) — uv is Python's bunx equivalent
-uv tool install kimi-cli
-# Or run without installing:
-uvx kimi-cli --print -p "hello"
-
-# Authenticate
-kimi login
-# Or set API key:
-export KIMI_API_KEY=your_key
-```
-
-Docs: [https://github.com/MoonshotAI/kimi-cli](https://github.com/MoonshotAI/kimi-cli)
-
-### Claude CLI (secondary)
-
-```bash
-npm install -g @anthropic-ai/claude-code
-claude --version # should be ≥ 2.1.128
-```
-
-Note: Claude `-p` mode has known issues with web tools in Bun.spawn (deferred tool deadlock). Kimi is the default for reliability.
+**95% of arena use is agent-orchestrated.** The Agent tool can spawn parallel subagents with isolated workdirs and different decks — zero CLI. Cross-player mode is ONLY needed when comparing different agent CLIs (the Agent tool can only spawn same-type agents).
 
 ## Install
 
 ```bash
 bun add -d @lythos/skill-arena
 # or use directly
-bunx @lythos/skill-arena@0.
+bunx @lythos/skill-arena@0.14.1 <command>
 ```
 
 ## Quick Start
 
 ```bash
-#
+# single — test one deck (most common)
 bunx @lythos/skill-arena@latest single \
 --deck ./examples/decks/scout.toml \
 --brief "Generate auth flow diagram" \
---player kimi \
---timeout 300000 \
 --out ./output
 
-#
-
---deck https://raw.githubusercontent.com/lythos-labs/lythoskill/main/examples/decks/scout.toml \
---brief "Generate auth flow diagram" \
---out ./output
-
-# Vs: compare multiple decks side by side
-curl -fsSL https://raw.githubusercontent.com/lythos-labs/lythoskill/main/examples/arena/research-compare/arena.toml > arena.toml
+# cross-deck vs — compare two decks (agent-orchestrated)
+# Create arena.toml declaring sides with different decks, then:
 bunx @lythos/skill-arena@latest vs --config ./arena.toml
+
+# cross-player vs — compare kimi vs codex (CLI only)
+bunx @lythos/skill-arena@latest vs --config ./arena.toml --player kimi
 ```
 
-**
-- Agent runs in an isolated `/tmp` workdir (no workspace pollution)
-- All artifacts are copied to `--out` after completion
-- Prompt template injects fixed contract (decision-log, robustness, tool preference) + your brief as variable
+**What happens**: Agent creates isolated `/tmp` workdir per side, `deck link` skills, spawns parallel subagents, collects artifacts, judge scores outputs. Parent deck restored after.
 
 ## Commands
 
-###
+### `single` — one deck, one task
 
 ```bash
-
-
-
-
-
+bunx @lythos/skill-arena@latest single \
+--deck ./deck.toml \
+--brief "Produce a .docx report with radar chart" \
+--timeout 600000 \
+--out ./output
 ```
 
-###
+### `vs` — multi-deck comparison
 
+```bash
+bunx @lythos/skill-arena@latest vs --config ./arena.toml
+bunx @lythos/skill-arena@latest vs --config ./arena.toml --dry-run
 ```
-
-
+
+### `scaffold` — legacy directory setup
+
+```bash
+bunx @lythos/skill-arena@latest scaffold \
+--task "Generate auth flow diagram" \
+--decks "./decks/minimal.toml,./decks/rich.toml"
 ```
 
-###
+### `prepare-workdir` — isolate + link skills (agent-orchestrated)
 
 ```bash
-bunx @lythos/skill-arena@
+bunx @lythos/skill-arena@latest prepare-workdir \
+--deck ./skill-deck.toml \
+--out /tmp/arena-side-a \
+--brief "task description"
+
+# Plan-first: review before executing
+bunx @lythos/skill-arena@latest prepare-workdir \
+--deck ./skill-deck.toml \
+--out /tmp/arena-side-a \
+--brief "task" \
+--dry-run
 ```
 
-
+Creates `/tmp`-isolated workdir with deck copied, AGENTS.md written, and `deck link` run. `--dry-run` prints the plan (skills, workdir path, link needed) without creating anything.
 
-
-The agent-visible **Skill** layer documentation is here:
-[packages/lythoskill-arena/skill/SKILL.md](../../packages/lythoskill-arena/skill/SKILL.md)
+### `archive` — collect agent outputs (agent-orchestrated)
 
-
+```bash
+bunx @lythos/skill-arena@latest archive \
+--from /tmp/arena-side-a \
+--to ./playground/output \
+--sides side-a
+
+# Plan-first: review what would be copied
+bunx @lythos/skill-arena@latest archive \
+--from /tmp/arena-side-a \
+--to ./playground/output \
+--sides side-a \
+--dry-run
+```
 
-
+Copies agent artifacts from workdir(s) to output, skipping internal files (`.claude`, `skill-deck.toml`, `skill-deck.lock`, `AGENTS.md`). Single-side archives fall back to workdir root when the named side subdirectory doesn't exist. `--dry-run` shows the per-side plan before copying.
 
+### `viz` — render results
+
+```bash
+bunx @lythos/skill-arena@latest viz runs/arena-<id>/
 ```
-
-
-
+
+## Parameters
+
+| Flag | Command | Description |
+|------|---------|-------------|
+| `--brief "<text>"` | single | Inline task brief |
+| `--deck <path\|url>` | single | Deck file (URL auto-fetched) |
+| `--player <name>` | single, vs | Only for cross-player: kimi\|codex\|deepseek\|claude |
+| `--timeout <ms>` | single | Subagent timeout (300000–600000 for complex tasks) |
+| `--from <dir>` | archive | Source workdir |
+| `--to <dir>` | archive | Output directory |
+| `--sides <names>` | archive | Comma-separated side names (default: `.`) |
+| `--out <dir>` | single, vs, prepare-workdir | Output / workdir directory |
+| `--config <path>` | vs | arena.toml |
+| `--dry-run` | vs, prepare-workdir, archive | Print plan without execution |
+
+## Prerequisites (cross-player only)
+
+For cross-player mode, install at least one agent CLI:
+
+```bash
+uv tool install kimi-cli # kimi (recommended default)
+npm i -g @openai/codex # codex
+# deepseek: bundled with desktop app or pip install deepseek-cli
+# claude: set ANTHROPIC_API_KEY (SDK, no CLI binary needed)
 ```
 
-
+## Skill Documentation
+
+The agent-visible skill layer: [skill/SKILL.md](./skill/SKILL.md)
+
+## Architecture
 
 ```
 arena.toml → ArenaToml (Zod) → ExecutionPlan (pure) → per-cell agent spawn (IO)
-
-
-
-
+↓
+aggregateAllStats (pure) ← verdicts[]
+↓
+runComparativeJudge (IO) → report.md + Pareto frontier
 ```
 
-- **Intent**: `arena.toml` declarative config
+- **Intent**: `arena.toml` declarative config
 - **Plan**: `buildExecutionPlan()`, `aggregateSideStats()`, `computePareto()` — pure functions
-- **Execute**:
-
-Built on `@lythos/test-utils` shared infrastructure.
+- **Execute**: Agent tool spawn (agent-orchestrated) or `AgentAdapter` (cross-player)
 
 ## License
 
````
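The new README's architecture notes say the judge step produces a "Pareto frontier" and name a pure `computePareto()`, but that function is not part of this diff. Purely as an illustration of the idea, here is a minimal TypeScript sketch of a Pareto frontier over per-side judge scores. The `SideScore` shape, the `paretoFrontier` name, and the rules that higher is better on every dimension and that a missing dimension counts as 0 are all assumptions, not the package's API.

```typescript
// Assumed input shape: one entry per arena side, judge scores keyed by dimension.
// Higher is better on every dimension; a dimension missing for a side counts as 0.
interface SideScore {
  side: string
  scores: Record<string, number>
}

// `a` dominates `b` when it is at least as good on every dimension
// and strictly better on at least one.
function dominates(a: SideScore, b: SideScore, dims: string[]): boolean {
  let strictlyBetter = false
  for (const d of dims) {
    const av = a.scores[d] ?? 0
    const bv = b.scores[d] ?? 0
    if (av < bv) return false
    if (av > bv) strictlyBetter = true
  }
  return strictlyBetter
}

// The frontier keeps every side that no other side dominates (input order preserved).
function paretoFrontier(sides: SideScore[]): string[] {
  const dims = Array.from(new Set(sides.flatMap(s => Object.keys(s.scores))))
  return sides
    .filter(s => !sides.some(other => other !== s && dominates(other, s, dims)))
    .map(s => s.side)
}

const frontier = paretoFrontier([
  { side: 'minimal', scores: { quality: 6, speed: 9 } },
  { side: 'rich', scores: { quality: 9, speed: 5 } },
  { side: 'bloated', scores: { quality: 5, speed: 4 } },
])
console.log(frontier) // ['minimal', 'rich']
```

Here `bloated` drops out because `minimal` beats it on both dimensions, while `minimal` and `rich` each win one dimension, so both stay: the "no single winner, only optimal trade-offs" outcome the old README described.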
package/package.json
CHANGED
````diff
@@ -1,6 +1,6 @@
 {
   "name": "@lythos/skill-arena",
-  "version": "0.
+  "version": "0.14.1",
   "description": "Skill Arena — benchmark skill effectiveness with controlled-variable comparison",
   "keywords": [
     "ai-agent",
@@ -42,13 +42,13 @@
     "bun": ">=1.0.0"
   },
   "dependencies": {
-    "@lythos/cold-pool": "^0.
-    "@lythos/infra": "^0.
-    "@lythos/test-utils": "^0.
+    "@lythos/cold-pool": "^0.14.1",
+    "@lythos/infra": "^0.14.1",
+    "@lythos/test-utils": "^0.14.1",
     "zod": "^3.24.0",
     "zod-to-json-schema": "^3.25.2"
   },
   "optionalDependencies": {
-    "@lythos/agent-adapter-claude-sdk": "^0.
+    "@lythos/agent-adapter-claude-sdk": "^0.14.1"
   }
 }
````
package/src/cli.ts
CHANGED
````diff
@@ -5,7 +5,7 @@ import { homedir, tmpdir } from 'node:os'
 import { ZodError } from 'zod'
 import { formatPlanOutput, type ArenaResult, buildArenaPrompt } from './runner'
 import { parseArenaToml, buildExecutionPlan } from './arena-toml'
-import { buildCopyPlan, parseDeckSkills } from './preflight'
+import { buildArchiveSidePlan, buildCopyPlan, buildPreparePlan, parseDeckSkills } from './preflight'
 import { checkSkillExistence, formatSkillWarnings, resolveColdPoolDir } from './preflight'
 
 // ─── fetchWithProxy (infra dependency, no package boundary) ─────────────────
@@ -100,6 +100,8 @@ Examples:
 lythoskill-arena vs --config arena.toml --dry-run
 lythoskill-arena vs --config arena.toml
 lythoskill-arena viz runs/arena-20260504
+lythoskill-arena prepare-workdir --deck ./decks/scout.toml --out /tmp/arena-20260517-side-a
+lythoskill-arena archive --from /tmp/arena-20260517 --to playground/arena-20260517 --sides side-a,side-b
 `)
 process.exit(0)
 }
@@ -113,6 +115,8 @@ function cli(args: string[]) {
   if (cmd === 'vs' || cmd === 'compare') return vsRun(rest)
   if (cmd === 'single' || cmd === 'run') return singleRun(rest)
   if (cmd === 'viz') return vizRun(rest)
+  if (cmd === 'prepare-workdir') return prepareWorkdir(rest)
+  if (cmd === 'archive') return archiveRun(rest)
 
   console.error(`Unknown command: ${cmd}`)
   process.exit(1)
@@ -431,6 +435,182 @@ async function vizRun(args: string[]) {
   console.log(`📈 Arena HTML report not yet implemented. See report.md in ${runsDir}/`)
 }
 
+// ═══════════════════════════════════════════════════════════════════════════
+// ── prepare-workdir: reusable workdir setup (used by both CLI and agent) ──
+// Intent: create an isolated arena workdir with deck linked and ready to run
+
+async function prepareWorkdir(args: string[]) {
+  const opts: Record<string, string | undefined> = {}
+  let dryRun = false
+  for (let i = 0; i < args.length; i++) {
+    if (args[i] === '--deck' || args[i] === '-d') opts.deck = args[++i]
+    else if (args[i] === '--out' || args[i] === '-o') opts.out = args[++i]
+    else if (args[i] === '--brief' || args[i] === '-b') opts.brief = args[++i]
+    else if (args[i] === '--dry-run') dryRun = true
+  }
+
+  if (!opts.deck) {
+    console.error(`❌ --deck <path> is required.
+lythoskill-arena prepare-workdir --deck ./skill-deck.toml --out /tmp/arena-side-a`)
+    process.exit(1)
+  }
+
+  const deckPath = resolve(opts.deck)
+  if (!existsSync(deckPath)) {
+    console.error(`❌ Deck file not found: ${deckPath}`)
+    process.exit(1)
+  }
+
+  const workDir = opts.out
+    ? resolve(opts.out)
+    : join(tmpdir(), `arena-${Date.now()}`)
+  const deckContent = readFileSync(deckPath, 'utf-8')
+
+  // ── Plan (pure computation — what WOULD be created) ────────────────────
+  const plan = buildPreparePlan({
+    deckPath,
+    deckContent,
+    workDir,
+    skillCount: 0, // computed inside from deckContent
+    brief: opts.brief,
+  })
+
+  console.log('📋 Prepare plan:')
+  console.log(` deck: ${plan.deckPath}`)
+  console.log(` workdir: ${plan.workDir}`)
+  console.log(` skills: ${plan.skills.length} declared (${plan.skills.map(s => s.name).join(', ') || 'none'})`)
+  console.log(` link: ${plan.hasSkills ? 'Bun.spawn deck link' : 'skip (no skills)'}`)
+  console.log(` AGENTS.md: write (${plan.agentsMd.split('\n').length} lines)`)
+  if (opts.brief) console.log(` brief: ${opts.brief!.slice(0, 60)}...`)
+
+  if (dryRun) {
+    console.log(`\n🏁 Dry-run complete (no files created). Remove --dry-run to execute.`)
+    return
+  }
+
+  // ── Execute: create workdir ──────────────────────────────────────────
+  mkdirSync(workDir, { recursive: true })
+  writeFileSync(join(workDir, 'skill-deck.toml'), deckContent)
+  writeFileSync(join(workDir, 'AGENTS.md'), plan.agentsMd)
+
+  if (plan.hasSkills) {
+    const { existsSync: es2 } = await import('node:fs')
+    const localDeckCli = join(import.meta.dir, '..', '..', 'lythoskill-deck', 'src', 'cli.ts')
+    const linkCmd = es2(localDeckCli)
+      ? ['bun', localDeckCli, 'link']
+      : ['bunx', '@lythos/skill-deck', 'link']
+    const linkProc = Bun.spawn(linkCmd,
+      { cwd: workDir, env: { ...process.env, HOME: process.env.HOME! } },
+    )
+    await linkProc.exited
+    const linkStderr = await new Response(linkProc.stderr).text()
+    const linkResult = validateLinkResult(linkProc.exitCode, linkStderr)
+    if (!linkResult.ok) {
+      console.error(`❌ ${linkResult.error}`)
+      process.exit(1)
+    }
+  } else {
+    console.log('ℹ️ No skills declared in deck — skipping link')
+  }
+
+  // Skill existence check
+  try {
+    const coldPoolDefault = join(homedir(), '.agents', 'skill-repos')
+    const coldPoolDir = resolveColdPoolDir(Bun.TOML.parse(deckContent)?.deck?.cold_pool, homedir(), coldPoolDefault)
+    const checks = checkSkillExistence(plan.skills, coldPoolDir, existsSync)
+    for (const warning of formatSkillWarnings(checks)) {
+      console.warn(`⚠️ ${warning}`)
+    }
+  } catch (e) {
+    console.warn('⚠️ Could not check skill existence:', e instanceof Error ? e.message : e)
+  }
+
+  console.log(`✅ Workdir ready → ${workDir}`)
+  console.log(` deck: ${deckPath}`)
+  if (opts.brief) console.log(` brief: ${opts.brief!.slice(0, 60)}...`)
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// ── archive: copy agent outputs from workdir(s) to outDir ─────────────────
+// Intent: same copy behavior as CLI singleRun, reusable for agent-orchestrated
+
+async function archiveRun(args: string[]) {
+  const opts: Record<string, string | undefined> = {}
+  let dryRun = false
+  for (let i = 0; i < args.length; i++) {
+    if (args[i] === '--from' || args[i] === '-f') opts.from = args[++i]
+    else if (args[i] === '--to' || args[i] === '-o') opts.to = args[++i]
+    else if (args[i] === '--sides') opts.sides = args[++i]
+    else if (args[i] === '--report') opts.report = args[++i]
+    else if (args[i] === '--dry-run') dryRun = true
+  }
+
+  if (!opts.from || !opts.to) {
+    console.error(`❌ --from <workdir> and --to <outdir> are required.
+lythoskill-arena archive --from /tmp/arena-20260517 --to playground/arena-20260517 --sides side-a,side-b --report ./report.md`)
+    process.exit(1)
+  }
+
+  const fromDir = resolve(opts.from)
+  const outDir = resolve(opts.to)
+
+  const sides = opts.sides ? opts.sides.split(',') : ['.']
+  const plan = buildArchiveSidePlan(fromDir, sides, existsSync)
+
+  // ── Plan output (always shown, also serves as dry-run) ──────────────────
+  console.log('📋 Archive plan:')
+  for (const pe of plan) {
+    if (!pe.found) {
+      console.log(` ⚠️ ${pe.side}: not found (${pe.sourceDir}) — will skip`)
+    } else if (pe.sourceDir === fromDir && pe.side !== '.') {
+      console.log(` ${pe.side}: ${pe.sourceDir} (fallback → root) → ${join(outDir, pe.side)}`)
+    } else {
+      console.log(` ${pe.side}: ${pe.sourceDir} → ${join(outDir, pe.side)}`)
+    }
+  }
+  if (dryRun) {
+    console.log(`\n🏁 Dry-run complete (no files copied). Remove --dry-run to execute.`)
+    return
+  }
+
+  // ── Execute: copy files ───────────────────────────────────────────────
+  mkdirSync(outDir, { recursive: true })
+
+  if (opts.report && existsSync(resolve(opts.report))) {
+    const { cpSync: cpR } = await import('node:fs')
+    cpR(resolve(opts.report), join(outDir, 'report.md'))
+    console.log(`📄 report.md → ${outDir}/report.md`)
+  }
+
+  const { cpSync, readdirSync } = await import('node:fs')
+  const skipSet = new Set(['.claude', 'skill-deck.toml', 'skill-deck.lock', 'AGENTS.md'])
+
+  for (const planEntry of plan) {
+    if (!planEntry.found) {
+      console.warn(`⚠️ Side workdir not found: ${planEntry.sourceDir}`)
+      continue
+    }
+
+    const sideOutDir = join(outDir, planEntry.side)
+    mkdirSync(sideOutDir, { recursive: true })
+
+    const entries = readdirSync(planEntry.sourceDir, { withFileTypes: true })
+    for (const entry of entries) {
+      if (skipSet.has(entry.name)) continue
+      const src = join(planEntry.sourceDir, entry.name)
+      const dest = join(sideOutDir, entry.name)
+      try {
+        cpSync(src, dest, { recursive: entry.isDirectory() })
+        console.log(` ${planEntry.side}/${entry.name} → ${dest}`)
+      } catch (e) {
+        console.warn(`⚠️ Failed to copy ${planEntry.side}/${entry.name}: ${e instanceof Error ? e.message : e}`)
+      }
+    }
+  }
+
+  console.log(`✅ Archive complete → ${outDir}`)
+}
+
 // ── Entry point ────────────────────────────────────────────────────────────
 if (import.meta.main) {
   main().catch(err => {
````
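Both new subcommands parse flags with a hand-rolled loop: `args[++i]` yields `undefined` when a value-taking flag is the last argument, and unrecognized flags are skipped without a message. One possible alternative, sketched here for the `archive` flags only, is Node's built-in `util.parseArgs` (available since Node 18.3; Bun implements `node:util`). `parseArchiveArgs` and `ArchiveArgs` are hypothetical names, not part of the package, and the sketch throws where the package calls `process.exit(1)`.

```typescript
import { parseArgs } from 'node:util'

// Hypothetical result type mirroring the options archiveRun reads.
interface ArchiveArgs {
  from: string
  to: string
  sides: string[]
  report?: string
  dryRun: boolean
}

// Sketch of archiveRun's flag handling via util.parseArgs.
// strict: true makes parseArgs throw on unknown flags and on a string flag with no value.
function parseArchiveArgs(argv: string[]): ArchiveArgs {
  const { values } = parseArgs({
    args: argv,
    options: {
      from: { type: 'string', short: 'f' },
      to: { type: 'string', short: 'o' },
      sides: { type: 'string' },
      report: { type: 'string' },
      'dry-run': { type: 'boolean' },
    },
    strict: true,
    allowPositionals: false,
  })
  if (!values.from || !values.to) {
    throw new Error('--from <workdir> and --to <outdir> are required')
  }
  return {
    from: values.from,
    to: values.to,
    // Same default as archiveRun: no --sides means the workdir root ('.').
    sides: values.sides ? values.sides.split(',') : ['.'],
    report: values.report,
    dryRun: values['dry-run'] === true,
  }
}

console.log(parseArchiveArgs(['-f', '/tmp/arena-20260517', '-o', 'playground/out', '--sides', 'side-a,side-b', '--dry-run']))
```

With `strict: true`, a trailing `--from` or a typo such as `--side` raises an error instead of producing an empty source path or being silently ignored.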
package/src/preflight.test.ts
CHANGED
````diff
@@ -393,3 +393,192 @@ describe('formatSkillWarnings', () => {
     expect(formatSkillWarnings(checks)[0]).toContain('[transient]')
   })
 })
+
+
+// ═══════════════════════════════════════════════════════════════════════════
+// buildArchiveSidePlan
+// ═══════════════════════════════════════════════════════════════════════════
+
+
+// ═══════════════════════════════════════════════════════════════════════════
+// buildArchiveSidePlan
+// ═══════════════════════════════════════════════════════════════════════════
+
+import { buildArchiveSidePlan } from './preflight'
+import { join as pathJoin } from 'node:path'
+
+const TMP = '/tmp/arena-test'
+
+describe('buildArchiveSidePlan', () => {
+
+  test('default: sides=["."] maps to fromDir', () => {
+    const plan = buildArchiveSidePlan(TMP, ['.'], _p => true)
+    expect(plan).toEqual([
+      { side: '.', sourceDir: TMP, found: true },
+    ])
+  })
+
+  test('single side, subdirectory exists → source = fromDir/side', () => {
+    const exists = (p: string) => p === pathJoin(TMP, 'side-a')
+    const plan = buildArchiveSidePlan(TMP, ['side-a'], exists)
+    expect(plan).toEqual([
+      { side: 'side-a', sourceDir: pathJoin(TMP, 'side-a'), found: true },
+    ])
+  })
+
+  test('single side, subdirectory MISSING → fallback to fromDir root', () => {
+    const plan = buildArchiveSidePlan(TMP, ['side-a'], _p => false)
+    expect(plan).toEqual([
+      { side: 'side-a', sourceDir: TMP, found: true },
+    ])
+  })
+
+  test('multi side, all subdirectories exist', () => {
+    const exists = (p: string) =>
+      p === pathJoin(TMP, 'side-a') || p === pathJoin(TMP, 'side-b')
+    const plan = buildArchiveSidePlan(TMP, ['side-a', 'side-b'], exists)
+    expect(plan).toEqual([
+      { side: 'side-a', sourceDir: pathJoin(TMP, 'side-a'), found: true },
+      { side: 'side-b', sourceDir: pathJoin(TMP, 'side-b'), found: true },
+    ])
+  })
+
+  test('multi side, one missing → found=false (caller handles warn+skip)', () => {
+    const exists = (p: string) => p === pathJoin(TMP, 'side-a')
+    const plan = buildArchiveSidePlan(TMP, ['side-a', 'side-b'], exists)
+    expect(plan).toEqual([
+      { side: 'side-a', sourceDir: pathJoin(TMP, 'side-a'), found: true },
+      { side: 'side-b', sourceDir: pathJoin(TMP, 'side-b'), found: false },
+    ])
+  })
+
+  test('"." side does NOT trigger fallback when missing (found=false)', () => {
+    const plan = buildArchiveSidePlan(TMP, ['.'], _p => false)
+    expect(plan).toEqual([
+      { side: '.', sourceDir: TMP, found: false },
+    ])
+  })
+
+  test('empty sides array → empty plan', () => {
+    const plan = buildArchiveSidePlan(TMP, [], _p => true)
+    expect(plan).toEqual([])
+  })
+
+  test('three sides, middle missing', () => {
+    const exists = (p: string) =>
+      p === pathJoin(TMP, 'side-a') || p === pathJoin(TMP, 'side-c')
+    const plan = buildArchiveSidePlan(TMP, ['side-a', 'side-b', 'side-c'], exists)
+    expect(plan).toEqual([
+      { side: 'side-a', sourceDir: pathJoin(TMP, 'side-a'), found: true },
+      { side: 'side-b', sourceDir: pathJoin(TMP, 'side-b'), found: false },
+      { side: 'side-c', sourceDir: pathJoin(TMP, 'side-c'), found: true },
+    ])
+  })
+})
+
+// ═══════════════════════════════════════════════════════════════════════════
+// buildPreparePlan
+// ═══════════════════════════════════════════════════════════════════════════
+
+import { buildPreparePlan } from './preflight'
+
+const DECK_ONE_SKILL = `
+[deck]
+max_cards = 10
+cold_pool = "~/.agents/skill-repos"
+
+[tool.skills.pdf]
+path = "github.com/anthropics/skills/skills/pdf"
+`
+
+const DECK_EMPTY = `
+[deck]
+max_cards = 5
+`
+
+const DECK_TWO_SKILLS = `
+[deck]
+max_cards = 10
+
+[innate.skills.deck]
+path = "github.com/lythos-labs/lythoskill/skills/lythoskill-deck"
+
+[tool.skills.pdf]
+path = "github.com/anthropics/skills/skills/pdf"
+`
+
+describe('buildPreparePlan', () => {
+
+  test('single skill deck → plan with 1 skill, hasSkills=true', () => {
+    const plan = buildPreparePlan({
+      deckPath: '/tmp/test-deck.toml',
+      deckContent: DECK_ONE_SKILL,
+      workDir: '/tmp/arena-test',
+      skillCount: 0,
+    })
+    expect(plan.skills).toHaveLength(1)
+    expect(plan.skills[0].name).toBe('pdf')
+    expect(plan.skills[0].section).toBe('tool')
+    expect(plan.hasSkills).toBe(true)
+    expect(plan.workDir).toBe('/tmp/arena-test')
+    expect(plan.deckPath).toBe('/tmp/test-deck.toml')
+  })
+
+  test('empty deck → skills=[], hasSkills=false', () => {
+    const plan = buildPreparePlan({
+      deckPath: '/tmp/empty.toml',
+      deckContent: DECK_EMPTY,
+      workDir: '/tmp/arena-empty',
+      skillCount: 0,
+    })
+    expect(plan.skills).toEqual([])
+    expect(plan.hasSkills).toBe(false)
+  })
+
+  test('two skills (innate + tool) → both parsed with correct sections', () => {
+    const plan = buildPreparePlan({
+      deckPath: '/tmp/two.toml',
+      deckContent: DECK_TWO_SKILLS,
+      workDir: '/tmp/arena-two',
+      skillCount: 0,
+    })
+    expect(plan.skills).toHaveLength(2)
+    expect(plan.skills[0]).toEqual({ name: 'deck', path: 'github.com/lythos-labs/lythoskill/skills/lythoskill-deck', section: 'innate' })
+    expect(plan.skills[1]).toEqual({ name: 'pdf', path: 'github.com/anthropics/skills/skills/pdf', section: 'tool' })
+    expect(plan.hasSkills).toBe(true)
+  })
+
+  test('AGENTS.md contains mandatory sections', () => {
+    const plan = buildPreparePlan({
+      deckPath: '/tmp/d.toml',
+      deckContent: DECK_ONE_SKILL,
+      workDir: '/tmp/arena-md',
+      skillCount: 0,
+    })
+    expect(plan.agentsMd).toContain('Arena Test Environment')
+    expect(plan.agentsMd).toContain('Setup Order')
+    expect(plan.agentsMd).toContain('decision-log.jsonl')
+    expect(plan.agentsMd).toContain('skill-deck.toml')
+  })
+
+  test('invalid TOML → skills=[], hasSkills=false (no crash)', () => {
+    const plan = buildPreparePlan({
+      deckPath: '/tmp/bad.toml',
+      deckContent: 'this is not toml {{{',
+      workDir: '/tmp/arena-bad',
+      skillCount: 0,
+    })
+    expect(plan.skills).toEqual([])
+    expect(plan.hasSkills).toBe(false)
+  })
+
+  test('deckContent is preserved in plan', () => {
+    const plan = buildPreparePlan({
+      deckPath: '/tmp/d.toml',
+      deckContent: DECK_ONE_SKILL,
+      workDir: '/tmp/arena-preserve',
+      skillCount: 0,
+    })
+    expect(plan.deckContent).toBe(DECK_ONE_SKILL)
+  })
+})
````
package/src/preflight.ts
CHANGED
````diff
@@ -195,6 +195,104 @@ export function resolveColdPoolDir(
   return raw.startsWith('~') ? `${homeDir}${raw.slice(1)}` : raw
 }
 
+// ── buildArchiveSidePlan ──────────────────────────────────────────────────
+
+/**
+ * A single side's source mapping in an archive plan.
+ * Pure data — no IO, no console.
+ */
+export interface ArchiveSideEntry {
+  side: string
+  sourceDir: string
+  found: boolean
+}
+
+/**
+ * Build the per-side source directory plan for archive.
+ *
+ * Pure: strings + existence function → ArchiveSideEntry[].
+ * IO (`existsSync`) is injected via `existsFn` — test with mock, run with real.
+ *
+ * Single-side fallback: when --sides specifies exactly one named side and its
+ * subdirectory doesn't exist (agent put files in workdir root, prepare-workdir
+ * didn't create per-side dirs), fall back to `fromDir` as source (found=true).
+ *
+ * Default (no --sides): sides = ['.'] → sourceDir = fromDir.
+ */
+export function buildArchiveSidePlan(
+  fromDir: string,
+  sides: string[],
+  existsFn: (path: string) => boolean
+): ArchiveSideEntry[] {
+  const plan: ArchiveSideEntry[] = []
+  for (const side of sides) {
+    let sourceDir = side === '.' ? fromDir : join(fromDir, side)
+    let found = existsFn(sourceDir)
+    if (!found && sides.length === 1 && side !== '.') {
+      sourceDir = fromDir
+      found = true
+    }
+    plan.push({ side, sourceDir, found })
+  }
+  return plan
+}
+
+// ── buildPreparePlan ─────────────────────────────────────────────────────
+
+/**
+ * Plan-only result for prepare-workdir — what WOULD be created.
+ * Pure data, no IO. Caller renders this before executing.
+ */
+export interface PreparePlan {
+  deckPath: string
+  deckContent: string
+  workDir: string
+  skills: SkillDecl[]
+  hasSkills: boolean
+  agentsMd: string
+}
+
+/**
+ * Build the prepare-workdir plan from raw inputs.
+ *
+ * Pure computation: deck path + content → what workdir would contain.
+ * Caller does IO (reading deck, computing timestamp) and injects results.
+ */
+export function buildPreparePlan(params: {
+  deckPath: string
+  deckContent: string
+  workDir: string
+  skillCount: number
+  brief?: string
+}): PreparePlan {
+  let deckParsed: Record<string, any> = {}
+  try { deckParsed = Bun.TOML.parse(params.deckContent) as Record<string, any> } catch {}
+  const skills = parseDeckSkills(deckParsed)
+  const hasSkills = skills.length > 0
+
+  const agentsMd = [
+    '# Arena Test Environment',
+    '**Mode**: agent-orchestrated cell',
+    '',
+    '## Setup Order (why this sequence)',
+    '1. `skill-deck.toml` copied here → declares which skills you can use',
+    '2. `deck link` runs → cold pool skills become visible in `.claude/skills/`',
+    '3. Skill existence checked → warns if any declared skill is missing from cold pool',
+    '4. `AGENTS.md` written last → confirms setup succeeded before agent starts',
+    'If setup fails mid-sequence, the workdir is incomplete and nothing runs.',
+    '',
+    '## How This Works',
+    '- Write ALL output files to this directory (CWD).',
+    '- Use available skills — check `ls .claude/skills/`.',
+    '',
+    '## Output Contract',
+    '- MANDATORY: `decision-log.jsonl` — one JSON line per decision:',
+    ' `{"t":<seconds>,"phase":"setup|content|design|output","decision":"...","reason":"..."}`',
+  ].join('\n')
+
+  return { deckPath: params.deckPath, deckContent: params.deckContent, workDir: params.workDir, skills, hasSkills, agentsMd }
+}
+
 // ── formatSkillWarnings ──────────────────────────────────────────────────
 
 /**
````
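The `AGENTS.md` text assembled by `buildPreparePlan` makes `decision-log.jsonl` mandatory: one JSON object per line with `t`, `phase`, `decision` and `reason`. A minimal checker for that contract might look like the sketch below. Treating `phase` as a closed set of the four listed values, requiring `t` to be a non-negative number and requiring non-empty `decision`/`reason` strings are assumptions read off the format string, and `validateDecisionLog` is a hypothetical name that does not exist in the package.

```typescript
// Assumed contract, read off the AGENTS.md template in buildPreparePlan:
// each non-blank line is {"t":<seconds>,"phase":"setup|content|design|output","decision":"...","reason":"..."}
const PHASES = new Set(['setup', 'content', 'design', 'output'])

interface LogError {
  line: number // 1-based line number in the .jsonl text
  message: string
}

function validateDecisionLog(jsonl: string): LogError[] {
  const errors: LogError[] = []
  jsonl.split('\n').forEach((raw, idx) => {
    const line = idx + 1
    if (raw.trim() === '') return // tolerate blank lines, e.g. a trailing newline
    let parsed: unknown
    try {
      parsed = JSON.parse(raw)
    } catch {
      errors.push({ line, message: 'not valid JSON' })
      return
    }
    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
      errors.push({ line, message: 'not a JSON object' })
      return
    }
    const entry = parsed as Record<string, unknown>
    const t = entry.t
    if (typeof t !== 'number' || !Number.isFinite(t) || t < 0) {
      errors.push({ line, message: '"t" must be a non-negative number of seconds' })
    }
    const phase = entry.phase
    if (typeof phase !== 'string' || !PHASES.has(phase)) {
      errors.push({ line, message: '"phase" must be one of setup|content|design|output' })
    }
    for (const key of ['decision', 'reason']) {
      const value = entry[key]
      if (typeof value !== 'string' || value.trim() === '') {
        errors.push({ line, message: `"${key}" must be a non-empty string` })
      }
    }
  })
  return errors
}

const sample = [
  '{"t":0,"phase":"setup","decision":"use the pdf skill","reason":"brief asks for a report"}',
  '{"t":12,"phase":"review","decision":"trim intro","reason":"too long"}',
].join('\n')
console.log(validateDecisionLog(sample))
```

Collecting every problem per line, rather than stopping at the first one, lets a judge or an `archive` step report all contract violations for a side in one pass.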