@lythos/skill-arena 0.13.3 → 0.14.1

package/README.md CHANGED
@@ -1,140 +1,155 @@
1
1
  # @lythos/skill-arena
2
2
 
3
- ![CI](https://img.shields.io/badge/CI-41%20unit%20tests-brightgreen) ![Intent/Plan](https://img.shields.io/badge/arch-intent%2Fplan%2Fexecute-8A2BE2)
3
+ > Controlled-variable benchmark for AI agent skills. Test single decks or compare A/B — agent-orchestrated by default, cross-player when you need it.
4
4
 
5
- > Controlled-variable benchmark for AI agent skills. Compare skills, decks, or configurations on the same task — single-skill A/B or full-deck Pareto frontier analysis. Now with declarative `arena.toml` (k8s-manifest style) and deterministic Pareto frontier.
5
+ ## Modes at a Glance
6
6
 
7
- ## Why
7
+ | Mode | How | When |
8
+ |------|-----|------|
9
+ | **Agent-Orchestrated** (DEFAULT) | Agent tool spawns subagents, parallel dispatch, native judge | Single deck test, cross-deck A/B comparison |
10
+ | **Cross-Player** (OPT-IN) | CLI runner spawns different agent binaries via Bun.spawn | Comparing kimi vs codex vs claude |
8
11
 
9
- "Which skill is better?" is the wrong question. The right question is "which skill is better for what."
10
-
11
- `skill-arena` scaffolds isolated environments where subagents complete the same task under different decks. A judge agent scores outputs across multiple dimensions. Supports:
12
-
13
- - **Mode 1**: Single-skill comparison (controlled variable — same helper skills, different test skill).
14
- - **Mode 2**: Full-deck comparison (Pareto frontier — no single winner, only optimal trade-offs).
15
-
16
- ## Prerequisites
17
-
18
- Arena runs AI agents as subprocesses. You need at least one agent CLI installed:
19
-
20
- ### Kimi CLI (recommended default)
21
-
22
- Kimi Code CLI is the default player for arena — it has reliable headless execution with eager tool loading (no deferred tool deadlock).
23
-
24
- ```bash
25
- # Install via uv (recommended) — uv is Python's bunx equivalent
26
- uv tool install kimi-cli
27
- # Or run without installing:
28
- uvx kimi-cli --print -p "hello"
29
-
30
- # Authenticate
31
- kimi login
32
- # Or set API key:
33
- export KIMI_API_KEY=your_key
34
- ```
35
-
36
- Docs: [https://github.com/MoonshotAI/kimi-cli](https://github.com/MoonshotAI/kimi-cli)
37
-
38
- ### Claude CLI (secondary)
39
-
40
- ```bash
41
- npm install -g @anthropic-ai/claude-code
42
- claude --version # should be ≥ 2.1.128
43
- ```
44
-
45
- Note: Claude `-p` mode has known issues with web tools in Bun.spawn (deferred tool deadlock). Kimi is the default for reliability.
12
+ **95% of arena use is agent-orchestrated.** The Agent tool can spawn parallel subagents with isolated workdirs and different decks — zero CLI. Cross-player mode is ONLY needed when comparing different agent CLIs (the Agent tool can only spawn same-type agents).
46
13
 
47
14
  ## Install
48
15
 
49
16
  ```bash
50
17
  bun add -d @lythos/skill-arena
51
18
  # or use directly
52
- bunx @lythos/skill-arena@0.13.3 <command>
19
+ bunx @lythos/skill-arena@0.14.1 <command>
53
20
  ```
54
21
 
55
22
  ## Quick Start
56
23
 
57
24
  ```bash
58
- # Single: test a deck with one agent (most common)
25
+ # single: test one deck (most common)
59
26
  bunx @lythos/skill-arena@latest single \
60
27
  --deck ./examples/decks/scout.toml \
61
28
  --brief "Generate auth flow diagram" \
62
- --player kimi \
63
- --timeout 300000 \
64
29
  --out ./output
65
30
 
66
- # Single with remote deck (URL auto-fetched)
67
- bunx @lythos/skill-arena@latest single \
68
- --deck https://raw.githubusercontent.com/lythos-labs/lythoskill/main/examples/decks/scout.toml \
69
- --brief "Generate auth flow diagram" \
70
- --out ./output
71
-
72
- # Vs: compare multiple decks side by side
73
- curl -fsSL https://raw.githubusercontent.com/lythos-labs/lythoskill/main/examples/arena/research-compare/arena.toml > arena.toml
31
+ # cross-deck vs: compare two decks (agent-orchestrated)
32
+ # Create arena.toml declaring sides with different decks, then:
74
33
  bunx @lythos/skill-arena@latest vs --config ./arena.toml
34
+
35
+ # cross-player vs — compare kimi vs codex (CLI only)
36
+ bunx @lythos/skill-arena@latest vs --config ./arena.toml --player kimi
75
37
  ```
76
38
 
77
- **Default behavior:**
78
- - Agent runs in an isolated `/tmp` workdir (no workspace pollution)
79
- - All artifacts are copied to `--out` after completion
80
- - Prompt template injects fixed contract (decision-log, robustness, tool preference) + your brief as variable
39
+ **What happens**: The agent creates an isolated `/tmp` workdir per side, runs `deck link` to link skills, spawns parallel subagents, and collects artifacts; a judge then scores the outputs. The parent deck is restored afterward.
81
40
 
82
41
  ## Commands
83
42
 
84
- ### Declarative mode (k8s-style, recommended)
43
+ ### `single`: one deck, one task
85
44
 
86
45
  ```bash
87
- # Print execution plan without running
88
- bunx @lythos/skill-arena@0.13.3 vs --config arena.toml --dry-run
89
-
90
- # Execute with per-side runs_per_side and statistical aggregation
91
- bunx @lythos/skill-arena@0.13.3 vs --config arena.toml
46
+ bunx @lythos/skill-arena@latest single \
47
+ --deck ./deck.toml \
48
+ --brief "Produce a .docx report with radar chart" \
49
+ --timeout 600000 \
50
+ --out ./output
92
51
  ```
93
52
 
94
- ### Scaffold mode (legacy, manual execution)
53
+ ### `vs`: multi-deck comparison
95
54
 
55
+ ```bash
56
+ bunx @lythos/skill-arena@latest vs --config ./arena.toml
57
+ bunx @lythos/skill-arena@latest vs --config ./arena.toml --dry-run
96
58
  ```
97
- bunx @lythos/skill-arena@0.13.3 scaffold --task "Generate auth flow diagram" \
98
- --decks https://raw.githubusercontent.com/lythos-labs/lythoskill/main/examples/decks/scout.toml,https://raw.githubusercontent.com/lythos-labs/lythoskill/main/examples/decks/documents.toml
59
+
60
+ ### `scaffold` — legacy directory setup
61
+
62
+ ```bash
63
+ bunx @lythos/skill-arena@latest scaffold \
64
+ --task "Generate auth flow diagram" \
65
+ --decks "./decks/minimal.toml,./decks/rich.toml"
99
66
  ```
100
67
 
101
- ### Viz
68
+ ### `prepare-workdir` — isolate + link skills (agent-orchestrated)
102
69
 
103
70
  ```bash
104
- bunx @lythos/skill-arena@0.13.3 viz runs/arena-<id>/
71
+ bunx @lythos/skill-arena@latest prepare-workdir \
72
+ --deck ./skill-deck.toml \
73
+ --out /tmp/arena-side-a \
74
+ --brief "task description"
75
+
76
+ # Plan-first: review before executing
77
+ bunx @lythos/skill-arena@latest prepare-workdir \
78
+ --deck ./skill-deck.toml \
79
+ --out /tmp/arena-side-a \
80
+ --brief "task" \
81
+ --dry-run
105
82
  ```
106
83
 
107
- ## Skill Documentation
84
+ Creates an isolated workdir under `/tmp` with the deck copied in, `AGENTS.md` written, and `deck link` run. `--dry-run` prints the plan (skills, workdir path, whether linking is needed) without creating anything.
108
85
 
109
- This package is the **Starter** layer (CLI implementation).
110
- The agent-visible **Skill** layer documentation is here:
111
- [packages/lythoskill-arena/skill/SKILL.md](../../packages/lythoskill-arena/skill/SKILL.md)
86
+ ### `archive`: collect agent outputs (agent-orchestrated)
112
87
 
113
- ## Architecture
88
+ ```bash
89
+ bunx @lythos/skill-arena@latest archive \
90
+ --from /tmp/arena-side-a \
91
+ --to ./playground/output \
92
+ --sides side-a
93
+
94
+ # Plan-first: review what would be copied
95
+ bunx @lythos/skill-arena@latest archive \
96
+ --from /tmp/arena-side-a \
97
+ --to ./playground/output \
98
+ --sides side-a \
99
+ --dry-run
100
+ ```
114
101
 
115
- Part of the [lythoskill](https://github.com/lythos-labs/lythoskill) ecosystem. The thin-skill pattern separates heavy logic (this npm package) from lightweight agent instructions (SKILL.md).
102
+ Copies agent artifacts from workdir(s) to output, skipping internal files (`.claude`, `skill-deck.toml`, `skill-deck.lock`, `AGENTS.md`). Single-side archives fall back to workdir root when the named side subdirectory doesn't exist. `--dry-run` shows the per-side plan before copying.
116
103
 
104
+ ### `viz` — render results
105
+
106
+ ```bash
107
+ bunx @lythos/skill-arena@latest viz runs/arena-<id>/
117
108
  ```
118
- Starter (this package) → npm publish → bunx @lythos/skill-arena@0.13.3 ...
119
- Skill (packages/<name>/skill/) → build → SKILL.md + thin scripts
120
- Output (skills/<name>/) → git commit → agent-visible skill
109
+
110
+ ## Parameters
111
+
112
+ | Flag | Command | Description |
113
+ |------|---------|-------------|
114
+ | `--brief "<text>"` | single | Inline task brief |
115
+ | `--deck <path\|url>` | single | Deck file (URL auto-fetched) |
116
+ | `--player <name>` | single, vs | Only for cross-player: kimi\|codex\|deepseek\|claude |
117
+ | `--timeout <ms>` | single | Subagent timeout (300000–600000 for complex tasks) |
118
+ | `--from <dir>` | archive | Source workdir |
119
+ | `--to <dir>` | archive | Output directory |
120
+ | `--sides <names>` | archive | Comma-separated side names (default: `.`) |
121
+ | `--out <dir>` | single, vs, prepare-workdir | Output / workdir directory |
122
+ | `--config <path>` | vs | arena.toml |
123
+ | `--dry-run` | vs, prepare-workdir, archive | Print plan without execution |
124
+
125
+ ## Prerequisites (cross-player only)
126
+
127
+ For cross-player mode, install at least one agent CLI:
128
+
129
+ ```bash
130
+ uv tool install kimi-cli # kimi (recommended default)
131
+ npm i -g @openai/codex # codex
132
+ # deepseek: bundled with desktop app or pip install deepseek-cli
133
+ # claude: set ANTHROPIC_API_KEY (SDK, no CLI binary needed)
121
134
  ```
122
135
 
123
- ### Runtime architecture (intent/plan/execute)
136
+ ## Skill Documentation
137
+
138
+ The agent-visible skill layer: [skill/SKILL.md](./skill/SKILL.md)
139
+
140
+ ## Architecture
124
141
 
125
142
  ```
126
143
  arena.toml → ArenaToml (Zod) → ExecutionPlan (pure) → per-cell agent spawn (IO)
127
-
128
- aggregateAllStats (pure) ← verdicts[]
129
-
130
- runComparativeJudge (IO) → report.md + Pareto frontier
144
+
145
+ aggregateAllStats (pure) ← verdicts[]
146
+
147
+ runComparativeJudge (IO) → report.md + Pareto frontier
131
148
  ```
132
149
 
133
- - **Intent**: `arena.toml` declarative config (k8s-manifest style)
150
+ - **Intent**: `arena.toml` declarative config
134
151
  - **Plan**: `buildExecutionPlan()`, `aggregateSideStats()`, `computePareto()` — pure functions
135
- - **Execute**: `runAgentScenario` per cell, `runComparativeJudge` IO via `AgentAdapter`
136
-
137
- Built on `@lythos/test-utils` shared infrastructure.
152
+ - **Execute**: Agent tool spawn (agent-orchestrated) or `AgentAdapter` (cross-player)
138
153
 
139
154
  ## License
140
155
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@lythos/skill-arena",
3
- "version": "0.13.3",
3
+ "version": "0.14.1",
4
4
  "description": "Skill Arena — benchmark skill effectiveness with controlled-variable comparison",
5
5
  "keywords": [
6
6
  "ai-agent",
@@ -42,13 +42,13 @@
42
42
  "bun": ">=1.0.0"
43
43
  },
44
44
  "dependencies": {
45
- "@lythos/cold-pool": "^0.13.3",
46
- "@lythos/infra": "^0.13.3",
47
- "@lythos/test-utils": "^0.13.3",
45
+ "@lythos/cold-pool": "^0.14.1",
46
+ "@lythos/infra": "^0.14.1",
47
+ "@lythos/test-utils": "^0.14.1",
48
48
  "zod": "^3.24.0",
49
49
  "zod-to-json-schema": "^3.25.2"
50
50
  },
51
51
  "optionalDependencies": {
52
- "@lythos/agent-adapter-claude-sdk": "^0.13.3"
52
+ "@lythos/agent-adapter-claude-sdk": "^0.14.1"
53
53
  }
54
54
  }
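Every `@lythos/*` dependency in this hunk moves in lockstep with the package version. One plausible reason, under npm's caret semantics for 0.x versions, is that `^0.13.3` only admits `>=0.13.3 <0.14.0` and so cannot resolve to a 0.14.x release. The sketch below illustrates that rule for plain `x.y.z` versions only (no prerelease tags); `satisfiesCaret` and its helpers are hypothetical names for illustration, not part of npm or of this package.

```typescript
type Version = [number, number, number]

// Parse a plain "x.y.z" string; prerelease and build tags are out of scope here.
function parseVersion(text: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(text)
  if (!m) throw new Error(`not a plain x.y.z version: ${text}`)
  return [Number(m[1]), Number(m[2]), Number(m[3])]
}

// Lexicographic comparison: negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1
  }
  return 0
}

// Caret allows changes that do not modify the left-most non-zero component.
function caretUpperBound([major, minor, patch]: Version): Version {
  if (major > 0) return [major + 1, 0, 0]
  if (minor > 0) return [0, minor + 1, 0]
  return [0, 0, patch + 1]
}

function satisfiesCaret(version: string, range: string): boolean {
  if (!range.startsWith('^')) throw new Error(`not a caret range: ${range}`)
  const base = parseVersion(range.slice(1))
  const v = parseVersion(version)
  return compareVersions(v, base) >= 0 && compareVersions(v, caretUpperBound(base)) < 0
}

console.log(satisfiesCaret('0.14.1', '^0.13.3')) // false
console.log(satisfiesCaret('0.13.9', '^0.13.3')) // true
console.log(satisfiesCaret('0.14.2', '^0.14.1')) // true
```

For anything beyond illustration, a maintained semver library is the safer choice, since real ranges also involve prerelease tags and compound expressions.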
package/src/cli.ts CHANGED
@@ -5,7 +5,7 @@ import { homedir, tmpdir } from 'node:os'
5
5
  import { ZodError } from 'zod'
6
6
  import { formatPlanOutput, type ArenaResult, buildArenaPrompt } from './runner'
7
7
  import { parseArenaToml, buildExecutionPlan } from './arena-toml'
8
- import { buildCopyPlan, parseDeckSkills } from './preflight'
8
+ import { buildArchiveSidePlan, buildCopyPlan, buildPreparePlan, parseDeckSkills } from './preflight'
9
9
  import { checkSkillExistence, formatSkillWarnings, resolveColdPoolDir } from './preflight'
10
10
 
11
11
  // ─── fetchWithProxy (infra dependency, no package boundary) ─────────────────
@@ -100,6 +100,8 @@ Examples:
100
100
  lythoskill-arena vs --config arena.toml --dry-run
101
101
  lythoskill-arena vs --config arena.toml
102
102
  lythoskill-arena viz runs/arena-20260504
103
+ lythoskill-arena prepare-workdir --deck ./decks/scout.toml --out /tmp/arena-20260517-side-a
104
+ lythoskill-arena archive --from /tmp/arena-20260517 --to playground/arena-20260517 --sides side-a,side-b
103
105
  `)
104
106
  process.exit(0)
105
107
  }
@@ -113,6 +115,8 @@ function cli(args: string[]) {
113
115
  if (cmd === 'vs' || cmd === 'compare') return vsRun(rest)
114
116
  if (cmd === 'single' || cmd === 'run') return singleRun(rest)
115
117
  if (cmd === 'viz') return vizRun(rest)
118
+ if (cmd === 'prepare-workdir') return prepareWorkdir(rest)
119
+ if (cmd === 'archive') return archiveRun(rest)
116
120
 
117
121
  console.error(`Unknown command: ${cmd}`)
118
122
  process.exit(1)
@@ -431,6 +435,182 @@ async function vizRun(args: string[]) {
431
435
  console.log(`📈 Arena HTML report not yet implemented. See report.md in ${runsDir}/`)
432
436
  }
433
437
 
438
+ // ═══════════════════════════════════════════════════════════════════════════
439
+ // ── prepare-workdir: reusable workdir setup (used by both CLI and agent) ──
440
+ // Intent: create an isolated arena workdir with deck linked and ready to run
441
+
442
+ async function prepareWorkdir(args: string[]) {
443
+ const opts: Record<string, string | undefined> = {}
444
+ let dryRun = false
445
+ for (let i = 0; i < args.length; i++) {
446
+ if (args[i] === '--deck' || args[i] === '-d') opts.deck = args[++i]
447
+ else if (args[i] === '--out' || args[i] === '-o') opts.out = args[++i]
448
+ else if (args[i] === '--brief' || args[i] === '-b') opts.brief = args[++i]
449
+ else if (args[i] === '--dry-run') dryRun = true
450
+ }
451
+
452
+ if (!opts.deck) {
453
+ console.error(`❌ --deck <path> is required.
454
+ lythoskill-arena prepare-workdir --deck ./skill-deck.toml --out /tmp/arena-side-a`)
455
+ process.exit(1)
456
+ }
457
+
458
+ const deckPath = resolve(opts.deck)
459
+ if (!existsSync(deckPath)) {
460
+ console.error(`❌ Deck file not found: ${deckPath}`)
461
+ process.exit(1)
462
+ }
463
+
464
+ const workDir = opts.out
465
+ ? resolve(opts.out)
466
+ : join(tmpdir(), `arena-${Date.now()}`)
467
+ const deckContent = readFileSync(deckPath, 'utf-8')
468
+
469
+ // ── Plan (pure computation — what WOULD be created) ────────────────────
470
+ const plan = buildPreparePlan({
471
+ deckPath,
472
+ deckContent,
473
+ workDir,
474
+ skillCount: 0, // computed inside from deckContent
475
+ brief: opts.brief,
476
+ })
477
+
478
+ console.log('📋 Prepare plan:')
479
+ console.log(` deck: ${plan.deckPath}`)
480
+ console.log(` workdir: ${plan.workDir}`)
481
+ console.log(` skills: ${plan.skills.length} declared (${plan.skills.map(s => s.name).join(', ') || 'none'})`)
482
+ console.log(` link: ${plan.hasSkills ? 'Bun.spawn deck link' : 'skip (no skills)'}`)
483
+ console.log(` AGENTS.md: write (${plan.agentsMd.split('\n').length} lines)`)
484
+ if (opts.brief) console.log(` brief: ${opts.brief!.slice(0, 60)}...`)
485
+
486
+ if (dryRun) {
487
+ console.log(`\n🏁 Dry-run complete (no files created). Remove --dry-run to execute.`)
488
+ return
489
+ }
490
+
491
+ // ── Execute: create workdir ──────────────────────────────────────────
492
+ mkdirSync(workDir, { recursive: true })
493
+ writeFileSync(join(workDir, 'skill-deck.toml'), deckContent)
494
+ writeFileSync(join(workDir, 'AGENTS.md'), plan.agentsMd)
495
+
496
+ if (plan.hasSkills) {
497
+ const { existsSync: es2 } = await import('node:fs')
498
+ const localDeckCli = join(import.meta.dir, '..', '..', 'lythoskill-deck', 'src', 'cli.ts')
499
+ const linkCmd = es2(localDeckCli)
500
+ ? ['bun', localDeckCli, 'link']
501
+ : ['bunx', '@lythos/skill-deck', 'link']
502
+ const linkProc = Bun.spawn(linkCmd,
503
+ { cwd: workDir, stderr: 'pipe', env: { ...process.env, HOME: process.env.HOME! } }, // pipe stderr so it can be read below
504
+ )
505
+ await linkProc.exited
506
+ const linkStderr = await new Response(linkProc.stderr).text()
507
+ const linkResult = validateLinkResult(linkProc.exitCode, linkStderr)
508
+ if (!linkResult.ok) {
509
+ console.error(`❌ ${linkResult.error}`)
510
+ process.exit(1)
511
+ }
512
+ } else {
513
+ console.log('ℹ️ No skills declared in deck — skipping link')
514
+ }
515
+
516
+ // Skill existence check
517
+ try {
518
+ const coldPoolDefault = join(homedir(), '.agents', 'skill-repos')
519
+ const coldPoolDir = resolveColdPoolDir(Bun.TOML.parse(deckContent)?.deck?.cold_pool, homedir(), coldPoolDefault)
520
+ const checks = checkSkillExistence(plan.skills, coldPoolDir, existsSync)
521
+ for (const warning of formatSkillWarnings(checks)) {
522
+ console.warn(`⚠️ ${warning}`)
523
+ }
524
+ } catch (e) {
525
+ console.warn('⚠️ Could not check skill existence:', e instanceof Error ? e.message : e)
526
+ }
527
+
528
+ console.log(`✅ Workdir ready → ${workDir}`)
529
+ console.log(` deck: ${deckPath}`)
530
+ if (opts.brief) console.log(` brief: ${opts.brief!.slice(0, 60)}...`)
531
+ }
532
+
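The flag loop in `prepareWorkdir` reads each value with `args[++i]`, so a value flag given as the last argument yields `undefined`, and a value flag followed by another flag consumes that flag as its value. As a sketch of one possible alternative (not the package's code), a table-driven parser can report those cases explicitly. `parseFlags`, `ParsedFlags`, and the two flag maps are hypothetical names introduced for illustration.

```typescript
interface ParsedFlags {
  values: Map<string, string>   // option key -> value
  switches: Set<string>         // option keys of boolean flags that were present
  errors: string[]              // human-readable problems found while parsing
}

// valueFlags / switchFlags map a flag or its alias to an option key.
function parseFlags(
  args: string[],
  valueFlags: Map<string, string>,
  switchFlags: Map<string, string>,
): ParsedFlags {
  const values = new Map<string, string>()
  const switches = new Set<string>()
  const errors: string[] = []
  for (let i = 0; i < args.length; i++) {
    const arg = args[i]
    const switchKey = switchFlags.get(arg)
    if (switchKey !== undefined) { switches.add(switchKey); continue }
    const valueKey = valueFlags.get(arg)
    if (valueKey === undefined) continue // unknown arguments are ignored, as in the loop above
    const next = args[i + 1]
    // A missing value, or a known flag where the value should be, is an error.
    if (next === undefined || valueFlags.has(next) || switchFlags.has(next)) {
      errors.push(`${arg} expects a value`)
      continue
    }
    values.set(valueKey, next)
    i++ // skip the value that was just consumed
  }
  return { values, switches, errors }
}

const valueFlags = new Map([
  ['--deck', 'deck'], ['-d', 'deck'],
  ['--out', 'out'], ['-o', 'out'],
  ['--brief', 'brief'], ['-b', 'brief'],
])
const switchFlags = new Map([['--dry-run', 'dryRun']])

const parsed = parseFlags(['-d', './deck.toml', '--dry-run', '--out'], valueFlags, switchFlags)
console.log(parsed.values.get('deck'), parsed.switches.has('dryRun'), parsed.errors)
```

Collecting errors instead of exiting keeps the parser pure, which matches the plan/execute split used elsewhere in this diff; the caller decides whether to print and exit.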
533
+ // ═══════════════════════════════════════════════════════════════════════════
534
+ // ── archive: copy agent outputs from workdir(s) to outDir ─────────────────
535
+ // Intent: same copy behavior as CLI singleRun, reusable for agent-orchestrated
536
+
537
+ async function archiveRun(args: string[]) {
538
+ const opts: Record<string, string | undefined> = {}
539
+ let dryRun = false
540
+ for (let i = 0; i < args.length; i++) {
541
+ if (args[i] === '--from' || args[i] === '-f') opts.from = args[++i]
542
+ else if (args[i] === '--to' || args[i] === '-o') opts.to = args[++i]
543
+ else if (args[i] === '--sides') opts.sides = args[++i]
544
+ else if (args[i] === '--report') opts.report = args[++i]
545
+ else if (args[i] === '--dry-run') dryRun = true
546
+ }
547
+
548
+ if (!opts.from || !opts.to) {
549
+ console.error(`❌ --from <workdir> and --to <outdir> are required.
550
+ lythoskill-arena archive --from /tmp/arena-20260517 --to playground/arena-20260517 --sides side-a,side-b --report ./report.md`)
551
+ process.exit(1)
552
+ }
553
+
554
+ const fromDir = resolve(opts.from)
555
+ const outDir = resolve(opts.to)
556
+
557
+ const sides = opts.sides ? opts.sides.split(',') : ['.']
558
+ const plan = buildArchiveSidePlan(fromDir, sides, existsSync)
559
+
560
+ // ── Plan output (always shown, also serves as dry-run) ──────────────────
561
+ console.log('📋 Archive plan:')
562
+ for (const pe of plan) {
563
+ if (!pe.found) {
564
+ console.log(` ⚠️ ${pe.side}: not found (${pe.sourceDir}) — will skip`)
565
+ } else if (pe.sourceDir === fromDir && pe.side !== '.') {
566
+ console.log(` ${pe.side}: ${pe.sourceDir} (fallback → root) → ${join(outDir, pe.side)}`)
567
+ } else {
568
+ console.log(` ${pe.side}: ${pe.sourceDir} → ${join(outDir, pe.side)}`)
569
+ }
570
+ }
571
+ if (dryRun) {
572
+ console.log(`\n🏁 Dry-run complete (no files copied). Remove --dry-run to execute.`)
573
+ return
574
+ }
575
+
576
+ // ── Execute: copy files ───────────────────────────────────────────────
577
+ mkdirSync(outDir, { recursive: true })
578
+
579
+ if (opts.report && existsSync(resolve(opts.report))) {
580
+ const { cpSync: cpR } = await import('node:fs')
581
+ cpR(resolve(opts.report), join(outDir, 'report.md'))
582
+ console.log(`📄 report.md → ${outDir}/report.md`)
583
+ }
584
+
585
+ const { cpSync, readdirSync } = await import('node:fs')
586
+ const skipSet = new Set(['.claude', 'skill-deck.toml', 'skill-deck.lock', 'AGENTS.md'])
587
+
588
+ for (const planEntry of plan) {
589
+ if (!planEntry.found) {
590
+ console.warn(`⚠️ Side workdir not found: ${planEntry.sourceDir}`)
591
+ continue
592
+ }
593
+
594
+ const sideOutDir = join(outDir, planEntry.side)
595
+ mkdirSync(sideOutDir, { recursive: true })
596
+
597
+ const entries = readdirSync(planEntry.sourceDir, { withFileTypes: true })
598
+ for (const entry of entries) {
599
+ if (skipSet.has(entry.name)) continue
600
+ const src = join(planEntry.sourceDir, entry.name)
601
+ const dest = join(sideOutDir, entry.name)
602
+ try {
603
+ cpSync(src, dest, { recursive: entry.isDirectory() })
604
+ console.log(` ${planEntry.side}/${entry.name} → ${dest}`)
605
+ } catch (e) {
606
+ console.warn(`⚠️ Failed to copy ${planEntry.side}/${entry.name}: ${e instanceof Error ? e.message : e}`)
607
+ }
608
+ }
609
+ }
610
+
611
+ console.log(`✅ Archive complete → ${outDir}`)
612
+ }
613
+
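The copy loop in `archiveRun` filters directory entries against a fixed skip set before copying. A minimal sketch of that filtering step as a pure function is shown here; the entry names come from the `skipSet` in the hunk above, while `selectArchivable` and `INTERNAL_ENTRIES` are hypothetical names used only for illustration.

```typescript
// Internal files that archive leaves behind (names taken from skipSet above).
const INTERNAL_ENTRIES: ReadonlySet<string> = new Set([
  '.claude',
  'skill-deck.toml',
  'skill-deck.lock',
  'AGENTS.md',
])

// Keep only the top-level entry names that would be copied to the output directory.
function selectArchivable(
  entryNames: string[],
  skip: ReadonlySet<string> = INTERNAL_ENTRIES,
): string[] {
  return entryNames.filter(name => !skip.has(name))
}

const workdirListing = ['report.docx', 'AGENTS.md', '.claude', 'decision-log.jsonl', 'charts']
console.log(selectArchivable(workdirListing))
// [ 'report.docx', 'decision-log.jsonl', 'charts' ]
```

Because the check runs on top-level names only, a file named `AGENTS.md` nested inside a copied directory such as `charts` would still be copied by the recursive copy.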
434
614
  // ── Entry point ────────────────────────────────────────────────────────────
435
615
  if (import.meta.main) {
436
616
  main().catch(err => {
@@ -393,3 +393,192 @@ describe('formatSkillWarnings', () => {
393
393
  expect(formatSkillWarnings(checks)[0]).toContain('[transient]')
394
394
  })
395
395
  })
396
+
397
+
398
+ // ═══════════════════════════════════════════════════════════════════════════
399
+ // buildArchiveSidePlan
400
+ // ═══════════════════════════════════════════════════════════════════════════
401
+
402
+
403
+ // ═══════════════════════════════════════════════════════════════════════════
404
+ // buildArchiveSidePlan
405
+ // ═══════════════════════════════════════════════════════════════════════════
406
+
407
+ import { buildArchiveSidePlan } from './preflight'
408
+ import { join as pathJoin } from 'node:path'
409
+
410
+ const TMP = '/tmp/arena-test'
411
+
412
+ describe('buildArchiveSidePlan', () => {
413
+
414
+ test('default: sides=["."] maps to fromDir', () => {
415
+ const plan = buildArchiveSidePlan(TMP, ['.'], _p => true)
416
+ expect(plan).toEqual([
417
+ { side: '.', sourceDir: TMP, found: true },
418
+ ])
419
+ })
420
+
421
+ test('single side, subdirectory exists → source = fromDir/side', () => {
422
+ const exists = (p: string) => p === pathJoin(TMP, 'side-a')
423
+ const plan = buildArchiveSidePlan(TMP, ['side-a'], exists)
424
+ expect(plan).toEqual([
425
+ { side: 'side-a', sourceDir: pathJoin(TMP, 'side-a'), found: true },
426
+ ])
427
+ })
428
+
429
+ test('single side, subdirectory MISSING → fallback to fromDir root', () => {
430
+ const plan = buildArchiveSidePlan(TMP, ['side-a'], _p => false)
431
+ expect(plan).toEqual([
432
+ { side: 'side-a', sourceDir: TMP, found: true },
433
+ ])
434
+ })
435
+
436
+ test('multi side, all subdirectories exist', () => {
437
+ const exists = (p: string) =>
438
+ p === pathJoin(TMP, 'side-a') || p === pathJoin(TMP, 'side-b')
439
+ const plan = buildArchiveSidePlan(TMP, ['side-a', 'side-b'], exists)
440
+ expect(plan).toEqual([
441
+ { side: 'side-a', sourceDir: pathJoin(TMP, 'side-a'), found: true },
442
+ { side: 'side-b', sourceDir: pathJoin(TMP, 'side-b'), found: true },
443
+ ])
444
+ })
445
+
446
+ test('multi side, one missing → found=false (caller handles warn+skip)', () => {
447
+ const exists = (p: string) => p === pathJoin(TMP, 'side-a')
448
+ const plan = buildArchiveSidePlan(TMP, ['side-a', 'side-b'], exists)
449
+ expect(plan).toEqual([
450
+ { side: 'side-a', sourceDir: pathJoin(TMP, 'side-a'), found: true },
451
+ { side: 'side-b', sourceDir: pathJoin(TMP, 'side-b'), found: false },
452
+ ])
453
+ })
454
+
455
+ test('"." side does NOT trigger fallback when missing (found=false)', () => {
456
+ const plan = buildArchiveSidePlan(TMP, ['.'], _p => false)
457
+ expect(plan).toEqual([
458
+ { side: '.', sourceDir: TMP, found: false },
459
+ ])
460
+ })
461
+
462
+ test('empty sides array → empty plan', () => {
463
+ const plan = buildArchiveSidePlan(TMP, [], _p => true)
464
+ expect(plan).toEqual([])
465
+ })
466
+
467
+ test('three sides, middle missing', () => {
468
+ const exists = (p: string) =>
469
+ p === pathJoin(TMP, 'side-a') || p === pathJoin(TMP, 'side-c')
470
+ const plan = buildArchiveSidePlan(TMP, ['side-a', 'side-b', 'side-c'], exists)
471
+ expect(plan).toEqual([
472
+ { side: 'side-a', sourceDir: pathJoin(TMP, 'side-a'), found: true },
473
+ { side: 'side-b', sourceDir: pathJoin(TMP, 'side-b'), found: false },
474
+ { side: 'side-c', sourceDir: pathJoin(TMP, 'side-c'), found: true },
475
+ ])
476
+ })
477
+ })
478
+
479
+ // ═══════════════════════════════════════════════════════════════════════════
480
+ // buildPreparePlan
481
+ // ═══════════════════════════════════════════════════════════════════════════
482
+
483
+ import { buildPreparePlan } from './preflight'
484
+
485
+ const DECK_ONE_SKILL = `
486
+ [deck]
487
+ max_cards = 10
488
+ cold_pool = "~/.agents/skill-repos"
489
+
490
+ [tool.skills.pdf]
491
+ path = "github.com/anthropics/skills/skills/pdf"
492
+ `
493
+
494
+ const DECK_EMPTY = `
495
+ [deck]
496
+ max_cards = 5
497
+ `
498
+
499
+ const DECK_TWO_SKILLS = `
500
+ [deck]
501
+ max_cards = 10
502
+
503
+ [innate.skills.deck]
504
+ path = "github.com/lythos-labs/lythoskill/skills/lythoskill-deck"
505
+
506
+ [tool.skills.pdf]
507
+ path = "github.com/anthropics/skills/skills/pdf"
508
+ `
509
+
510
+ describe('buildPreparePlan', () => {
511
+
512
+ test('single skill deck → plan with 1 skill, hasSkills=true', () => {
513
+ const plan = buildPreparePlan({
514
+ deckPath: '/tmp/test-deck.toml',
515
+ deckContent: DECK_ONE_SKILL,
516
+ workDir: '/tmp/arena-test',
517
+ skillCount: 0,
518
+ })
519
+ expect(plan.skills).toHaveLength(1)
520
+ expect(plan.skills[0].name).toBe('pdf')
521
+ expect(plan.skills[0].section).toBe('tool')
522
+ expect(plan.hasSkills).toBe(true)
523
+ expect(plan.workDir).toBe('/tmp/arena-test')
524
+ expect(plan.deckPath).toBe('/tmp/test-deck.toml')
525
+ })
526
+
527
+ test('empty deck → skills=[], hasSkills=false', () => {
528
+ const plan = buildPreparePlan({
529
+ deckPath: '/tmp/empty.toml',
530
+ deckContent: DECK_EMPTY,
531
+ workDir: '/tmp/arena-empty',
532
+ skillCount: 0,
533
+ })
534
+ expect(plan.skills).toEqual([])
535
+ expect(plan.hasSkills).toBe(false)
536
+ })
537
+
538
+ test('two skills (innate + tool) → both parsed with correct sections', () => {
539
+ const plan = buildPreparePlan({
540
+ deckPath: '/tmp/two.toml',
541
+ deckContent: DECK_TWO_SKILLS,
542
+ workDir: '/tmp/arena-two',
543
+ skillCount: 0,
544
+ })
545
+ expect(plan.skills).toHaveLength(2)
546
+ expect(plan.skills[0]).toEqual({ name: 'deck', path: 'github.com/lythos-labs/lythoskill/skills/lythoskill-deck', section: 'innate' })
547
+ expect(plan.skills[1]).toEqual({ name: 'pdf', path: 'github.com/anthropics/skills/skills/pdf', section: 'tool' })
548
+ expect(plan.hasSkills).toBe(true)
549
+ })
550
+
551
+ test('AGENTS.md contains mandatory sections', () => {
552
+ const plan = buildPreparePlan({
553
+ deckPath: '/tmp/d.toml',
554
+ deckContent: DECK_ONE_SKILL,
555
+ workDir: '/tmp/arena-md',
556
+ skillCount: 0,
557
+ })
558
+ expect(plan.agentsMd).toContain('Arena Test Environment')
559
+ expect(plan.agentsMd).toContain('Setup Order')
560
+ expect(plan.agentsMd).toContain('decision-log.jsonl')
561
+ expect(plan.agentsMd).toContain('skill-deck.toml')
562
+ })
563
+
564
+ test('invalid TOML → skills=[], hasSkills=false (no crash)', () => {
565
+ const plan = buildPreparePlan({
566
+ deckPath: '/tmp/bad.toml',
567
+ deckContent: 'this is not toml {{{',
568
+ workDir: '/tmp/arena-bad',
569
+ skillCount: 0,
570
+ })
571
+ expect(plan.skills).toEqual([])
572
+ expect(plan.hasSkills).toBe(false)
573
+ })
574
+
575
+ test('deckContent is preserved in plan', () => {
576
+ const plan = buildPreparePlan({
577
+ deckPath: '/tmp/d.toml',
578
+ deckContent: DECK_ONE_SKILL,
579
+ workDir: '/tmp/arena-preserve',
580
+ skillCount: 0,
581
+ })
582
+ expect(plan.deckContent).toBe(DECK_ONE_SKILL)
583
+ })
584
+ })
package/src/preflight.ts CHANGED
@@ -195,6 +195,104 @@ export function resolveColdPoolDir(
195
195
  return raw.startsWith('~') ? `${homeDir}${raw.slice(1)}` : raw
196
196
  }
197
197
 
198
+ // ── buildArchiveSidePlan ──────────────────────────────────────────────────
199
+
200
+ /**
201
+ * A single side's source mapping in an archive plan.
202
+ * Pure data — no IO, no console.
203
+ */
204
+ export interface ArchiveSideEntry {
205
+ side: string
206
+ sourceDir: string
207
+ found: boolean
208
+ }
209
+
210
+ /**
211
+ * Build the per-side source directory plan for archive.
212
+ *
213
+ * Pure: strings + existence function → ArchiveSideEntry[].
214
+ * IO (`existsSync`) is injected via `existsFn` — test with mock, run with real.
215
+ *
216
+ * Single-side fallback: when --sides specifies exactly one named side and its
217
+ * subdirectory doesn't exist (agent put files in workdir root, prepare-workdir
218
+ * didn't create per-side dirs), fall back to `fromDir` as source (found=true).
219
+ *
220
+ * Default (no --sides): sides = ['.'] → sourceDir = fromDir.
221
+ */
222
+ export function buildArchiveSidePlan(
223
+ fromDir: string,
224
+ sides: string[],
225
+ existsFn: (path: string) => boolean
226
+ ): ArchiveSideEntry[] {
227
+ const plan: ArchiveSideEntry[] = []
228
+ for (const side of sides) {
229
+ let sourceDir = side === '.' ? fromDir : join(fromDir, side)
230
+ let found = existsFn(sourceDir)
231
+ if (!found && sides.length === 1 && side !== '.') {
232
+ sourceDir = fromDir
233
+ found = true
234
+ }
235
+ plan.push({ side, sourceDir, found })
236
+ }
237
+ return plan
238
+ }
239
+
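To make the single-side fallback concrete, the sketch below runs a condensed copy of `buildArchiveSidePlan` against an in-memory existence check. The function body follows the hunk above; `fakeFs` and the `/tmp/arena-demo` paths are hypothetical and exist only for this illustration.

```typescript
import { join } from 'node:path'

interface ArchiveSideEntry {
  side: string
  sourceDir: string
  found: boolean
}

// Condensed copy of buildArchiveSidePlan from the hunk above.
function buildArchiveSidePlan(
  fromDir: string,
  sides: string[],
  existsFn: (path: string) => boolean,
): ArchiveSideEntry[] {
  return sides.map(side => {
    let sourceDir = side === '.' ? fromDir : join(fromDir, side)
    let found = existsFn(sourceDir)
    // Single named side with no subdirectory: fall back to the workdir root.
    if (!found && sides.length === 1 && side !== '.') {
      sourceDir = fromDir
      found = true
    }
    return { side, sourceDir, found }
  })
}

// Hypothetical stand-in for existsSync backed by a fixed set of paths.
const fakeFs = (paths: string[]) => {
  const known = new Set(paths)
  return (p: string) => known.has(p)
}

const root = '/tmp/arena-demo'

// One named side, no subdirectory on disk: the plan points at the root.
const single = buildArchiveSidePlan(root, ['side-a'], fakeFs([root]))

// Two sides, only side-a exists: side-b is reported as not found, with no fallback.
const multi = buildArchiveSidePlan(root, ['side-a', 'side-b'], fakeFs([root, join(root, 'side-a')]))

console.log(single)
console.log(multi)
```

Note that the fallback marks the entry as found without checking that `fromDir` itself exists, so a caller that needs that guarantee has to verify the root separately.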
240
+ // ── buildPreparePlan ─────────────────────────────────────────────────────
241
+
242
+ /**
243
+ * Plan-only result for prepare-workdir — what WOULD be created.
244
+ * Pure data, no IO. Caller renders this before executing.
245
+ */
246
+ export interface PreparePlan {
247
+ deckPath: string
248
+ deckContent: string
249
+ workDir: string
250
+ skills: SkillDecl[]
251
+ hasSkills: boolean
252
+ agentsMd: string
253
+ }
254
+
255
+ /**
256
+ * Build the prepare-workdir plan from raw inputs.
257
+ *
258
+ * Pure computation: deck path + content → what workdir would contain.
259
+ * Caller does IO (reading deck, computing timestamp) and injects results.
260
+ */
261
+ export function buildPreparePlan(params: {
262
+ deckPath: string
263
+ deckContent: string
264
+ workDir: string
265
+ skillCount: number
266
+ brief?: string
267
+ }): PreparePlan {
268
+ let deckParsed: Record<string, any> = {}
269
+ try { deckParsed = Bun.TOML.parse(params.deckContent) as Record<string, any> } catch {}
270
+ const skills = parseDeckSkills(deckParsed)
271
+ const hasSkills = skills.length > 0
272
+
273
+ const agentsMd = [
274
+ '# Arena Test Environment',
275
+ '**Mode**: agent-orchestrated cell',
276
+ '',
277
+ '## Setup Order (why this sequence)',
278
+ '1. `skill-deck.toml` copied here → declares which skills you can use',
279
+ '2. `deck link` runs → cold pool skills become visible in `.claude/skills/`',
280
+ '3. Skill existence checked → warns if any declared skill is missing from cold pool',
281
+ '4. `AGENTS.md` written last → confirms setup succeeded before agent starts',
282
+ 'If setup fails mid-sequence, the workdir is incomplete and nothing runs.',
283
+ '',
284
+ '## How This Works',
285
+ '- Write ALL output files to this directory (CWD).',
286
+ '- Use available skills — check `ls .claude/skills/`.',
287
+ '',
288
+ '## Output Contract',
289
+ '- MANDATORY: `decision-log.jsonl` — one JSON line per decision:',
290
+ ' `{"t":<seconds>,"phase":"setup|content|design|output","decision":"...","reason":"..."}`',
291
+ ].join('\n')
292
+
293
+ return { deckPath: params.deckPath, deckContent: params.deckContent, workDir: params.workDir, skills, hasSkills, agentsMd }
294
+ }
295
+
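The generated `AGENTS.md` asks each agent to write `decision-log.jsonl` with one JSON object per line. A minimal sketch of writing and checking such lines follows, assuming the `setup|content|design|output` template means `phase` is exactly one of those four values; that reading, and the names `DecisionEntry`, `formatDecisionLine`, and `parseDecisionLine`, are assumptions for illustration and not part of the package.

```typescript
type Phase = 'setup' | 'content' | 'design' | 'output'

interface DecisionEntry {
  t: number        // seconds, as in the template
  phase: Phase
  decision: string
  reason: string
}

const PHASES: readonly string[] = ['setup', 'content', 'design', 'output']

// Serialize one decision as a single line; JSON.stringify escapes embedded newlines.
function formatDecisionLine(entry: DecisionEntry): string {
  return JSON.stringify({
    t: entry.t,
    phase: entry.phase,
    decision: entry.decision,
    reason: entry.reason,
  })
}

// Parse one line; returns null when it is not valid JSON or does not match the shape.
function parseDecisionLine(line: string): DecisionEntry | null {
  let value: unknown
  try {
    value = JSON.parse(line)
  } catch {
    return null
  }
  if (typeof value !== 'object' || value === null) return null
  const v = value as Record<string, unknown>
  const t = v['t']
  const phase = v['phase']
  const decision = v['decision']
  const reason = v['reason']
  if (typeof t !== 'number' || !Number.isFinite(t)) return null
  if (typeof phase !== 'string' || !PHASES.includes(phase)) return null
  if (typeof decision !== 'string' || typeof reason !== 'string') return null
  return { t, phase: phase as Phase, decision, reason }
}

const line = formatDecisionLine({
  t: 12,
  phase: 'setup',
  decision: 'use pdf skill',
  reason: 'brief asks for a PDF',
})
console.log(line)
console.log(parseDecisionLine(line) !== null)                 // true
console.log(parseDecisionLine('{"t":"soon","phase":"setup"}')) // null
```

A judge or archive step could run a check like `parseDecisionLine` over each line to flag agents that skipped or malformed the mandatory log.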
198
296
  // ── formatSkillWarnings ──────────────────────────────────────────────────
199
297
 
200
298
  /**