@lythos/skill-arena 0.13.3 → 0.14.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/README.md CHANGED

@@ -1,140 +1,155 @@
 # @lythos/skill-arena
-![CI](https://img.shields.io/badge/CI-41%20unit%20tests-brightgreen) ![Intent/Plan](https://img.shields.io/badge/arch-intent%2Fplan%2Fexecute-8A2BE2)
+> Controlled-variable benchmark for AI agent skills. Test single decks or compare A/B — agent-orchestrated by default, cross-player when you need it.
-> Controlled-variable benchmark for AI agent skills. Compare skills, decks, or configurations on the same task — single-skill A/B or full-deck Pareto frontier analysis. Now with declarative `arena.toml` (k8s-manifest style) and deterministic Pareto frontier.
+## Modes at a Glance
-## Why
+| Mode | How | When |
+|------|-----|------|
+| **Agent-Orchestrated** (DEFAULT) | Agent tool spawns subagents, parallel dispatch, native judge | Single deck test, cross-deck A/B comparison |
+| **Cross-Player** (OPT-IN) | CLI runner spawns different agent binaries via Bun.spawn | Comparing kimi vs codex vs claude |
-"Which skill is better?" is the wrong question. The right question is "which skill is better for what."
-`skill-arena` scaffolds isolated environments where subagents complete the same task under different decks. A judge agent scores outputs across multiple dimensions. Supports:
-- **Mode 1**: Single-skill comparison (controlled variable — same helper skills, different test skill).
-- **Mode 2**: Full-deck comparison (Pareto frontier — no single winner, only optimal trade-offs).
-## Prerequisites
-Arena runs AI agents as subprocesses. You need at least one agent CLI installed:
-### Kimi CLI (recommended default)
-Kimi Code CLI is the default player for arena — it has reliable headless execution with eager tool loading (no deferred tool deadlock).
-```bash
-# Install via uv (recommended) — uv is Python's bunx equivalent
-uv tool install kimi-cli
-# Or run without installing:
-uvx kimi-cli --print -p "hello"
-# Authenticate
-kimi login
-# Or set API key:
-export KIMI_API_KEY=your_key
-```
-Docs: [https://github.com/MoonshotAI/kimi-cli](https://github.com/MoonshotAI/kimi-cli)
-### Claude CLI (secondary)
-```bash
-npm install -g @anthropic-ai/claude-code
-claude --version  # should be ≥ 2.1.128
-```
-Note: Claude `-p` mode has known issues with web tools in Bun.spawn (deferred tool deadlock). Kimi is the default for reliability.
+**95% of arena use is agent-orchestrated.** The Agent tool can spawn parallel subagents with isolated workdirs and different decks — zero CLI. Cross-player mode is ONLY needed when comparing different agent CLIs (the Agent tool can only spawn same-type agents).
 ## Install
 ```bash
 bun add -d @lythos/skill-arena
 # or use directly
-bunx @lythos/skill-arena@0.13.3 <command>
+bunx @lythos/skill-arena@0.14.1 <command>
 ```
 ## Quick Start
 ```bash
-# Single: test a deck with one agent (most common)
+# single — test one deck (most common)
 bunx @lythos/skill-arena@latest single \
   --deck ./examples/decks/scout.toml \
   --brief "Generate auth flow diagram" \
-  --player kimi \
-  --timeout 300000 \
   --out ./output
-# Single with remote deck (URL auto-fetched)
-bunx @lythos/skill-arena@latest single \
-  --deck https://raw.githubusercontent.com/lythos-labs/lythoskill/main/examples/decks/scout.toml \
-  --brief "Generate auth flow diagram" \
-  --out ./output
-# Vs: compare multiple decks side by side
-curl -fsSL https://raw.githubusercontent.com/lythos-labs/lythoskill/main/examples/arena/research-compare/arena.toml > arena.toml
+# cross-deck vs — compare two decks (agent-orchestrated)
+# Create arena.toml declaring sides with different decks, then:
 bunx @lythos/skill-arena@latest vs --config ./arena.toml
+# cross-player vs — compare kimi vs codex (CLI only)
+bunx @lythos/skill-arena@latest vs --config ./arena.toml --player kimi
 ```
-**Default behavior:**
-- Agent runs in an isolated `/tmp` workdir (no workspace pollution)
-- All artifacts are copied to `--out` after completion
-- Prompt template injects fixed contract (decision-log, robustness, tool preference) + your brief as variable
+**What happens**: Agent creates isolated `/tmp` workdir per side, `deck link` skills, spawns parallel subagents, collects artifacts, judge scores outputs. Parent deck restored after.
 ## Commands
-### Declarative mode (k8s-style, recommended)
+### `single` — one deck, one task
 ```bash
-# Print execution plan without running
-bunx @lythos/skill-arena@0.13.3 vs --config arena.toml --dry-run
-# Execute with per-side runs_per_side and statistical aggregation
-bunx @lythos/skill-arena@0.13.3 vs --config arena.toml
+bunx @lythos/skill-arena@latest single \
+  --deck ./deck.toml \
+  --brief "Produce a .docx report with radar chart" \
+  --timeout 600000 \
+  --out ./output
 ```
-### Scaffold mode (legacy, manual execution)
+### `vs` — multi-deck comparison
+```bash
+bunx @lythos/skill-arena@latest vs --config ./arena.toml
+bunx @lythos/skill-arena@latest vs --config ./arena.toml --dry-run
 ```
-bunx @lythos/skill-arena@0.13.3 scaffold --task "Generate auth flow diagram" \
-  --decks https://raw.githubusercontent.com/lythos-labs/lythoskill/main/examples/decks/scout.toml,https://raw.githubusercontent.com/lythos-labs/lythoskill/main/examples/decks/documents.toml
+### `scaffold` — legacy directory setup
+```bash
+bunx @lythos/skill-arena@latest scaffold \
+  --task "Generate auth flow diagram" \
+  --decks "./decks/minimal.toml,./decks/rich.toml"
 ```
-### Viz
+### `prepare-workdir` — isolate + link skills (agent-orchestrated)
 ```bash
-bunx @lythos/skill-arena@0.13.3 viz runs/arena-<id>/
+bunx @lythos/skill-arena@latest prepare-workdir \
+  --deck ./skill-deck.toml \
+  --out /tmp/arena-side-a \
+  --brief "task description"
+# Plan-first: review before executing
+bunx @lythos/skill-arena@latest prepare-workdir \
+  --deck ./skill-deck.toml \
+  --out /tmp/arena-side-a \
+  --brief "task" \
+  --dry-run
 ```
-## Skill Documentation
+Creates `/tmp`-isolated workdir with deck copied, AGENTS.md written, and `deck link` run. `--dry-run` prints the plan (skills, workdir path, link needed) without creating anything.
-This package is the **Starter** layer (CLI implementation).
-The agent-visible **Skill** layer documentation is here:
-[packages/lythoskill-arena/skill/SKILL.md](../../packages/lythoskill-arena/skill/SKILL.md)
+### `archive` — collect agent outputs (agent-orchestrated)
-## Architecture
+```bash
+bunx @lythos/skill-arena@latest archive \
+  --from /tmp/arena-side-a \
+  --to ./playground/output \
+  --sides side-a
+# Plan-first: review what would be copied
+bunx @lythos/skill-arena@latest archive \
+  --from /tmp/arena-side-a \
+  --to ./playground/output \
+  --sides side-a \
+  --dry-run
+```
-Part of the [lythoskill](https://github.com/lythos-labs/lythoskill) ecosystem — the thin-skill pattern separates heavy logic (this npm package) from lightweight agent instructions (SKILL.md).
+Copies agent artifacts from workdir(s) to output, skipping internal files (`.claude`, `skill-deck.toml`, `skill-deck.lock`, `AGENTS.md`). Single-side archives fall back to workdir root when the named side subdirectory doesn't exist. `--dry-run` shows the per-side plan before copying.
+### `viz` — render results
+```bash
+bunx @lythos/skill-arena@latest viz runs/arena-<id>/
 ```
-Starter (this package) → npm publish → bunx @lythos/skill-arena@0.13.3 ...
-Skill   (packages/<name>/skill/)     → build → SKILL.md + thin scripts
-Output  (skills/<name>/)             → git commit → agent-visible skill
+## Parameters
+| Flag | Command | Description |
+|------|---------|-------------|
+| `--brief "<text>"` | single | Inline task brief |
+| `--deck <path\|url>` | single | Deck file (URL auto-fetched) |
+| `--player <name>` | single, vs | Only for cross-player: kimi\|codex\|deepseek\|claude |
+| `--timeout <ms>` | single | Subagent timeout (300000–600000 for complex tasks) |
+| `--from <dir>` | archive | Source workdir |
+| `--to <dir>` | archive | Output directory |
+| `--sides <names>` | archive | Comma-separated side names (default: `.`) |
+| `--out <dir>` | single, vs, prepare-workdir | Output / workdir directory |
+| `--config <path>` | vs | arena.toml |
+| `--dry-run` | vs, prepare-workdir, archive | Print plan without execution |
+## Prerequisites (cross-player only)
+For cross-player mode, install at least one agent CLI:
+```bash
+uv tool install kimi-cli           # kimi (recommended default)
+npm i -g @openai/codex             # codex
+# deepseek: bundled with desktop app or pip install deepseek-cli
+# claude: set ANTHROPIC_API_KEY (SDK, no CLI binary needed)
 ```
-### Runtime architecture (intent/plan/execute)
+## Skill Documentation
+The agent-visible skill layer: [skill/SKILL.md](./skill/SKILL.md)
+## Architecture
 ```
 arena.toml  →  ArenaToml (Zod)  →  ExecutionPlan (pure)  →  per-cell agent spawn (IO)
-                                    ↓
-                aggregateAllStats (pure)  ←  verdicts[]
-                                    ↓
-                runComparativeJudge (IO)  →  report.md + Pareto frontier
+                                   ↓
+               aggregateAllStats (pure)  ←  verdicts[]
+                                   ↓
+               runComparativeJudge (IO)  →  report.md + Pareto frontier
 ```
-- **Intent**: `arena.toml` declarative config (k8s-manifest style)
+- **Intent**: `arena.toml` declarative config
 - **Plan**: `buildExecutionPlan()`, `aggregateSideStats()`, `computePareto()` — pure functions
-- **Execute**: `runAgentScenario` per cell, `runComparativeJudge` — IO via `AgentAdapter`
-Built on `@lythos/test-utils` shared infrastructure.
+- **Execute**: Agent tool spawn (agent-orchestrated) or `AgentAdapter` (cross-player)
 ## License
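
The new README says `archive` copies agent artifacts while skipping internal files. As a minimal sketch of that filtering step, assuming it only compares top-level entry names against a fixed list (`INTERNAL_FILES` and `selectArtifacts` are hypothetical names used for illustration, not exports of the package):

```typescript
// Names taken from the skip list given in the README text above.
const INTERNAL_FILES = new Set(['.claude', 'skill-deck.toml', 'skill-deck.lock', 'AGENTS.md'])

// Hypothetical helper: keep only the entries that are not arena bookkeeping files.
function selectArtifacts(entryNames: string[]): string[] {
  return entryNames.filter(name => !INTERNAL_FILES.has(name))
}

const kept = selectArtifacts(['report.docx', 'AGENTS.md', '.claude', 'decision-log.jsonl'])
console.log(kept)
```

Because the comparison is by exact name, a file such as `AGENTS.md.bak` would not be skipped under this sketch.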

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@lythos/skill-arena",
-  "version": "0.13.3",
+  "version": "0.14.1",
   "description": "Skill Arena — benchmark skill effectiveness with controlled-variable comparison",
   "keywords": [
     "ai-agent",
@@ -42,13 +42,13 @@
     "bun": ">=1.0.0"
   },
   "dependencies": {
-    "@lythos/cold-pool": "^0.13.3",
-    "@lythos/infra": "^0.13.3",
-    "@lythos/test-utils": "^0.13.3",
+    "@lythos/cold-pool": "^0.14.1",
+    "@lythos/infra": "^0.14.1",
+    "@lythos/test-utils": "^0.14.1",
     "zod": "^3.24.0",
     "zod-to-json-schema": "^3.25.2"
   },
   "optionalDependencies": {
-    "@lythos/agent-adapter-claude-sdk": "^0.13.3"
+    "@lythos/agent-adapter-claude-sdk": "^0.14.1"
   }
 }

package/src/cli.ts CHANGED

@@ -5,7 +5,7 @@ import { homedir, tmpdir } from 'node:os'
 import { ZodError } from 'zod'
 import { formatPlanOutput, type ArenaResult, buildArenaPrompt } from './runner'
 import { parseArenaToml, buildExecutionPlan } from './arena-toml'
-import { buildCopyPlan, parseDeckSkills } from './preflight'
+import { buildArchiveSidePlan, buildCopyPlan, buildPreparePlan, parseDeckSkills } from './preflight'
 import { checkSkillExistence, formatSkillWarnings, resolveColdPoolDir } from './preflight'
 // ─── fetchWithProxy (infra dependency, no package boundary) ─────────────────
@@ -100,6 +100,8 @@ Examples:
   lythoskill-arena vs --config arena.toml --dry-run
   lythoskill-arena vs --config arena.toml
   lythoskill-arena viz runs/arena-20260504
+  lythoskill-arena prepare-workdir --deck ./decks/scout.toml --out /tmp/arena-20260517-side-a
+  lythoskill-arena archive --from /tmp/arena-20260517 --to playground/arena-20260517 --sides side-a,side-b
 `)
     process.exit(0)
   }
@@ -113,6 +115,8 @@ function cli(args: string[]) {
   if (cmd === 'vs' || cmd === 'compare') return vsRun(rest)
   if (cmd === 'single' || cmd === 'run') return singleRun(rest)
   if (cmd === 'viz') return vizRun(rest)
+  if (cmd === 'prepare-workdir') return prepareWorkdir(rest)
+  if (cmd === 'archive') return archiveRun(rest)
   console.error(`Unknown command: ${cmd}`)
   process.exit(1)
@@ -431,6 +435,182 @@ async function vizRun(args: string[]) {
   console.log(`📈 Arena HTML report not yet implemented. See report.md in ${runsDir}/`)
 }
+// ═══════════════════════════════════════════════════════════════════════════
+// ── prepare-workdir: reusable workdir setup (used by both CLI and agent) ──
+// Intent: create an isolated arena workdir with deck linked and ready to run
+async function prepareWorkdir(args: string[]) {
+  const opts: Record<string, string | undefined> = {}
+  let dryRun = false
+  for (let i = 0; i < args.length; i++) {
+    if (args[i] === '--deck' || args[i] === '-d') opts.deck = args[++i]
+    else if (args[i] === '--out' || args[i] === '-o') opts.out = args[++i]
+    else if (args[i] === '--brief' || args[i] === '-b') opts.brief = args[++i]
+    else if (args[i] === '--dry-run') dryRun = true
+  }
+  if (!opts.deck) {
+    console.error(`❌ --deck <path> is required.
+   lythoskill-arena prepare-workdir --deck ./skill-deck.toml --out /tmp/arena-side-a`)
+    process.exit(1)
+  }
+  const deckPath = resolve(opts.deck)
+  if (!existsSync(deckPath)) {
+    console.error(`❌ Deck file not found: ${deckPath}`)
+    process.exit(1)
+  }
+  const workDir = opts.out
+    ? resolve(opts.out)
+    : join(tmpdir(), `arena-${Date.now()}`)
+  const deckContent = readFileSync(deckPath, 'utf-8')
+  // ── Plan (pure computation — what WOULD be created) ────────────────────
+  const plan = buildPreparePlan({
+    deckPath,
+    deckContent,
+    workDir,
+    skillCount: 0, // computed inside from deckContent
+    brief: opts.brief,
+  })
+  console.log('📋 Prepare plan:')
+  console.log(`   deck:    ${plan.deckPath}`)
+  console.log(`   workdir: ${plan.workDir}`)
+  console.log(`   skills:  ${plan.skills.length} declared (${plan.skills.map(s => s.name).join(', ') || 'none'})`)
+  console.log(`   link:    ${plan.hasSkills ? 'Bun.spawn deck link' : 'skip (no skills)'}`)
+  console.log(`   AGENTS.md: write (${plan.agentsMd.split('\n').length} lines)`)
+  if (opts.brief) console.log(`   brief:   ${opts.brief!.slice(0, 60)}...`)
+  if (dryRun) {
+    console.log(`\n🏁 Dry-run complete (no files created). Remove --dry-run to execute.`)
+    return
+  }
+  // ── Execute: create workdir ──────────────────────────────────────────
+  mkdirSync(workDir, { recursive: true })
+  writeFileSync(join(workDir, 'skill-deck.toml'), deckContent)
+  writeFileSync(join(workDir, 'AGENTS.md'), plan.agentsMd)
+  if (plan.hasSkills) {
+    const { existsSync: es2 } = await import('node:fs')
+    const localDeckCli = join(import.meta.dir, '..', '..', 'lythoskill-deck', 'src', 'cli.ts')
+    const linkCmd = es2(localDeckCli)
+      ? ['bun', localDeckCli, 'link']
+      : ['bunx', '@lythos/skill-deck', 'link']
+    const linkProc = Bun.spawn(linkCmd,
+      { cwd: workDir, env: { ...process.env, HOME: process.env.HOME! } },
+    )
+    await linkProc.exited
+    const linkStderr = await new Response(linkProc.stderr).text()
+    const linkResult = validateLinkResult(linkProc.exitCode, linkStderr)
+    if (!linkResult.ok) {
+      console.error(`❌ ${linkResult.error}`)
+      process.exit(1)
+    }
+  } else {
+    console.log('ℹ️  No skills declared in deck — skipping link')
+  }
+  // Skill existence check
+  try {
+    const coldPoolDefault = join(homedir(), '.agents', 'skill-repos')
+    const coldPoolDir = resolveColdPoolDir(Bun.TOML.parse(deckContent)?.deck?.cold_pool, homedir(), coldPoolDefault)
+    const checks = checkSkillExistence(plan.skills, coldPoolDir, existsSync)
+    for (const warning of formatSkillWarnings(checks)) {
+      console.warn(`⚠️  ${warning}`)
+    }
+  } catch (e) {
+    console.warn('⚠️  Could not check skill existence:', e instanceof Error ? e.message : e)
+  }
+  console.log(`✅ Workdir ready → ${workDir}`)
+  console.log(`   deck: ${deckPath}`)
+  if (opts.brief) console.log(`   brief: ${opts.brief!.slice(0, 60)}...`)
+}
+// ═══════════════════════════════════════════════════════════════════════════
+// ── archive: copy agent outputs from workdir(s) to outDir ─────────────────
+// Intent: same copy behavior as CLI singleRun, reusable for agent-orchestrated
+async function archiveRun(args: string[]) {
+  const opts: Record<string, string | undefined> = {}
+  let dryRun = false
+  for (let i = 0; i < args.length; i++) {
+    if (args[i] === '--from' || args[i] === '-f') opts.from = args[++i]
+    else if (args[i] === '--to' || args[i] === '-o') opts.to = args[++i]
+    else if (args[i] === '--sides') opts.sides = args[++i]
+    else if (args[i] === '--report') opts.report = args[++i]
+    else if (args[i] === '--dry-run') dryRun = true
+  }
+  if (!opts.from || !opts.to) {
+    console.error(`❌ --from <workdir> and --to <outdir> are required.
+   lythoskill-arena archive --from /tmp/arena-20260517 --to playground/arena-20260517 --sides side-a,side-b --report ./report.md`)
+    process.exit(1)
+  }
+  const fromDir = resolve(opts.from)
+  const outDir = resolve(opts.to)
+  const sides = opts.sides ? opts.sides.split(',') : ['.']
+  const plan = buildArchiveSidePlan(fromDir, sides, existsSync)
+  // ── Plan output (always shown, also serves as dry-run) ──────────────────
+  console.log('📋 Archive plan:')
+  for (const pe of plan) {
+    if (!pe.found) {
+      console.log(`   ⚠️  ${pe.side}: not found (${pe.sourceDir}) — will skip`)
+    } else if (pe.sourceDir === fromDir && pe.side !== '.') {
+      console.log(`   ${pe.side}: ${pe.sourceDir} (fallback → root) → ${join(outDir, pe.side)}`)
+    } else {
+      console.log(`   ${pe.side}: ${pe.sourceDir} → ${join(outDir, pe.side)}`)
+    }
+  }
+  if (dryRun) {
+    console.log(`\n🏁 Dry-run complete (no files copied). Remove --dry-run to execute.`)
+    return
+  }
+  // ── Execute: copy files ───────────────────────────────────────────────
+  mkdirSync(outDir, { recursive: true })
+  if (opts.report && existsSync(resolve(opts.report))) {
+    const { cpSync: cpR } = await import('node:fs')
+    cpR(resolve(opts.report), join(outDir, 'report.md'))
+    console.log(`📄 report.md → ${outDir}/report.md`)
+  }
+  const { cpSync, readdirSync } = await import('node:fs')
+  const skipSet = new Set(['.claude', 'skill-deck.toml', 'skill-deck.lock', 'AGENTS.md'])
+  for (const planEntry of plan) {
+    if (!planEntry.found) {
+      console.warn(`⚠️  Side workdir not found: ${planEntry.sourceDir}`)
+      continue
+    }
+    const sideOutDir = join(outDir, planEntry.side)
+    mkdirSync(sideOutDir, { recursive: true })
+    const entries = readdirSync(planEntry.sourceDir, { withFileTypes: true })
+    for (const entry of entries) {
+      if (skipSet.has(entry.name)) continue
+      const src = join(planEntry.sourceDir, entry.name)
+      const dest = join(sideOutDir, entry.name)
+      try {
+        cpSync(src, dest, { recursive: entry.isDirectory() })
+        console.log(`   ${planEntry.side}/${entry.name} → ${dest}`)
+      } catch (e) {
+        console.warn(`⚠️  Failed to copy ${planEntry.side}/${entry.name}: ${e instanceof Error ? e.message : e}`)
+      }
+    }
+  }
+  console.log(`✅ Archive complete → ${outDir}`)
+}
 // ── Entry point ────────────────────────────────────────────────────────────
 if (import.meta.main) {
   main().catch(err => {
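
Both `prepareWorkdir` and `archiveRun` in this diff walk `args` with an index loop, consuming the next element as the value of each recognised flag and treating `--dry-run` as a boolean switch. The sketch below isolates that pattern as a pure function. `parseFlags` and its alias map are hypothetical and not part of the package; they only illustrate the loop.

```typescript
interface ParsedFlags {
  values: Record<string, string | undefined>
  dryRun: boolean
}

// Hypothetical helper: `aliases` maps every accepted flag spelling to an option key.
function parseFlags(args: string[], aliases: Map<string, string>): ParsedFlags {
  const values: Record<string, string | undefined> = {}
  let dryRun = false
  for (let i = 0; i < args.length; i++) {
    const arg = args[i]
    if (arg === undefined) continue
    if (arg === '--dry-run') {
      dryRun = true
      continue
    }
    const key = aliases.get(arg)
    // A value flag consumes the following argument; a trailing flag yields undefined.
    if (key !== undefined) values[key] = args[++i]
  }
  return { values, dryRun }
}

const archiveAliases = new Map([
  ['--from', 'from'], ['-f', 'from'],
  ['--to', 'to'], ['-o', 'to'],
  ['--sides', 'sides'],
])
const parsed = parseFlags(
  ['--from', '/tmp/arena-side-a', '--to', './playground/output', '--dry-run'],
  archiveAliases,
)
console.log(parsed)
```

As in the diffed code, unrecognised arguments are silently ignored, so a misspelled flag surfaces only later as a missing required option.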

package/src/preflight.test.ts CHANGED

@@ -393,3 +393,192 @@ describe('formatSkillWarnings', () => {
     expect(formatSkillWarnings(checks)[0]).toContain('[transient]')
   })
 })
+// ═══════════════════════════════════════════════════════════════════════════
+// buildArchiveSidePlan
+// ═══════════════════════════════════════════════════════════════════════════
+// ═══════════════════════════════════════════════════════════════════════════
+// buildArchiveSidePlan
+// ═══════════════════════════════════════════════════════════════════════════
+import { buildArchiveSidePlan } from './preflight'
+import { join as pathJoin } from 'node:path'
+const TMP = '/tmp/arena-test'
+describe('buildArchiveSidePlan', () => {
+  test('default: sides=["."] maps to fromDir', () => {
+    const plan = buildArchiveSidePlan(TMP, ['.'], _p => true)
+    expect(plan).toEqual([
+      { side: '.', sourceDir: TMP, found: true },
+    ])
+  })
+  test('single side, subdirectory exists → source = fromDir/side', () => {
+    const exists = (p: string) => p === pathJoin(TMP, 'side-a')
+    const plan = buildArchiveSidePlan(TMP, ['side-a'], exists)
+    expect(plan).toEqual([
+      { side: 'side-a', sourceDir: pathJoin(TMP, 'side-a'), found: true },
+    ])
+  })
+  test('single side, subdirectory MISSING → fallback to fromDir root', () => {
+    const plan = buildArchiveSidePlan(TMP, ['side-a'], _p => false)
+    expect(plan).toEqual([
+      { side: 'side-a', sourceDir: TMP, found: true },
+    ])
+  })
+  test('multi side, all subdirectories exist', () => {
+    const exists = (p: string) =>
+      p === pathJoin(TMP, 'side-a') || p === pathJoin(TMP, 'side-b')
+    const plan = buildArchiveSidePlan(TMP, ['side-a', 'side-b'], exists)
+    expect(plan).toEqual([
+      { side: 'side-a', sourceDir: pathJoin(TMP, 'side-a'), found: true },
+      { side: 'side-b', sourceDir: pathJoin(TMP, 'side-b'), found: true },
+    ])
+  })
+  test('multi side, one missing → found=false (caller handles warn+skip)', () => {
+    const exists = (p: string) => p === pathJoin(TMP, 'side-a')
+    const plan = buildArchiveSidePlan(TMP, ['side-a', 'side-b'], exists)
+    expect(plan).toEqual([
+      { side: 'side-a', sourceDir: pathJoin(TMP, 'side-a'), found: true },
+      { side: 'side-b', sourceDir: pathJoin(TMP, 'side-b'), found: false },
+    ])
+  })
+  test('"." side does NOT trigger fallback when missing (found=false)', () => {
+    const plan = buildArchiveSidePlan(TMP, ['.'], _p => false)
+    expect(plan).toEqual([
+      { side: '.', sourceDir: TMP, found: false },
+    ])
+  })
+  test('empty sides array → empty plan', () => {
+    const plan = buildArchiveSidePlan(TMP, [], _p => true)
+    expect(plan).toEqual([])
+  })
+  test('three sides, middle missing', () => {
+    const exists = (p: string) =>
+      p === pathJoin(TMP, 'side-a') || p === pathJoin(TMP, 'side-c')
+    const plan = buildArchiveSidePlan(TMP, ['side-a', 'side-b', 'side-c'], exists)
+    expect(plan).toEqual([
+      { side: 'side-a', sourceDir: pathJoin(TMP, 'side-a'), found: true },
+      { side: 'side-b', sourceDir: pathJoin(TMP, 'side-b'), found: false },
+      { side: 'side-c', sourceDir: pathJoin(TMP, 'side-c'), found: true },
+    ])
+  })
+})
+// ═══════════════════════════════════════════════════════════════════════════
+// buildPreparePlan
+// ═══════════════════════════════════════════════════════════════════════════
+import { buildPreparePlan } from './preflight'
+const DECK_ONE_SKILL = `
+[deck]
+max_cards = 10
+cold_pool = "~/.agents/skill-repos"
+[tool.skills.pdf]
+path = "github.com/anthropics/skills/skills/pdf"
+`
+const DECK_EMPTY = `
+[deck]
+max_cards = 5
+`
+const DECK_TWO_SKILLS = `
+[deck]
+max_cards = 10
+[innate.skills.deck]
+path = "github.com/lythos-labs/lythoskill/skills/lythoskill-deck"
+[tool.skills.pdf]
+path = "github.com/anthropics/skills/skills/pdf"
+`
+describe('buildPreparePlan', () => {
+  test('single skill deck → plan with 1 skill, hasSkills=true', () => {
+    const plan = buildPreparePlan({
+      deckPath: '/tmp/test-deck.toml',
+      deckContent: DECK_ONE_SKILL,
+      workDir: '/tmp/arena-test',
+      skillCount: 0,
+    })
+    expect(plan.skills).toHaveLength(1)
+    expect(plan.skills[0].name).toBe('pdf')
+    expect(plan.skills[0].section).toBe('tool')
+    expect(plan.hasSkills).toBe(true)
+    expect(plan.workDir).toBe('/tmp/arena-test')
+    expect(plan.deckPath).toBe('/tmp/test-deck.toml')
+  })
+  test('empty deck → skills=[], hasSkills=false', () => {
+    const plan = buildPreparePlan({
+      deckPath: '/tmp/empty.toml',
+      deckContent: DECK_EMPTY,
+      workDir: '/tmp/arena-empty',
+      skillCount: 0,
+    })
+    expect(plan.skills).toEqual([])
+    expect(plan.hasSkills).toBe(false)
+  })
+  test('two skills (innate + tool) → both parsed with correct sections', () => {
+    const plan = buildPreparePlan({
+      deckPath: '/tmp/two.toml',
+      deckContent: DECK_TWO_SKILLS,
+      workDir: '/tmp/arena-two',
+      skillCount: 0,
+    })
+    expect(plan.skills).toHaveLength(2)
+    expect(plan.skills[0]).toEqual({ name: 'deck', path: 'github.com/lythos-labs/lythoskill/skills/lythoskill-deck', section: 'innate' })
+    expect(plan.skills[1]).toEqual({ name: 'pdf', path: 'github.com/anthropics/skills/skills/pdf', section: 'tool' })
+    expect(plan.hasSkills).toBe(true)
+  })
+  test('AGENTS.md contains mandatory sections', () => {
+    const plan = buildPreparePlan({
+      deckPath: '/tmp/d.toml',
+      deckContent: DECK_ONE_SKILL,
+      workDir: '/tmp/arena-md',
+      skillCount: 0,
+    })
+    expect(plan.agentsMd).toContain('Arena Test Environment')
+    expect(plan.agentsMd).toContain('Setup Order')
+    expect(plan.agentsMd).toContain('decision-log.jsonl')
+    expect(plan.agentsMd).toContain('skill-deck.toml')
+  })
+  test('invalid TOML → skills=[], hasSkills=false (no crash)', () => {
+    const plan = buildPreparePlan({
+      deckPath: '/tmp/bad.toml',
+      deckContent: 'this is not toml {{{',
+      workDir: '/tmp/arena-bad',
+      skillCount: 0,
+    })
+    expect(plan.skills).toEqual([])
+    expect(plan.hasSkills).toBe(false)
+  })
+  test('deckContent is preserved in plan', () => {
+    const plan = buildPreparePlan({
+      deckPath: '/tmp/d.toml',
+      deckContent: DECK_ONE_SKILL,
+      workDir: '/tmp/arena-preserve',
+      skillCount: 0,
+    })
+    expect(plan.deckContent).toBe(DECK_ONE_SKILL)
+  })
+})
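
The `buildArchiveSidePlan` tests above rely on passing the existence check in as a function rather than calling the file system. The sketch below shows the same idea outside a test runner, with a `Set` standing in for the directories that exist. `planSides` is a condensed copy of the planning logic from the `preflight.ts` diff, repeated only so the block runs on its own; `existing` and the example paths are made up.

```typescript
import { join } from 'node:path'

interface SideEntry {
  side: string
  sourceDir: string
  found: boolean
}

// Condensed copy of the logic in the preflight.ts diff (not imported from the package).
function planSides(fromDir: string, sides: string[], existsFn: (p: string) => boolean): SideEntry[] {
  return sides.map(side => {
    let sourceDir = side === '.' ? fromDir : join(fromDir, side)
    let found = existsFn(sourceDir)
    // Single named side whose subdirectory is missing: fall back to the workdir root.
    if (!found && sides.length === 1 && side !== '.') {
      sourceDir = fromDir
      found = true
    }
    return { side, sourceDir, found }
  })
}

// Hypothetical in-memory "file system": only side-a exists.
const root = '/tmp/arena'
const existing = new Set([join(root, 'side-a')])
const existsFn = (p: string) => existing.has(p)

const single = planSides(root, ['side-b'], existsFn)
const multi = planSides(root, ['side-a', 'side-b'], existsFn)
console.log(single, multi)
```

In the single-side case the fallback reports `found: true` without probing the root directory itself, which matches the third test above (its mock returns `false` for every path).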

package/src/preflight.ts CHANGED

@@ -195,6 +195,104 @@ export function resolveColdPoolDir(
   return raw.startsWith('~') ? `${homeDir}${raw.slice(1)}` : raw
 }
+// ── buildArchiveSidePlan ──────────────────────────────────────────────────
+/**
+ * A single side's source mapping in an archive plan.
+ * Pure data — no IO, no console.
+ */
+export interface ArchiveSideEntry {
+  side: string
+  sourceDir: string
+  found: boolean
+}
+/**
+ * Build the per-side source directory plan for archive.
+ *
+ * Pure: strings + existence function → ArchiveSideEntry[].
+ * IO (`existsSync`) is injected via `existsFn` — test with mock, run with real.
+ *
+ * Single-side fallback: when --sides specifies exactly one named side and its
+ * subdirectory doesn't exist (agent put files in workdir root, prepare-workdir
+ * didn't create per-side dirs), fall back to `fromDir` as source (found=true).
+ *
+ * Default (no --sides): sides = ['.'] → sourceDir = fromDir.
+ */
+export function buildArchiveSidePlan(
+  fromDir: string,
+  sides: string[],
+  existsFn: (path: string) => boolean
+): ArchiveSideEntry[] {
+  const plan: ArchiveSideEntry[] = []
+  for (const side of sides) {
+    let sourceDir = side === '.' ? fromDir : join(fromDir, side)
+    let found = existsFn(sourceDir)
+    if (!found && sides.length === 1 && side !== '.') {
+      sourceDir = fromDir
+      found = true
+    }
+    plan.push({ side, sourceDir, found })
+  }
+  return plan
+}
+// ── buildPreparePlan ─────────────────────────────────────────────────────
+/**
+ * Plan-only result for prepare-workdir — what WOULD be created.
+ * Pure data, no IO. Caller renders this before executing.
+ */
+export interface PreparePlan {
+  deckPath: string
+  deckContent: string
+  workDir: string
+  skills: SkillDecl[]
+  hasSkills: boolean
+  agentsMd: string
+}
+/**
+ * Build the prepare-workdir plan from raw inputs.
+ *
+ * Pure computation: deck path + content → what workdir would contain.
+ * Caller does IO (reading deck, computing timestamp) and injects results.
+ */
+export function buildPreparePlan(params: {
+  deckPath: string
+  deckContent: string
+  workDir: string
+  skillCount: number
+  brief?: string
+}): PreparePlan {
+  let deckParsed: Record<string, any> = {}
+  try { deckParsed = Bun.TOML.parse(params.deckContent) as Record<string, any> } catch {}
+  const skills = parseDeckSkills(deckParsed)
+  const hasSkills = skills.length > 0
+  const agentsMd = [
+    '# Arena Test Environment',
+    '**Mode**: agent-orchestrated cell',
+    '',
+    '## Setup Order (why this sequence)',
+    '1. `skill-deck.toml` copied here → declares which skills you can use',
+    '2. `deck link` runs → cold pool skills become visible in `.claude/skills/`',
+    '3. Skill existence checked → warns if any declared skill is missing from cold pool',
+    '4. `AGENTS.md` written last → confirms setup succeeded before agent starts',
+    'If setup fails mid-sequence, the workdir is incomplete and nothing runs.',
+    '',
+    '## How This Works',
+    '- Write ALL output files to this directory (CWD).',
+    '- Use available skills — check `ls .claude/skills/`.',
+    '',
+    '## Output Contract',
+    '- MANDATORY: `decision-log.jsonl` — one JSON line per decision:',
+    '  `{"t":<seconds>,"phase":"setup|content|design|output","decision":"...","reason":"..."}`',
+  ].join('\n')
+  return { deckPath: params.deckPath, deckContent: params.deckContent, workDir: params.workDir, skills, hasSkills, agentsMd }
+}
 // ── formatSkillWarnings ──────────────────────────────────────────────────
 /**
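
The `agentsMd` text built by `buildPreparePlan` requires agents to write `decision-log.jsonl` with one JSON object per line and the fields `t`, `phase`, `decision`, and `reason`. A minimal sketch of producing one such line follows. The `Decision` type and `toJsonlLine` are hypothetical, and the source does not say what reference point `t` is measured from, so the value here is arbitrary.

```typescript
// Phase values as listed in the output contract string.
type Phase = 'setup' | 'content' | 'design' | 'output'

interface Decision {
  t: number // seconds; reference point is an assumption left to the caller
  phase: Phase
  decision: string
  reason: string
}

// Hypothetical helper: JSON.stringify escapes embedded newlines, so each record stays on one line.
function toJsonlLine(entry: Decision): string {
  return JSON.stringify(entry) + '\n'
}

const line = toJsonlLine({
  t: 12,
  phase: 'setup',
  decision: 'use the pdf skill',
  reason: 'brief asks for a PDF\nreport',
})
console.log(line)
```

Appending each line as it is produced (rather than rewriting the whole file) would keep earlier decisions intact if the agent stops midway, though the package does not prescribe how the file is written.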