@kradle/cli 0.0.7 → 0.0.8

package/README.md CHANGED
@@ -2,19 +2,29 @@
 
  Kradle's CLI for managing Minecraft challenges, evaluations, agents, and more!
 
+ * [Installation](#installation)
+ * [Autocomplete](#autocomplete)
+ * [Configuration](#configuration)
+ * [Challenge](#challenge-commands)
+ * [Evaluations](#evaluation-commands)
+ * [Publishing a New Version](#publishing-a-new-version)
+ * [Development](#development)
+ * [Architecture](#architecture)
+
  ## Installation
 
  1. Install Kradle's CLI globally
  ```
  npm i -g @kradle/cli
  ```
- 2. Initialize a new project
+ 2. Initialize a new directory to store challenges and evaluations
  ```
  kradle init
  ```
- 3. Congrats 🎉 You can now create a new challenge:
+ 3. Congrats 🎉 You can now create a new challenge or a new evaluation:
  ```
  kradle challenge create <challenge-name>
+ kradle evaluation create <evaluation-name>
  ```
 
  In addition, you can enable [autocomplete](#Autocomplete).
@@ -35,19 +45,6 @@ After setup, you will be able to use Tab to autocomplete:
  kradle challenge <TAB> # Shows: build, create, list, run, upload, watch, etc.
  ```
 
- ## Configuration
-
- The `.env` should have the following variables:
-
- ```env
- WEB_API_URL=https://api.kradle.ai
- WEB_URL=https://kradle.ai
- STUDIO_API_URL=http://localhost:8080
- STUDIO_URL=kradle-studio://
- KRADLE_API_KEY=your-api-key
- KRADLE_CHALLENGES_PATH=~/Documents/kradle-studio/challenges
- ```
-
  ## Challenge Commands
 
  ### Create Challenge
@@ -125,28 +122,60 @@ kradle challenge multi-upload
 
  Provides an interactive UI to select multiple challenges and uploads them in parallel.
 
- ## Evaluations commands
-
- Plan and execute batches of runs across challenges/agents, with resumable iterations and a TUI.
-
- - **Init**: scaffold an evaluation config `evaluations/<name>/config.ts`
- ```bash
- kradle evaluation init <name>
- ```
- - **List**: list local evaluations
- ```bash
- kradle evaluation list
- ```
- - **Run**: execute or resume an evaluation (iterations stored under `evaluations/<name>/iterations/`)
- ```bash
- kradle evaluation run <name> [--new] [--max-concurrent N]
- ```
-
- Features:
- - Iterations: `--new` starts a new iteration; otherwise resumes the latest.
- - Resumable state: progress is persisted per iteration; in-flight runs are re-polled on resume, completed runs stay completed.
- - Ink TUI: live status counts, elapsed times, scrollable run list; keys `q/Ctrl+C` quit, `↑/↓/j/k` move, `o` open run URL.
- - Per-iteration manifest: generated from the evaluation `config.ts` into `manifest.json` before runs start.
+ ## Evaluation Commands
+
+ Evaluations allow you to run batches of challenge runs with different agents and configurations, then analyze the results. This is useful for benchmarking agents, testing challenge difficulty, or gathering statistics across many runs.
+
+ ### Concepts
+
+ **Evaluation**: A named collection of run configurations defined in a `config.ts` file. Each evaluation lives in `evaluations/<name>/`.
+
+ **Iteration**: A snapshot of an evaluation execution. When you run an evaluation, it creates an iteration containing:
+ - A copy of the `config.ts` at that point in time
+ - A `manifest.json` with the generated list of runs
+ - A `progress.json` tracking the status of each run
+
+ Iterations are stored in `evaluations/<name>/iterations/001/`, `002/`, etc. This allows you to:
+ - Resume an interrupted evaluation from where it left off
+ - Re-run the same evaluation with `--new` to create a fresh iteration
+ - Compare results across different iterations
+
+ ### Create Evaluation
+
+ Create a new evaluation with a template config file:
+
+ ```bash
+ kradle evaluation create <name>
+ ```
+
+ This creates `evaluations/<name>/config.ts` with a template that you can customize. The config exports a `main()` function that returns a manifest with:
+ - `runs`: Array of run configurations (challenge + participants)
+ - `tags`: Optional tags applied to all runs for filtering in analytics
+
+ ### Run Evaluation
+
+ Execute or resume an evaluation:
+
+ ```bash
+ kradle evaluation run <name> # Resume current iteration or create first one
+ kradle evaluation run <name> --new # Start a new iteration
+ kradle evaluation run <name> --max-concurrent 10 # Control parallelism (default: 5)
+ ```
+
+ The run command:
+ 1. Creates a new iteration (or resumes the current one)
+ 2. Generates a manifest by executing `config.ts`
+ 3. Displays an interactive TUI showing run progress
+ 4. Saves progress periodically (allows resuming if interrupted)
+ 5. Opens Metabase dashboard with results when complete
+
+ ### List Evaluations
+
+ List all local evaluations:
+
+ ```bash
+ kradle evaluation list
+ ```
 
  ## Publishing a New Version
 
@@ -161,9 +190,6 @@ The CLI uses GitHub Actions for automated releases. To publish a new version:
  4. **Review and merge** the automatically created PR
  5. **Done!** The package is automatically published to npm when the PR is merged
 
- ### Setup (one-time)
-
- For the publish workflow to work, we're using [NPM Trusted Publishers](https://docs.npmjs.com/trusted-publishers).
 
  ## Development
 
@@ -2,7 +2,9 @@ import { exec } from "node:child_process";
  import fs from "node:fs/promises";
  import path from "node:path";
  import { Args, Command } from "@oclif/core";
+ import enquirer from "enquirer";
  import pc from "picocolors";
+ import { ApiClient } from "../../lib/api-client.js";
  import { loadConfig } from "../../lib/config.js";
  import { getStaticResourcePath } from "../../lib/utils.js";
  export default class Create extends Command {
@@ -29,9 +31,28 @@ export default class Create extends Command {
  }
  // Create evaluation directory
  await fs.mkdir(evaluationDir, { recursive: true });
- // Copy template
+ // Ask for the slug of the challenge to evaluate
+ const config = loadConfig();
+ const api = new ApiClient(config);
+ const [kradleChallenges, cloudChallenges] = await Promise.all([api.listKradleChallenges(), api.listChallenges()]);
+ const choices = [...kradleChallenges, ...cloudChallenges]
+ .map((c) => c.slug)
+ .toSorted()
+ .map((s) => ({
+ name: s,
+ message: s,
+ }));
+ const response = await enquirer.prompt({
+ type: "select",
+ name: "challenge",
+ message: "Select the challenge to evaluate",
+ choices: choices,
+ });
+ // Read template file and fill in the challenge slug, then write to config file
  const templatePath = getStaticResourcePath("evaluation_template.ts");
- await fs.copyFile(templatePath, configPath);
+ const template = await fs.readFile(templatePath, "utf-8");
+ const filledTemplate = template.replace("[INSERT CHALLENGE SLUG HERE]", response.challenge);
+ await fs.writeFile(configPath, filledTemplate);
  this.log(pc.green(`✓ Created evaluation '${args.name}'`));
  this.log(pc.dim(` Config: ${configPath}`));
  // Offer to open in editor on macOS
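
Note on the hunk above: the added lines merge two challenge lists, sort the slugs with `toSorted()`, and fill one placeholder with `String.prototype.replace`. Two properties of those built-ins may matter here: `Array.prototype.toSorted` needs a recent runtime (Node.js 20 or later), and `replace` with a string pattern substitutes only the first occurrence. The sketch below reproduces the same steps without those constraints. It assumes that duplicate slugs across the two lists should collapse, which the diff does not state; `buildChoices` and `fillTemplate` are hypothetical helper names, not part of the package.

```typescript
// Hypothetical helpers illustrating the create flow; not part of @kradle/cli.
interface Choice {
  name: string;
  message: string;
}

// Merge slug lists, drop duplicates (an assumption), and sort without
// relying on Array.prototype.toSorted, which older Node.js versions lack.
function buildChoices(...slugLists: string[][]): Choice[] {
  const merged = ([] as string[]).concat(...slugLists);
  const unique = Array.from(new Set(merged));
  return unique.sort().map((slug) => ({ name: slug, message: slug }));
}

// Replace every occurrence of the placeholder. split/join also avoids the
// special "$" replacement patterns that String.prototype.replace interprets.
function fillTemplate(template: string, slug: string): string {
  const placeholder = "[INSERT CHALLENGE SLUG HERE]";
  return template.split(placeholder).join(slug);
}
```

If the template is known to contain the placeholder exactly once, the single `replace` call in the diff behaves the same way.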
@@ -29,6 +29,7 @@ export declare class ApiClient {
  getHuman(): Promise<z.infer<typeof HumanSchema>>;
  listChallenges(): Promise<ChallengeSchemaType[]>;
  listKradleAgents(): Promise<AgentSchemaType[]>;
+ listKradleChallenges(): Promise<ChallengeSchemaType[]>;
  getChallenge(challengeId: string): Promise<ChallengeSchemaType>;
  /**
  * Check if a challenge exists in the cloud.
@@ -117,6 +117,9 @@ export class ApiClient {
  async listKradleAgents() {
  return this.listResource("humans/team-kradle/agents", "agents", AgentsResponseSchema);
  }
+ async listKradleChallenges() {
+ return this.listResource("humans/team-kradle/challenges", "challenges", ChallengesResponseSchema);
+ }
  async getChallenge(challengeId) {
  const url = `challenges/${challengeId}`;
  return this.get("web", url, {}, ChallengeSchema);
@@ -21,9 +21,9 @@ export declare const ChallengeSchema: z.ZodObject<{
  }>;
  }, z.core.$strip>;
  description: z.ZodOptional<z.ZodString>;
- task: z.ZodString;
+ task: z.ZodOptional<z.ZodString>;
  roles: z.ZodRecord<z.ZodString, z.ZodObject<{
- description: z.ZodString;
+ description: z.ZodOptional<z.ZodString>;
  specificTask: z.ZodString;
  minParticipants: z.ZodOptional<z.ZodNumber>;
  maxParticipants: z.ZodOptional<z.ZodNumber>;
@@ -63,9 +63,9 @@ export declare const ChallengesResponseSchema: z.ZodObject<{
  }>;
  }, z.core.$strip>;
  description: z.ZodOptional<z.ZodString>;
- task: z.ZodString;
+ task: z.ZodOptional<z.ZodString>;
  roles: z.ZodRecord<z.ZodString, z.ZodObject<{
- description: z.ZodString;
+ description: z.ZodOptional<z.ZodString>;
  specificTask: z.ZodString;
  minParticipants: z.ZodOptional<z.ZodNumber>;
  maxParticipants: z.ZodOptional<z.ZodNumber>;
@@ -12,9 +12,9 @@ export const ChallengeSchema = z.object({
  gameMode: z.enum(["survival", "creative", "adventure", "spectator"]),
  }),
  description: z.string().optional(),
- task: z.string(),
+ task: z.string().optional(),
  roles: z.record(z.string(), z.object({
- description: z.string(),
+ description: z.string().optional(),
  specificTask: z.string(),
  minParticipants: z.number().optional(),
  maxParticipants: z.number().optional(),
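
Note on the schema hunks above: with `task` and each role's `description` now optional, code that consumes a parsed challenge has to handle `undefined` for both. The following is a minimal sketch of one way a consumer might do that. `ChallengeLike` and `describeChallenge` are hypothetical names that mirror only the fields visible in the hunk, and the fallback strings are assumptions.

```typescript
// Hypothetical type mirroring only the fields visible in the hunk above.
interface ChallengeLike {
  description?: string;
  task?: string;
  roles: Record<string, { description?: string; specificTask: string }>;
}

// Build a short summary, falling back when optional fields are absent.
// A role without a description falls back to its required specificTask.
function describeChallenge(challenge: ChallengeLike): string {
  const task = challenge.task ?? "(no task provided)";
  const roles = Object.entries(challenge.roles).map(
    ([name, role]) => `${name}: ${role.description ?? role.specificTask}`,
  );
  return [task, ...roles].join("\n");
}
```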
@@ -327,36 +327,6 @@
  "create.js"
  ]
  },
- "evaluation:init": {
- "aliases": [],
- "args": {
- "name": {
- "description": "Name of the evaluation",
- "name": "name",
- "required": true
- }
- },
- "description": "Initialize a new evaluation",
- "examples": [
- "<%= config.bin %> <%= command.id %> my-evaluation"
- ],
- "flags": {},
- "hasDynamicHelp": false,
- "hiddenAliases": [],
- "id": "evaluation:init",
- "pluginAlias": "@kradle/cli",
- "pluginName": "@kradle/cli",
- "pluginType": "core",
- "strict": true,
- "enableJsonFlag": false,
- "isESM": true,
- "relativePath": [
- "dist",
- "commands",
- "evaluation",
- "init.js"
- ]
- },
  "evaluation:list": {
  "aliases": [],
  "args": {},
@@ -431,5 +401,5 @@
  ]
  }
  },
- "version": "0.0.7"
+ "version": "0.0.8"
  }
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@kradle/cli",
- "version": "0.0.7",
+ "version": "0.0.8",
  "description": "Kradle's CLI. Manage challenges, evaluations, agents and more!",
  "keywords": [
  "cli"
@@ -1,9 +0,0 @@
- import { Command } from "@oclif/core";
- export default class Init extends Command {
- static description: string;
- static examples: string[];
- static args: {
- name: import("@oclif/core/interfaces").Arg<string, Record<string, unknown>>;
- };
- run(): Promise<void>;
- }
@@ -1,58 +0,0 @@
- import { exec } from "node:child_process";
- import fs from "node:fs/promises";
- import path from "node:path";
- import { Args, Command } from "@oclif/core";
- import pc from "picocolors";
- import { loadConfig } from "../../lib/config.js";
- import { getStaticResourcePath } from "../../lib/utils.js";
- export default class Init extends Command {
- static description = "Initialize a new evaluation";
- static examples = ["<%= config.bin %> <%= command.id %> my-evaluation"];
- static args = {
- name: Args.string({
- description: "Name of the evaluation",
- required: true,
- }),
- };
- async run() {
- const { args } = await this.parse(Init);
- loadConfig(); // Validate config is available
- const evaluationDir = path.resolve(process.cwd(), "evaluations", args.name);
- const configPath = path.join(evaluationDir, "config.ts");
- // Check if evaluation already exists
- try {
- await fs.access(evaluationDir);
- this.error(pc.red(`Evaluation '${args.name}' already exists at ${evaluationDir}`));
- }
- catch {
- // Directory doesn't exist, which is what we want
- }
- // Create evaluation directory
- await fs.mkdir(evaluationDir, { recursive: true });
- // Copy template
- const templatePath = getStaticResourcePath("evaluation_template.ts");
- await fs.copyFile(templatePath, configPath);
- this.log(pc.green(`✓ Created evaluation '${args.name}'`));
- this.log(pc.dim(` Config: ${configPath}`));
- // Offer to open in editor on macOS
- if (process.platform === "darwin") {
- this.log("");
- this.log(pc.blue(">> Opening config.ts in your editor..."));
- // Try Cursor first, then VS Code, then fallback to default
- exec(`cursor "${configPath}" || code "${configPath}" || open "${configPath}"`, (error) => {
- if (error) {
- this.log(pc.dim(` Could not open editor automatically. Please open: ${configPath}`));
- }
- });
- }
- else {
- this.log("");
- this.log(pc.blue(`>> Edit the config file to define your runs:`));
- this.log(pc.dim(` ${configPath}`));
- }
- this.log("");
- this.log(pc.blue(">> Next steps:"));
- this.log(pc.dim(` 1. Edit ${path.basename(configPath)} to define your evaluation runs, and `));
- this.log(pc.dim(` 2. Run: kradle evaluation run ${args.name}`));
- }
- }
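
Note on `config.ts`: the README changes earlier in this diff describe `config.ts` as exporting a `main()` function that returns a manifest with `runs` and optional `tags`. The template file itself is not part of the diff, so the following is only a sketch of what such a config might look like. Every field name inside a run (`challenge`, `participants`, `agent`), the slug, the agent names, and the tag value are assumptions made for illustration.

```typescript
// Hypothetical shapes: the README only says a run combines a challenge with
// participants and that tags are optional. Field names below are assumptions.
interface Participant {
  agent: string;
}

interface RunConfig {
  challenge: string;
  participants: Participant[];
}

interface Manifest {
  runs: RunConfig[];
  tags?: string[];
}

// Generate one run per (agent, repetition) pair for a single challenge.
export function main(): Manifest {
  const challenge = "team-kradle/example-challenge"; // placeholder slug
  const agents = ["agent-a", "agent-b"]; // placeholder agent names
  const repetitions = 3;

  const runs: RunConfig[] = [];
  for (const agent of agents) {
    for (let i = 0; i < repetitions; i++) {
      runs.push({ challenge, participants: [{ agent }] });
    }
  }
  return { runs, tags: ["benchmark"] };
}
```

Generating the run list in code rather than writing it out by hand keeps a benchmark of many agents and repetitions short, which seems to be the reason the manifest is produced by executing `config.ts`.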