@kradle/cli 0.0.7 → 0.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/README.md +66 -40
package/dist/commands/evaluation/create.js +23 -2
package/dist/lib/api-client.d.ts +1 -0
package/dist/lib/api-client.js +3 -0
package/dist/lib/evaluation/evaluator.js +1 -1
package/dist/lib/schemas.d.ts +6 -6
package/dist/lib/schemas.js +3 -3
package/oclif.manifest.json +1 -31
package/package.json +1 -1
package/dist/commands/evaluation/init.d.ts +0 -9
package/dist/commands/evaluation/init.js +0 -58

package/README.md CHANGED

@@ -2,19 +2,29 @@
 Kradle's CLI for managing Minecraft challenges, evaluations, agents, and more!
+* [Installation](#installation)
+* [Autocomplete](#autocomplete)
+* [Configuration](#configuration)
+* [Challenge](#challenge-commands)
+* [Evaluations](#evaluation-commands)
+* [Publishing a New Version](#publishing-a-new-version)
+* [Development](#development)
+* [Architecture](#architecture)
 ## Installation
 1. Install Kradle's CLI globally
 ```
 npm i -g @kradle/cli
 ```
-2. Initialize a new project
+2. Initialize a new directory to store challenges and evaluations
 ```
 kradle init
 ```
-3. Congrats 🎉 You can now create a new challenge:
+3. Congrats 🎉 You can now create a new challenge or a new evaluation:
 ```
 kradle challenge create <challenge-name>
+kradle evaluation create <evaluation-name>
 ```
 In addition, you can enable [autocomplete](#Autocomplete).
@@ -35,19 +45,6 @@ After setup, you will be able to use Tab to autocomplete:
 kradle challenge <TAB>        # Shows: build, create, list, run, upload, watch, etc.
 ```
-## Configuration
-The `.env` should have the following variables:
-```env
-WEB_API_URL=https://api.kradle.ai
-WEB_URL=https://kradle.ai
-STUDIO_API_URL=http://localhost:8080
-STUDIO_URL=kradle-studio://
-KRADLE_API_KEY=your-api-key
-KRADLE_CHALLENGES_PATH=~/Documents/kradle-studio/challenges
-```
 ## Challenge Commands
 ### Create Challenge
@@ -125,28 +122,60 @@ kradle challenge multi-upload
 Provides an interactive UI to select multiple challenges and uploads them in parallel.
-## Evaluations commands
-Plan and execute batches of runs across challenges/agents, with resumable iterations and a TUI.
-- **Init**: scaffold an evaluation config `evaluations/<name>/config.ts`
-  ```bash
-  kradle evaluation init <name>
-  ```
-- **List**: list local evaluations
-  ```bash
-  kradle evaluation list
-  ```
-- **Run**: execute or resume an evaluation (iterations stored under `evaluations/<name>/iterations/`)
-  ```bash
-  kradle evaluation run <name> [--new] [--max-concurrent N]
-  ```
-Features:
-- Iterations: `--new` starts a new iteration; otherwise resumes the latest.
-- Resumable state: progress is persisted per iteration; in-flight runs are re-polled on resume, completed runs stay completed.
-- Ink TUI: live status counts, elapsed times, scrollable run list; keys `q/Ctrl+C` quit, `↑/↓/j/k` move, `o` open run URL.
-- Per-iteration manifest: generated from the evaluation `config.ts` into `manifest.json` before runs start.
+## Evaluation Commands
+Evaluations allow you to run batches of challenge runs with different agents and configurations, then analyze the results. This is useful for benchmarking agents, testing challenge difficulty, or gathering statistics across many runs.
+### Concepts
+**Evaluation**: A named collection of run configurations defined in a `config.ts` file. Each evaluation lives in `evaluations/<name>/`.
+**Iteration**: A snapshot of an evaluation execution. When you run an evaluation, it creates an iteration containing:
+- A copy of the `config.ts` at that point in time
+- A `manifest.json` with the generated list of runs
+- A `progress.json` tracking the status of each run
+Iterations are stored in `evaluations/<name>/iterations/001/`, `002/`, etc. This allows you to:
+- Resume an interrupted evaluation from where it left off
+- Re-run the same evaluation with `--new` to create a fresh iteration
+- Compare results across different iterations
+### Create Evaluation
+Create a new evaluation with a template config file:
+```bash
+kradle evaluation create <name>
+```
+This creates `evaluations/<name>/config.ts` with a template that you can customize. The config exports a `main()` function that returns a manifest with:
+- `runs`: Array of run configurations (challenge + participants)
+- `tags`: Optional tags applied to all runs for filtering in analytics
+### Run Evaluation
+Execute or resume an evaluation:
+```bash
+kradle evaluation run <name>                  # Resume current iteration or create first one
+kradle evaluation run <name> --new            # Start a new iteration
+kradle evaluation run <name> --max-concurrent 10  # Control parallelism (default: 5)
+```
+The run command:
+1. Creates a new iteration (or resumes the current one)
+2. Generates a manifest by executing `config.ts`
+3. Displays an interactive TUI showing run progress
+4. Saves progress periodically (allows resuming if interrupted)
+5. Opens Metabase dashboard with results when complete
+### List Evaluations
+List all local evaluations:
+```bash
+kradle evaluation list
+```
 ## Publishing a New Version
@@ -161,9 +190,6 @@ The CLI uses GitHub Actions for automated releases. To publish a new version:
 4. **Review and merge** the automatically created PR
 5. **Done!** The package is automatically published to npm when the PR is merged
-### Setup (one-time)
-For the publish workflow to work, we're using [NPM Trusted Publishers](https://docs.npmjs.com/trusted-publishers).
 ## Development
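
Note on the README change: the new "Create Evaluation" section says `config.ts` exports a `main()` function returning a manifest with `runs` (challenge + participants) and optional `tags`, but the template itself is not part of this diff. The following is a minimal sketch of what such a config might look like. The field names inside each run (`challenge`, `participants`, `agent`) and all slugs are assumptions for illustration, not the package's confirmed schema.

```typescript
// Hypothetical shapes: the README only names `runs` (challenge + participants)
// and optional `tags`. Every field name inside a run is an assumption.
type Participant = { agent: string };
type Run = { challenge: string; participants: Participant[] };
type Manifest = { runs: Run[]; tags?: string[] };

// Placeholder identifiers, not real Kradle challenges or agents.
const CHALLENGE = "my-team/my-challenge";
const AGENTS = ["my-team/agent-a", "my-team/agent-b"];
const RUNS_PER_AGENT = 3;

// Returns one run per (agent, repetition) pair, all against the same challenge.
export function main(): Manifest {
  const runs: Run[] = [];
  for (const agent of AGENTS) {
    for (let i = 0; i < RUNS_PER_AGENT; i++) {
      runs.push({ challenge: CHALLENGE, participants: [{ agent }] });
    }
  }
  // Tags are applied to all runs so they can be filtered in analytics.
  return { runs, tags: ["baseline"] };
}
```

Generating the run list in code rather than listing it by hand keeps the agent-by-repetition grid easy to change. Per the README, each `kradle evaluation run` snapshots the config into the iteration directory.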

package/dist/commands/evaluation/create.js CHANGED

@@ -2,7 +2,9 @@ import { exec } from "node:child_process";
 import fs from "node:fs/promises";
 import path from "node:path";
 import { Args, Command } from "@oclif/core";
+import enquirer from "enquirer";
 import pc from "picocolors";
+import { ApiClient } from "../../lib/api-client.js";
 import { loadConfig } from "../../lib/config.js";
 import { getStaticResourcePath } from "../../lib/utils.js";
 export default class Create extends Command {
@@ -29,9 +31,28 @@ export default class Create extends Command {
         }
         // Create evaluation directory
         await fs.mkdir(evaluationDir, { recursive: true });
-        // Copy template
+        // Ask for the slug of the challenge to evaluate
+        const config = loadConfig();
+        const api = new ApiClient(config);
+        const [kradleChallenges, cloudChallenges] = await Promise.all([api.listKradleChallenges(), api.listChallenges()]);
+        const choices = [...kradleChallenges, ...cloudChallenges]
+            .map((c) => c.slug)
+            .toSorted()
+            .map((s) => ({
+            name: s,
+            message: s,
+        }));
+        const response = await enquirer.prompt({
+            type: "select",
+            name: "challenge",
+            message: "Select the challenge to evaluate",
+            choices: choices,
+        });
+        // Read template file and fill in the challenge slug, then write to config file
         const templatePath = getStaticResourcePath("evaluation_template.ts");
-        await fs.copyFile(templatePath, configPath);
+        const template = await fs.readFile(templatePath, "utf-8");
+        const filledTemplate = template.replace("[INSERT CHALLENGE SLUG HERE]", response.challenge);
+        await fs.writeFile(configPath, filledTemplate);
         this.log(pc.green(`✓ Created evaluation '${args.name}'`));
         this.log(pc.dim(`  Config: ${configPath}`));
         // Offer to open in editor on macOS
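
Note on the hunk above: two details of the new code are worth keeping in mind. `Array.prototype.toSorted()` is only available on recent runtimes (Node.js 20 and later). `String.prototype.replace` with a string pattern substitutes only the first occurrence and interprets sequences such as `$&` in the replacement. The sketch below isolates the same two steps (building the prompt choices, filling the template) in a more defensive form. The helper names `buildChoices` and `fillTemplate` are hypothetical, and deduplicating slugs assumes the two challenge lists can overlap, which the diff does not confirm.

```typescript
type Choice = { name: string; message: string };

// Placeholder string taken from the diff above.
const PLACEHOLDER = "[INSERT CHALLENGE SLUG HERE]";

// Build select-prompt choices from challenge slugs: dedupe, sort, then map.
function buildChoices(slugs: string[]): Choice[] {
  // Assumption: the same slug may appear in both lists, so drop duplicates.
  const unique = Array.from(new Set(slugs));
  // In-place sort on the fresh copy; works on runtimes without toSorted().
  unique.sort();
  return unique.map((s) => ({ name: s, message: s }));
}

// Fill the template with the chosen slug, failing loudly if the placeholder is absent.
function fillTemplate(template: string, slug: string): string {
  if (!template.includes(PLACEHOLDER)) {
    throw new Error(`Template is missing the placeholder ${PLACEHOLDER}`);
  }
  // split/join replaces every occurrence and treats the slug as literal text.
  return template.split(PLACEHOLDER).join(slug);
}
```

In the command, `buildChoices` would receive the slugs from both API calls and `fillTemplate` would sit between `fs.readFile` and `fs.writeFile`. Whether the shipped template contains the placeholder more than once is not visible in this diff.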

package/dist/lib/api-client.d.ts CHANGED

@@ -29,6 +29,7 @@ export declare class ApiClient {
     getHuman(): Promise<z.infer<typeof HumanSchema>>;
     listChallenges(): Promise<ChallengeSchemaType[]>;
     listKradleAgents(): Promise<AgentSchemaType[]>;
+    listKradleChallenges(): Promise<ChallengeSchemaType[]>;
     getChallenge(challengeId: string): Promise<ChallengeSchemaType>;
     /**
      * Check if a challenge exists in the cloud.

package/dist/lib/api-client.js CHANGED

@@ -117,6 +117,9 @@ export class ApiClient {
     async listKradleAgents() {
         return this.listResource("humans/team-kradle/agents", "agents", AgentsResponseSchema);
     }
+    async listKradleChallenges() {
+        return this.listResource("humans/team-kradle/challenges", "challenges", ChallengesResponseSchema);
+    }
     async getChallenge(challengeId) {
         const url = `challenges/${challengeId}`;
         return this.get("web", url, {}, ChallengeSchema);

package/dist/lib/evaluation/evaluator.js CHANGED

@@ -234,7 +234,7 @@ export class Evaluator {
             throw new Error(`${errors.map((error) => error.error).join("\n\n")}`);
         }
         if (options.openMetabase ?? true) {
-            openInBrowser(`https://daunt-fair.metabaseapp.com/dashboard/10-runs-analysis&tags=${iterationTag}`);
+            openInBrowser(`https://daunt-fair.metabaseapp.com/dashboard/10-runs-analysis?tags=${iterationTag}`);
         }
     }
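
Note on the hunk above: this one-character change is a bug fix. A URL's query string starts at the first `?`, and `&` only separates later parameters. In 0.0.7 the `&tags=...` text was therefore part of the path, not a `tags` parameter. A small sketch of building the same URL with the tag percent-encoded follows. The helper name `buildDashboardUrl` is hypothetical, and encoding the tag is a defensive assumption, since the diff does not show which characters an iteration tag may contain.

```typescript
// Dashboard address as it appears in the diff above.
const DASHBOARD_URL = "https://daunt-fair.metabaseapp.com/dashboard/10-runs-analysis";

// Append the iteration tag as a proper query parameter.
function buildDashboardUrl(iterationTag: string): string {
  // `?` opens the query string; encodeURIComponent escapes spaces, `&`, `#`, etc.
  return `${DASHBOARD_URL}?tags=${encodeURIComponent(iterationTag)}`;
}
```

For a tag made only of letters, digits, and hyphens, the result is identical to the string the 0.0.9 code produces.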
     /**

package/dist/lib/schemas.d.ts CHANGED

@@ -21,10 +21,10 @@ export declare const ChallengeSchema: z.ZodObject<{
         }>;
     }, z.core.$strip>;
     description: z.ZodOptional<z.ZodString>;
-    task: z.ZodString;
+    task: z.ZodOptional<z.ZodString>;
     roles: z.ZodRecord<z.ZodString, z.ZodObject<{
-        description: z.ZodString;
-        specificTask: z.ZodString;
+        description: z.ZodOptional<z.ZodString>;
+        specificTask: z.ZodOptional<z.ZodString>;
         minParticipants: z.ZodOptional<z.ZodNumber>;
         maxParticipants: z.ZodOptional<z.ZodNumber>;
     }, z.core.$strip>>;
@@ -63,10 +63,10 @@ export declare const ChallengesResponseSchema: z.ZodObject<{
             }>;
         }, z.core.$strip>;
         description: z.ZodOptional<z.ZodString>;
-        task: z.ZodString;
+        task: z.ZodOptional<z.ZodString>;
         roles: z.ZodRecord<z.ZodString, z.ZodObject<{
-            description: z.ZodString;
-            specificTask: z.ZodString;
+            description: z.ZodOptional<z.ZodString>;
+            specificTask: z.ZodOptional<z.ZodString>;
             minParticipants: z.ZodOptional<z.ZodNumber>;
             maxParticipants: z.ZodOptional<z.ZodNumber>;
         }, z.core.$strip>>;

package/dist/lib/schemas.js CHANGED

@@ -12,10 +12,10 @@ export const ChallengeSchema = z.object({
         gameMode: z.enum(["survival", "creative", "adventure", "spectator"]),
     }),
     description: z.string().optional(),
-    task: z.string(),
+    task: z.string().optional(),
     roles: z.record(z.string(), z.object({
-        description: z.string(),
-        specificTask: z.string(),
+        description: z.string().optional(),
+        specificTask: z.string().optional(),
         minParticipants: z.number().optional(),
         maxParticipants: z.number().optional(),
     })),

package/oclif.manifest.json CHANGED

@@ -327,36 +327,6 @@
         "create.js"
       ]
     },
-    "evaluation:init": {
-      "aliases": [],
-      "args": {
-        "name": {
-          "description": "Name of the evaluation",
-          "name": "name",
-          "required": true
-        }
-      },
-      "description": "Initialize a new evaluation",
-      "examples": [
-        "<%= config.bin %> <%= command.id %> my-evaluation"
-      ],
-      "flags": {},
-      "hasDynamicHelp": false,
-      "hiddenAliases": [],
-      "id": "evaluation:init",
-      "pluginAlias": "@kradle/cli",
-      "pluginName": "@kradle/cli",
-      "pluginType": "core",
-      "strict": true,
-      "enableJsonFlag": false,
-      "isESM": true,
-      "relativePath": [
-        "dist",
-        "commands",
-        "evaluation",
-        "init.js"
-      ]
-    },
     "evaluation:list": {
       "aliases": [],
       "args": {},
@@ -431,5 +401,5 @@
       ]
     }
   },
-  "version": "0.0.7"
+  "version": "0.0.9"
 }

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
 	"name": "@kradle/cli",
-	"version": "0.0.7",
+	"version": "0.0.9",
 	"description": "Kradle's CLI. Manage challenges, evaluations, agents and more!",
 	"keywords": [
 		"cli"

package/dist/commands/evaluation/init.d.ts DELETED

@@ -1,9 +0,0 @@
-import { Command } from "@oclif/core";
-export default class Init extends Command {
-    static description: string;
-    static examples: string[];
-    static args: {
-        name: import("@oclif/core/interfaces").Arg<string, Record<string, unknown>>;
-    };
-    run(): Promise<void>;
-}

package/dist/commands/evaluation/init.js DELETED

@@ -1,58 +0,0 @@
-import { exec } from "node:child_process";
-import fs from "node:fs/promises";
-import path from "node:path";
-import { Args, Command } from "@oclif/core";
-import pc from "picocolors";
-import { loadConfig } from "../../lib/config.js";
-import { getStaticResourcePath } from "../../lib/utils.js";
-export default class Init extends Command {
-    static description = "Initialize a new evaluation";
-    static examples = ["<%= config.bin %> <%= command.id %> my-evaluation"];
-    static args = {
-        name: Args.string({
-            description: "Name of the evaluation",
-            required: true,
-        }),
-    };
-    async run() {
-        const { args } = await this.parse(Init);
-        loadConfig(); // Validate config is available
-        const evaluationDir = path.resolve(process.cwd(), "evaluations", args.name);
-        const configPath = path.join(evaluationDir, "config.ts");
-        // Check if evaluation already exists
-        try {
-            await fs.access(evaluationDir);
-            this.error(pc.red(`Evaluation '${args.name}' already exists at ${evaluationDir}`));
-        }
-        catch {
-            // Directory doesn't exist, which is what we want
-        }
-        // Create evaluation directory
-        await fs.mkdir(evaluationDir, { recursive: true });
-        // Copy template
-        const templatePath = getStaticResourcePath("evaluation_template.ts");
-        await fs.copyFile(templatePath, configPath);
-        this.log(pc.green(`✓ Created evaluation '${args.name}'`));
-        this.log(pc.dim(`  Config: ${configPath}`));
-        // Offer to open in editor on macOS
-        if (process.platform === "darwin") {
-            this.log("");
-            this.log(pc.blue(">> Opening config.ts in your editor..."));
-            // Try Cursor first, then VS Code, then fallback to default
-            exec(`cursor "${configPath}" || code "${configPath}" || open "${configPath}"`, (error) => {
-                if (error) {
-                    this.log(pc.dim(`  Could not open editor automatically. Please open: ${configPath}`));
-                }
-            });
-        }
-        else {
-            this.log("");
-            this.log(pc.blue(`>> Edit the config file to define your runs:`));
-            this.log(pc.dim(`   ${configPath}`));
-        }
-        this.log("");
-        this.log(pc.blue(">> Next steps:"));
-        this.log(pc.dim(`   1. Edit ${path.basename(configPath)} to define your evaluation runs, and `));
-        this.log(pc.dim(`   2. Run: kradle evaluation run ${args.name}`));
-    }
-}