npm - archal - Versions diffs - 0.9.13 → 0.9.14 - Mend

archal 0.9.13 → 0.9.14

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6) hide show

package/dist/index.cjs +22 -6
package/package.json +1 -1
package/skills/{test → eval}/SKILL.md +3 -3
package/skills/onboard/SKILL.md +25 -9
package/skills/vitest/SKILL.md +1 -1
package/skills/audit/SKILL.md +0 -55

package/dist/index.cjs CHANGED Viewed

@@ -76567,6 +76567,19 @@ function isNoAgentModelSentinel(value) {
 // src/run/auth-seeds.ts
 init_config_merger();
+function singleQuote(arg) {
+  return `'${arg.replace(/'/g, `'\\''`)}'`;
+}
+function buildRerunCommand(scenarioArg, opts) {
+  if (opts.task) {
+    const parts = ["archal", "run", "--task", singleQuote(opts.task)];
+    for (const t of opts.twin ?? []) {
+      parts.push("--twin", singleQuote(t));
+    }
+    return parts.join(" ");
+  }
+  return `archal run ${singleQuote(scenarioArg)}`;
+}
 async function resolveHostedScenarioPath(scenarioArg) {
   const credentials = getCredentials2();
   if (!credentials) {
@@ -76633,7 +76646,7 @@ async function resolveCredentialsAndEntitlements(scenarioArg, scenario, opts) {
   if (!opts.preflightOnly) {
     const required2 = requireAuth({
       action: "run a scenario",
-      nextCommand: `archal run ${scenarioArg}`
+      nextCommand: buildRerunCommand(scenarioArg, opts)
     });
     credentials = required2 ?? getCredentials2();
     if (!credentials) {
@@ -83480,6 +83493,13 @@ async function resolveRunCommandScenarios(scenarioArg, opts, command) {
     info('Generated inline scenario for task: "' + opts.task + '"');
   }
   if (scenariosToRun.length === 0) {
+    if (archalFile) {
+      const configHasTwins = (archalFile.config.twins?.length ?? 0) > 0 || Object.keys(archalFile.config.seeds ?? {}).length > 0;
+      const taskHint = configHasTwins ? '  Or run an inline task:                 archal run --task "Create an issue"' : '  Or run an inline task with a twin:     archal run --task "Create an issue" --twin github';
+      throw new CliUsageError(
+        'Found .archal.json but no scenarios to run.\n  Add scenarios:                         { "scenarios": ["scenarios/foo.md"] }\n  Or pass a scenario directly:           archal run scenario.md\n' + taskHint
+      );
+    }
     throw new CliUsageError(
       'No .archal.json config found and no scenario specified.\n  Create .archal.json with your twins:    { "twins": ["github"] }\n  Or pass a scenario directly:            archal run scenario.md\n  Or run an inline task with a twin:      archal run --task "Create an issue" --twin github'
     );
@@ -87167,9 +87187,7 @@ ${GREEN5}${BOLD7}Archal skills installed${RESET8} ${DIM8}(v${version3})${RESET8}
 `);
   }
   log3(`
-${DIM8}Skills: ${skillDirs.join(", ")}${RESET8}
-`);
-  log3(`${DIM8}Try: "/archal-onboard" or "/archal-test"${RESET8}
+${DIM8}Next: /archal-onboard${RESET8}
 `);
   return {
@@ -87373,8 +87391,6 @@ function createInitCommand() {
 `);
           } else {
             const pm = detectPackageManager2(cwd);
-            process.stdout.write(`Installing archal@${CLI_VERSION} with ${pm}\u2026
-`);
             runPmAdd(pm, cwd, `archal@${CLI_VERSION}`);
           }
         } else {

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "archal",
-  "version": "0.9.13",
+  "version": "0.9.14",
   "description": "Test your agents & integrations against digital twins",
   "type": "module",
   "main": "dist/index.cjs",

package/skills/{test → eval}/SKILL.md RENAMED Viewed

@@ -1,11 +1,11 @@
 ---
-name: test
-description: Run Archal scenarios or inline tasks against hosted twins, diagnose failed runs, and interpret satisfaction scores. Triggers on "run my scenario", "test my agent", "archal run X", "debug this failing run", "what does this satisfaction score mean".
+name: eval
+description: Run Archal scenarios or inline tasks against hosted twins, diagnose failed runs, and interpret satisfaction scores. Triggers on "run my scenario", "evaluate my agent", "archal run X", "debug this failing run", "what does this satisfaction score mean".
 user-invocable: true
 argument-hint: "[scenario.md or task description]"
 ---
-# Archal Test Runner
+# Archal Eval Runner
 You run Archal scenarios and inline tasks, then help the user interpret the results. For setting up the agent path or `.archal.json` in a fresh repo, hand off to the `onboard` skill.

package/skills/onboard/SKILL.md CHANGED Viewed

@@ -60,27 +60,43 @@ npx archal init --skills-only  # re-stage skills if they drifted
 Confirm detected twins, then ask which of these the user wants. Each delegates to a sub-skill where appropriate — don't inline those flows.
-### Option A — Test an agent with scenarios
+### The `agent` command (Options A and B both need this)
+`archal run` spawns the agent as a child process, headlessly — no UI, no browser auth. The `agent` field in `.archal.json` is the shell command that invokes it. Typical shapes:
+- `"agent": "npx tsx ./.archal/harness.ts"` — custom TS entrypoint, most common
+- `"agent": "node ./agent.js"` — plain Node script
+- `"agent": "python agent.py"` — Python agent
+If the user doesn't have a harness yet, scaffold one at `./.archal/harness.ts` that reads `ARCHAL_ENGINE_TASK` from env and calls their agent's runtime. Alternative: skip `agent` in `.archal.json` and pass `--harness <path>` per-run.
+### Option A — Evaluate an agent with scenarios
 Write markdown scenario files that describe setup, prompt, and success criteria; `archal run` executes them against twins.
 1. Create `.archal.json`:
    ```json
-   { "agent": "<agent command>", "twins": ["<detected twins>"] }
+   {
+     "agent": "npx tsx ./.archal/harness.ts",
+     "twins": ["<detected twins>"]
+   }
    ```
 2. **Delegate to the `scenario` skill** to author a starter scenario. Don't paste a canned example here — the skill knows the markdown format and success-criteria syntax.
-3. Run: `archal run scenarios/<first>.md`.
+3. Run: `archal run scenarios/<first>.md`. **Hand off to the `eval` skill** for result interpretation and failure diagnosis.
 ### Option B — Run quick inline tasks
-1. `.archal.json` with just twins:
+Same `.archal.json` as Option A (inline `--task` still needs an agent). Use this when the user wants ad-hoc runs before committing to scenario files.
+1. `.archal.json`:
    ```json
-   { "twins": ["<detected twins>"] }
+   {
+     "agent": "npx tsx ./.archal/harness.ts",
+     "twins": ["<detected twins>"]
+   }
    ```
 2. Demo: `archal run --task "Create an issue titled hello" --twin github`.
-No sub-skill needed — this is a one-shot.
 ### Option C — Twins in a Vitest suite
 **Delegate to the `vitest` skill.** It handles reading the existing vitest config, identifying which tests should route, picking the right composition pattern, and seeding the twins.
@@ -89,11 +105,11 @@ Do not paste a sample config here. The right shape depends on what's already in
 ### Option D — Persistent twins to develop against
-Run: `archal twin start <detected twins>` — gives live twin URLs the user's SDK clients can point at.
+Run: `archal twin start <detected twins>` — gives live twin URLs the user's SDK clients can point at. `archal twin status` shows the active session; `archal twin stop` tears down.
 ## Verify
-Run the first test or task and show the result.
+Run the first scenario or task. For Options A and B, hand off to the `eval` skill to interpret the satisfaction score and diagnose failures — that skill owns the runtime mental model (`[D]` vs `[P]` criteria, trace inspection, harness preflight).
 ## `.archal.json` schema

package/skills/vitest/SKILL.md CHANGED Viewed

@@ -17,7 +17,7 @@ Claude already knows what Vitest is and how a fetch interceptor works. These are
 - Twins are hosted on **ECS Fargate** in Archal's AWS. First run = ~30s cold start. Subsequent runs within the 30-min idle TTL = ~2s. Tell the user; they'll think it's hung otherwise.
 - Session cache key = `(projectName, services, seeds)` hash. Change any of those and the cache misses.
 - **Seeds = starting state.** Omit to get the twin's default. Named seeds give fixtures (e.g. `small-project` for GitHub, `small-business` for Stripe). Never ask "what seed?" open-ended — the user doesn't know the catalog.
-- Route-mode twins available: `github`, `slack`, `stripe`, `jira`, `supabase`, `google-workspace`. Not yet: `linear`, `ramp`.
+- Route-mode twins available: `discord`, `github`, `google-workspace`, `jira`, `linear`, `ramp`, `slack`, `stripe`, `supabase`. Not yet: `telegram`. (Source of truth: `SHARED_ROUTE_MANIFESTS` in `packages/route-runtime-core/src/manifests.ts` — don't invent services that aren't in that array.)
 ## Discover before you ask

package/skills/audit/SKILL.md DELETED Viewed

@@ -1,55 +0,0 @@
----
-name: audit
-description: Audit an Archal repository thoroughly. Trace real execution paths, identify concrete bugs and design flaws, distinguish root-cause fixes from architecture problems, and add regression tests for every confirmed issue.
-user-invocable: true
-argument-hint: "[repo path or scope]"
----
-# Archal Repository Audit
-Use this skill when the goal is to inspect an Archal repository deeply, find problems worth fixing, and avoid shallow or local-only patches.
-## Audit standard
-- Trace real execution paths from entrypoints before proposing fixes.
-- Prefer root-cause fixes over guards, silencing, or narrow special cases.
-- If the real problem is architectural, report it instead of applying a monkey patch.
-- For every confirmed bug you fix, add the narrowest regression test that would have caught it earlier.
-- Always include at least one regression test that covers a stale-data row or pre-migration row when the touched path has compatibility logic.
-## Working pattern
-1. Map the hot paths first.
-   - Identify the actual entrypoints: CLI commands, web routes, background jobs, and core runtime/session flows.
-   - Ignore dead-looking surfaces until the primary paths are understood.
-2. Read the execution path end to end.
-   - Follow inputs through parsing, validation, persistence, normalization, and response shaping.
-   - Inspect nearby invariants and adjacent edge cases before deciding on a fix.
-3. Separate findings into two buckets.
-   - **Fix now**: clear bug, contained scope, root cause understood, regression test is obvious.
-   - **Escalate**: the defect comes from a bad abstraction or architectural boundary and a local patch would hide the real problem.
-4. Validate narrowly, then broadly.
-   - Run the smallest meaningful tests for the changed path first.
-   - If code changed, also run the relevant package build/typecheck before concluding.
-## What to look for
-- Compatibility shims that silently drop data from old rows or partially migrated schemas
-- Session lifecycle bugs around start, ready, teardown, stale state, and idempotency
-- Projection code that derives canonical state from stale denormalized fields
-- Fallback behavior that changes semantics instead of preserving them
-- Query builders that filter on derived fields inconsistently across list/count paths
-- Evidence, trace, or normalization code that double-counts, hides, or misattributes records
-## Output format
-For each finding, report:
-- Problem
-- Technical cause
-- Simple explanation
-- Optimal fix
-- Why that fix is better than narrower alternatives
-- Regression test to add
-If no actionable problems are found in a slice, say that explicitly and note any remaining coverage gaps.