npm - @codemation/agent-skills - Versions diffs - 0.1.9 → 0.1.10 - Mend

@codemation/agent-skills 0.1.9 → 0.1.10

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/CHANGELOG.md CHANGED

@@ -1,5 +1,15 @@
 # @codemation/agent-skills
+## 0.1.10
+### Patch Changes
+- [#114](https://github.com/MadeRelevant/codemation/pull/114) [`ec985a3`](https://github.com/MadeRelevant/codemation/commit/ec985a3264696b421e8be7c84c7cead6a85cbe6c) Thanks [@cblokland90](https://github.com/cblokland90)! - Fix `pnpm create codemation <name>` failing with `ENOENT … node_modules/agent-skills/skills` when dlx'd from npm.
+  `@codemation/agent-skills`'s `exports` field only declared `.`, so `require.resolve("@codemation/agent-skills/package.json")` was blocked by Node's exports gate. `create-codemation`'s resolver fell back to a workspace-only relative path that doesn't exist outside the monorepo. Adds `./package.json` and `./skills/*` to the exports map so subpath access works for consumers — and bumps `create-codemation` patch so the next release pins the fixed agent-skills version.
+- [#110](https://github.com/MadeRelevant/codemation/pull/110) [`4902978`](https://github.com/MadeRelevant/codemation/commit/49029782243ece59ab6aa5bb46396db445cad47c) Thanks [@cblokland90](https://github.com/cblokland90)! - Add per-package `test:unit` scripts so Turbo can address each package individually for affected-only filtering. No runtime changes — dev-tooling only.
 ## 0.1.9
 ### Patch Changes

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@codemation/agent-skills",
-  "version": "0.1.9",
+  "version": "0.1.10",
   "description": "Reusable agent skills for Codemation projects and plugin development.",
   "publishConfig": {
     "access": "public"
@@ -19,7 +19,9 @@
       "types": "./lib/agent-skills-extractor.d.ts",
       "import": "./lib/agent-skills-extractor.mjs",
       "default": "./lib/agent-skills-extractor.mjs"
-    }
+    },
+    "./package.json": "./package.json",
+    "./skills/*": "./skills/*"
   },
   "bin": {
     "codemation-agent-skills": "./bin/codemation-agent-skills.mjs"
@@ -46,6 +48,7 @@
   ],
   "scripts": {
     "changeset:verify": "pnpm --workspace-root run changeset:verify",
-    "test": "vitest run"
+    "test": "vitest run",
+    "test:unit": "vitest run"
   }
 }
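
The following is not part of the published diff. It is a minimal sketch of why the old `exports` map blocked `require.resolve("@codemation/agent-skills/package.json")` and why the two added keys unblock it. The helper `resolveExport` is hypothetical and models only exact keys and single-`*` pattern keys with string or `default` targets. Node's real resolver also handles other conditions, arrays, and `null` targets.

```typescript
// Simplified, hypothetical model of Node's "exports" gate (not Node's actual code).
type ExportTarget = string | Record<string, string>;
type ExportsMap = Record<string, ExportTarget>;

function resolveExport(exportsMap: ExportsMap, subpath: string): string | null {
  // Assumption: for conditional targets we only look at the "default" condition.
  const pick = (target: ExportTarget): string | null =>
    typeof target === "string" ? target : (target["default"] ?? null);

  // 1. Exact key match, e.g. "." or "./package.json".
  const exact = Object.prototype.hasOwnProperty.call(exportsMap, subpath)
    ? exportsMap[subpath]
    : undefined;
  if (exact !== undefined) return pick(exact);

  // 2. Pattern keys such as "./skills/*": substitute the matched part into the target.
  for (const [key, target] of Object.entries(exportsMap)) {
    const star = key.indexOf("*");
    if (star === -1) continue;
    const prefix = key.slice(0, star);
    const suffix = key.slice(star + 1);
    if (subpath.length >= key.length && subpath.startsWith(prefix) && subpath.endsWith(suffix)) {
      const matched = subpath.slice(prefix.length, subpath.length - suffix.length);
      const resolved = pick(target);
      return resolved === null ? null : resolved.split("*").join(matched);
    }
  }

  // 3. Nothing matched: Node would report ERR_PACKAGE_PATH_NOT_EXPORTED.
  return null;
}

// The map as published in 0.1.9 (only "." declared) and in 0.1.10.
const exportsBefore: ExportsMap = {
  ".": {
    import: "./lib/agent-skills-extractor.mjs",
    default: "./lib/agent-skills-extractor.mjs",
  },
};
const exportsAfter: ExportsMap = {
  ...exportsBefore,
  "./package.json": "./package.json",
  "./skills/*": "./skills/*",
};
```

Under this model, `resolveExport(exportsBefore, "./package.json")` yields `null`, while the 0.1.10 map resolves both `./package.json` and any `./skills/...` subpath.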

package/skills/codemation-credential-development/SKILL.md CHANGED

@@ -12,6 +12,12 @@ Use this skill for defining new credential types, wiring them into apps or plugi
 Do not use this skill for general workflow authoring unless credential slots or runtime sessions are the core problem.
+## Credential binding stability
+Credentials bind to a node via `(workflowId, nodeId, slotKey)`. The `nodeId` defaults to a slug of the node's `name` label (lowercase, non-alphanumeric runs replaced with `-`). Renaming a credential-using node's label silently changes its id and the binding appears unbound in the UI — the operator must re-attach manually.
+To prevent this: either keep the node's label stable across edits, or set an explicit `id:` on the node config so the id is decoupled from the label.
 ## Core mental model
 1. A credential type defines public config, secret material, session creation, and health testing.

package/skills/codemation-credential-development/references/credential-patterns.md CHANGED

@@ -1,5 +1,20 @@
 # Credential Patterns
+## Node id and binding stability
+A credential binding is stored as `(workflowId, nodeId, slotKey)`. The `nodeId` for each workflow node defaults to a slug of its `name` label. Changing the label changes the id, and the previously configured binding appears unbound.
+For production workflows with credential-using nodes, prefer an explicit `id:` on the node config:
+```ts
+.node("Fetch from API", MyApiNodeConfig, {
+  id: "fetch-from-api", // stable across label renames
+  credentials: { apiKey: myApiCredential },
+})
+```
+Without an explicit `id:`, keep the node's label constant or plan to re-bind after a rename.
 ## Standard shape
 Use `defineCredential(...)` to declare:
@@ -40,6 +55,34 @@ Optional or power-user fields (for example custom OAuth scopes) can be tucked be
 See **`packages/core/docs/credential-ui-fields.md`** in the repository root layout.
+## OAuth2 credentials (URL-template variant)
+For credentials that go through the OAuth2 redirect flow (Microsoft Graph, Slack, GitHub, Notion, etc.), declare the authorize and token URLs directly on the credential's `auth` definition. The host's `OAuth2ProviderRegistry` substitutes `{publicFieldKey}` placeholders from the credential's public config at connect time (URL-encoded).
+```ts
+auth: {
+  kind: "oauth2",
+  // providerId is a free-form label for telemetry / DB rows / Better Auth provider naming.
+  // It is NOT used for any registry lookup — URLs come from the fields below.
+  providerId: "microsoft",
+  authorizeUrl: "https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/authorize",
+  tokenUrl: "https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token",
+  scopes: ["openid", "offline_access", "User.Read", "Mail.Read"],
+},
+```
+Three `auth` variants exist:
+1. **URL-template (preferred for new plugins).** Carries `authorizeUrl` / `tokenUrl` / optional `userInfoUrl` directly with `{fieldKey}` substitution. Self-contained — adding a new provider needs no core or host edits.
+2. **Built-in `providerId` shortcut.** Only `google` is recognized; kept for backwards compatibility. Do not add new providers here.
+3. **`providerFromPublicConfig`.** URLs read verbatim from public field values at runtime. Rare; the template variant covers almost every real case more ergonomically.
+Notes for plugin authors:
+- Host stores post-callback OAuth material with snake_case keys (`access_token`, `refresh_token`, `expiry`, `scope`, `token_type`). Read those keys inside `createSession` / `test`, NOT camelCase.
+- The redirect URI returned to providers rewrites loopback IPs (`127.0.0.1`, `[::1]`) to `localhost` so Azure AD (AADSTS50011) and other providers with the same restriction accept it.
+- The default `Mail.Read` (and similar single-mailbox) Microsoft scopes only cover the credential owner. To monitor a shared mailbox via `/users/{upn}/...`, request `Mail.Read.Shared` (delegated) or admin-consented application permissions.
 ## Health and activation
 - deploy the workflow and credential type
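
The following is not part of the published diff. It is a minimal sketch of the `{publicFieldKey}` substitution that the URL-template variant describes: placeholders are replaced with URL-encoded values from the credential's public config. The helper name `fillUrlTemplate`, the placeholder character set, and the choice to throw on a missing field are all assumptions. The diff does not show how the host's `OAuth2ProviderRegistry` implements this.

```typescript
// Hypothetical helper illustrating URL-template substitution; not the host's code.
function fillUrlTemplate(template: string, publicConfig: Record<string, string>): string {
  return template.replace(/\{([A-Za-z0-9_]+)\}/g, (_match: string, key: string) => {
    const value = Object.prototype.hasOwnProperty.call(publicConfig, key)
      ? publicConfig[key]
      : undefined;
    if (value === undefined) {
      // Assumption: a missing public field is treated as a configuration error.
      throw new Error(`Missing public field "${key}" for URL template`);
    }
    // Values are URL-encoded before being placed into the URL.
    return encodeURIComponent(value);
  });
}

// Example with the Microsoft authorize URL from the diff and a made-up tenant id.
const authorizeUrl = fillUrlTemplate(
  "https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/authorize",
  { tenantId: "contoso.onmicrosoft.com" },
);
```

Encoding each value keeps a field such as a tenant id from injecting extra path segments or query parameters into the provider URL.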

package/skills/codemation-custom-node-development/SKILL.md CHANGED

@@ -27,12 +27,20 @@ Do not use this skill for pure workflow chaining questions unless the node imple
 2. Keep nodes deterministic and focused.
 3. Request credentials through named slots instead of hard-coded secrets.
 4. Put **static** options (credentials, retry policy, labels) on **config**; put **per-item** behavior in **inputs** / wire JSON and optional **`itemExpr`** on config fields (consistent with built-in nodes).
-5. Drop to class-based node APIs only when you need constructor-injected collaborators, decorators, or deeper runtime metadata.
+5. **Emit files with `ctx.binary`, not base64 in `json`:** use **`attach`** + **`withAttachment`** on **`args.ctx.binary`** (`defineNode`) or **`ctx.binary`** (class nodes). Base64 in **`item.json`** bloats persisted run JSON in the database; binaries use **storage + references** only. See `references/node-patterns.md` and repo docs **Concepts → Execution model** / **Custom nodes**.
+6. Drop to class-based node APIs only when you need constructor-injected collaborators, decorators, or deeper runtime metadata.
 ## Testing with `WorkflowTestKit`
 For engine-backed tests without the host, use **`WorkflowTestKit`** from **`@codemation/core/testing`**: **`registerDefinedNodes([...])`**, then **`runNode`** or **`run`**. See the plugin development doc and `@codemation/core` tests for examples.
+## Custom assertion + test nodes
+When building **assertion** nodes that should record results into the framework's TestSuiteRun infrastructure, set **`emitsAssertions: true`** on the node config. The host's `TestSuiteRunTracker` listens for `nodeCompleted` events from runs with `ctx.testContext` set and persists each emitted item (matching the `AssertionResult` shape) as a `TestAssertion` row. Drop in a `defineNode` with a per-item `execute` that returns `AssertionResult[]` and you're done — no service injection required.
+Custom **per-item nodes** can also read **`ctx.testContext?.{testSuiteRunId, testCaseIndex}`** to branch on test mode without an `IsTestRun` upstream — useful for synthetic outputs or skipping irreversible side effects when running tests.
 ## Read next when needed
 - Read `references/node-patterns.md` for `defineNode(...)` patterns and packaging guidance.
+- Use the `codemation-workflow-dsl` skill's `references/workflow-testing.md` for the full TestTrigger / IsTestRun / Assertion authoring story.

package/skills/codemation-custom-node-development/references/node-patterns.md CHANGED

@@ -55,3 +55,26 @@ Reach for class-based node APIs when:
 - **`defineNode`** runs **`execute` once per item** (with optional **`inputSchema`** and **`itemExpr`** on config fields before **`execute`**)
 - **`defineBatchNode`** runs **`run`** once per activation batch
 - keep nodes deterministic and testable; prefer real code paths or in-memory collaborators over heavy mocking
+## Emitting items, fan-out, and binaries (for AI codegen)
+**Return shapes**
+- Return **plain JSON** → one output item with that **`json`** (unless the value is a **top-level array**, which **fans out** to one item per element).
+- Return **`emitPorts({ portName: [...] })`** for multi-port routing.
+- Return an **item-shaped** `{ json, binary?, meta?, paired? }` when you need explicit **`binary`** / **`meta`** / **`paired`** control.
+**Never put bulk file content in `item.json`**
+- Fields like `contentBase64`, `data`, or multi-megabyte strings are stored **inside persisted run / step JSON** in the database. That **scales poorly** (base64 is larger than raw bytes) and hurts snapshots and tooling.
+- **Correct:** `const attachment = await args.ctx.binary.attach({ name: "file", body: bytesOrStream, mimeType, filename })` then `return args.ctx.binary.withAttachment({ json: { ok: true } }, "file", attachment)` (or build `{ json, binary }` by hand).
+- **`body`** types match **`BinaryBody`**: `Uint8Array`, `ArrayBuffer`, `ReadableStream`, or async iterable of chunks (same idea as **`HttpRequest`** downloading a body).
+- **`keepBinaries: true`** only **preserves existing** **`item.binary`** through a plain JSON return; it does **not** convert base64 strings in **`json`** into attachments.
+**Triggers**
+- Emit **one `Item` per external record**; use **`item.binary`** per record for files—not one item whose **`json`** contains an array of embedded files.
+- For polling triggers that fetch records carrying file payloads (mail attachments, message media, etc.), do this in two phases:
+  1. In `runCycle` (the polling step), fetch only the **metadata** (id, name, contentType, size). The result is persisted into the trigger's setup state and into emitted item JSON, so it must stay small.
+  2. In `execute(items, ctx)`, when the cfg opts into downloads, fetch each blob's bytes from the source API and register them via `ctx.binary.attach(...)`. Then return items via `ctx.binary.withAttachment(item, slot, stored)`.
+- **Do not** request the full payload in the polling fetch (e.g. Microsoft Graph `$expand=attachments` returns base64 `contentBytes` inline; use `$expand=attachments($select=id,name,contentType,size)` to keep the response light). Large polling responses bloat the run state on every cycle, even when no item is emitted.

package/skills/codemation-framework-concepts/SKILL.md CHANGED

@@ -30,6 +30,7 @@ Do not use this skill as a substitute for detailed CLI, workflow DSL, or plugin
 - activation is framework-managed and happens in the UI
 - telemetry is observability-first: traces, spans, artifacts, and metric points are framework-owned runtime data
 - run retention and telemetry retention can differ, so trend data can outlive raw run state
+- **workflow testing** is a first-class primitive: a `TestTrigger` node yields one item per test case, the orchestrator dispatches a workflow run per case with `executionOptions.testContext` set, and `Assertion` nodes (`emitsAssertions: true`) record per-run results into `TestAssertion` rows; the canvas exposes a Tests tab parallel to Live and Executions
 ## Runtime rule of thumb
@@ -41,3 +42,4 @@ Do not use this skill as a substitute for detailed CLI, workflow DSL, or plugin
 ## Read next when needed
 - Read `references/architecture-map.md` for package ownership and runtime-mode guidance.
+- Use the `codemation-workflow-dsl` skill (and its `references/workflow-testing.md`) for hands-on test authoring with TestTrigger / IsTestRun / Assertion.

package/skills/codemation-plugin-development/references/plugin-structure.md CHANGED

@@ -33,6 +33,37 @@ That file is the plugin repository's source composition root. Consumers should d
 - start with `defineCredential(...)`
 - build typed sessions in `createSession(...)`
 - implement `test(...)` so operators can validate configuration before activation
+- for OAuth2 redirect flows, use the URL-template variant (`auth: { kind: "oauth2", providerId, authorizeUrl, tokenUrl, scopes }`) with `{publicFieldKey}` placeholders — no core or host edits needed per provider. See the credential-development skill for details.
+## Binary payloads — never put bytes on the item JSON
+**Rule:** if a node produces or fetches binary content (file attachments, image bytes, audio, PDFs, downloads, etc.), the bytes go through the framework's binary storage via `ctx.binary.attach(...)`. They MUST NOT be placed on the item's JSON payload.
+The runtime persists each item's JSON into the runs table for telemetry, replay, and debugging. Putting megabyte-scale base64 strings in there bloats the database, slows queries, and makes telemetry unreadable. The binary system exists exactly for this: blobs live in object storage; the item JSON only carries a `BinaryAttachment` reference (`{ id, storageKey, mimeType, size, ... }`) under `item.binary[<slot-name>]`.
+```ts
+// Inside execute(items, ctx) on a node that has fetched a file:
+const stored = await ctx.binary.attach({
+  name: "report.pdf", // slot name (also the key under item.binary)
+  body: Buffer.from(bytes), // Buffer / Uint8Array / Readable
+  mimeType: "application/pdf",
+  filename: "report.pdf", // hint for downloads
+});
+const enriched = ctx.binary.withAttachment(item, "report.pdf", stored);
+```
+Notes:
+- Attachment **metadata** (id, name, contentType, size) belongs on the item JSON — it is small and useful for branching. Only the **bytes** must go through `ctx.binary`.
+- For triggers, fetch metadata cheaply in `runCycle` (e.g. Graph's `$expand=attachments($select=id,name,contentType,size)`) and defer the byte download to `execute()` so persisted run state stays tiny on every poll.
+- Two attachments with the same filename within one item collide on `item.binary[name]`; suffix the slot name (`report-2.pdf`) to keep both.
+## Polling-trigger guidance
+- the engine ships a generic polling-trigger runtime in `@codemation/core` exposed via `ctx.polling` on the trigger setup context
+- call `ctx.polling.start({ intervalMs, runCycle })` from your trigger node's `setup()` — the runtime handles the loop, overlap guard, dedup window (`ctx.polling.dedup.merge(...)`), state persistence, and cleanup
+- on the first cycle, baseline-skip (record current ids, emit nothing) so the workflow does not flood with the existing backlog when the trigger is first set up
+- implement `TestableTriggerNode.getTestItems(ctx)` to power the workflow UI's **Test** button — return the most recent N items without consulting or mutating polling state, so users can preview live data without waiting
 ## Publishability
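
The following is not part of the published diff. It is a minimal sketch of the baseline-skip behaviour from the polling-trigger guidance in this hunk: the first cycle records the current ids and emits nothing, and later cycles emit only unseen records. The names `PollState` and `runCycleSketch` are hypothetical, and the sketch keeps every id forever, whereas the real runtime is described as using a dedup window via `ctx.polling.dedup.merge(...)`.

```typescript
// Hypothetical, framework-free model of baseline-skip plus id-based dedup.
interface PollRecord {
  id: string;
}

interface PollState {
  initialized: boolean; // false until the first cycle has recorded a baseline
  seenIds: string[];
}

function runCycleSketch<T extends PollRecord>(
  state: PollState,
  fetched: readonly T[],
): { state: PollState; emit: T[] } {
  const seen = new Set<string>(state.seenIds);
  const fresh: T[] = [];
  for (const record of fetched) {
    if (seen.has(record.id)) continue; // already known, never re-emit
    seen.add(record.id);
    fresh.push(record);
  }
  const next: PollState = { initialized: true, seenIds: Array.from(seen) };
  // First cycle: the backlog becomes the baseline and nothing is emitted.
  return { state: next, emit: state.initialized ? fresh : [] };
}

// Two consecutive cycles against a made-up source.
const firstCycle = runCycleSketch({ initialized: false, seenIds: [] }, [{ id: "a" }, { id: "b" }]);
const secondCycle = runCycleSketch(firstCycle.state, [{ id: "b" }, { id: "c" }]);
```

Here `firstCycle.emit` is empty even though two records were fetched, and `secondCycle.emit` contains only the record with id `c`.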

package/skills/codemation-workflow-dsl/SKILL.md CHANGED

@@ -27,14 +27,35 @@ Do not use this skill for CLI-only troubleshooting or deep host architecture que
 3. Use custom nodes when a callback grows into reusable product logic.
 4. Distinguish **batch activations** from **per-item node bodies**: custom nodes from **`defineNode`** implement **`execute`** per item unless you chose **`defineBatchNode`** for batch **`run`**.
+## Node ids and stability
+Every node in a workflow definition has an `id`. When no explicit `id:` is given, `WorkflowBuilder` derives one by slugifying the node's `name` label: lowercase, non-alphanumeric runs replaced with `-`, trimmed. `"Send Email"` becomes `"send-email"`.
+`.build()` throws `WorkflowDefinitionError` if any node ends up with an empty id (blank label and no explicit `id`) or if two nodes share the same id. The check covers agent connection children (model + tools) as well.
+For nodes that hold credential bindings, the binding is keyed by `(workflowId, nodeId, slotKey)`. Renaming a node's label changes its slug-derived id and orphans the binding — the operator must re-attach the credential in the UI. Prefer stable labels or set an explicit `id:` on credential-using nodes:
+```ts
+.node("Send notification", SendEmailNodeConfig, {
+  id: "send-notification", // stable even if the label is later renamed
+  // ...
+})
+```
 ## Typical flow
 1. Start with `workflow("wf.example.id")`.
 2. Name the workflow with `.name(...)`.
-3. Add a trigger such as `.manualTrigger(...)`.
+3. Add a trigger such as `.manualTrigger(...)` or `builder.trigger(new CronTrigger(...))`.
 4. Add transformations or nodes in execution order.
 5. End with `.build()`.
+## Built-in triggers
+- **`ManualTrigger`** — one-shot manual run, optionally seeded with default items. Use `.manualTrigger(name, items?)` on the fluent builder.
+- **`WebhookTrigger`** — fires on an incoming HTTP request. Construct with `new WebhookTrigger(name, { endpointKey, methods })` and attach with `builder.trigger(...)`.
+- **`CronTrigger`** — fires on a cron schedule. Construct with `new CronTrigger(name, { schedule, timezone? })` and attach with `builder.trigger(...)`. The expression is validated at workflow build time. Each tick emits one item: `{ firedAt: string, scheduledFor: string }` (both ISO-8601). Defaults to UTC — always supply `timezone` for DST-sensitive schedules.
 ## Agent tools (callable helpers)
 - For **inline** agent tools in workflow files (no separate `@tool()` class), use **`callableTool(...)`** from `@codemation/core`: supply `name`, Zod `inputSchema` / `outputSchema`, and `execute({ input, item, ctx, ... })`. **`CallableToolFactory.callableTool(...)`** is the same implementation if you prefer the factory style.
@@ -49,6 +70,19 @@ Do not use this skill for CLI-only troubleshooting or deep host architecture que
 - Use fluent `.map((item, ctx) => ...)` when workflow data itself needs reshaping before the agent step.
 - `model` may be a provider string such as `"openai:gpt-4o-mini"` or a `ChatModelConfig`.
+## Workflow testing nodes
+Codemation ships first-class **workflow tests**: each test case is one full workflow run, persisted with assertion records. Three nodes from `@codemation/core-nodes`:
+1. **`TestTrigger`** — drop alongside live triggers. Author callback `generateItems(ctx)` returns an `AsyncIterable<Item>`; the orchestrator dispatches one workflow run per yielded item with `executionOptions.testContext` set. `triggerKind: "test"` is set automatically — live activation skips it.
+2. **`IsTestRun`** — per-item router with `true` / `false` ports. Routes `true` iff `ctx.testContext` is set. Use it to skip side-effects in tests (don't actually send a real reply).
+3. **`Assertion`** — generic callback emitter; returns `AssertionResult[]`. Each result is `{ name, score: 0..1, passThreshold?, errored?, expected?, actual?, message?, details? }` — pass/fail derives from `score >= (passThreshold ?? 0.5)` (use `score: 1`/`0` for boolean checks, set `passThreshold` for continuous metrics, `errored: true` for assertion-code crashes). Each result becomes one emitted item on `main` and one persisted `TestAssertion` row when running inside a test. Sets `emitsAssertions: true` so the host persister identifies it.
+Authors invoke a TestSuiteRun from the canvas **Tests tab** or via `POST /api/workflows/:id/test-suite-runs`. The orchestrator caps concurrency (default 4, configurable per trigger) and aggregates results into `succeeded | failed | partial | cancelled | errored`.
+Custom nodes can also read `ctx.testContext?.{testSuiteRunId, testCaseIndex}` directly — useful for synthetic outputs in test mode without `IsTestRun` branching.
 ## Read next when needed
 - Read `references/builder-patterns.md` for item-flow rules and fluent authoring patterns.
+- Read `references/workflow-testing.md` for TestTrigger / IsTestRun / Assertion authoring with full examples.
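
The following is not part of the published diff. It is a minimal sketch of the id rules described under "Node ids and stability": lowercase the label, replace non-alphanumeric runs with `-`, trim leading and trailing `-`, then reject empty or duplicate ids. The function names are hypothetical, and treating only ASCII letters and digits as alphanumeric is an assumption. The real `WorkflowBuilder` may differ and throws `WorkflowDefinitionError` rather than returning messages.

```typescript
// Hypothetical re-implementation of the slug rule described in the skill text.
function slugifyNodeLabel(label: string): string {
  return label
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // assumption: ASCII-only alphanumerics
    .replace(/^-+|-+$/g, ""); // strip leading/trailing "-"
}

interface NodeLike {
  name: string;
  id?: string; // explicit id wins over the slug when present
}

// Returns human-readable problems instead of throwing, to keep the sketch testable.
function findIdProblems(nodes: readonly NodeLike[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const node of nodes) {
    const id = node.id ?? slugifyNodeLabel(node.name);
    if (id === "") {
      problems.push(`empty id for node "${node.name}"`);
    } else if (seen.has(id)) {
      problems.push(`duplicate id "${id}"`);
    }
    seen.add(id);
  }
  return problems;
}

// Renaming a label changes the derived id, which is what orphans a credential binding.
const idBeforeRename = slugifyNodeLabel("Send Email");
const idAfterRename = slugifyNodeLabel("Send notification email");
```

Because the binding key includes the node id, the change from `idBeforeRename` to `idAfterRename` is enough to make an existing binding appear unbound, whereas an explicit `id` stays the same across renames.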

package/skills/codemation-workflow-dsl/references/builder-patterns.md CHANGED

@@ -15,6 +15,22 @@ export default workflow("wf.example.id")
   .build();
 ```
+## Cron-triggered workflow
+```ts
+import { CronTrigger } from "@codemation/core-nodes";
+export default workflow("wf.nightly.id")
+  .name("Nightly job")
+  .trigger(new CronTrigger("Nightly", { schedule: "0 3 * * *", timezone: "Europe/Amsterdam" }))
+  .map("Process tick", (item, _ctx) => ({
+    firedAt: (item.json as { firedAt: string }).firedAt,
+  }))
+  .build();
+```
+The cron expression is validated at workflow build time. Each tick emits one item with `{ firedAt, scheduledFor }` ISO-8601 strings. Always supply `timezone` for DST-sensitive schedules — defaults to UTC.
 ## Use the fluent DSL by default
 - import `workflow` from `@codemation/host`
@@ -24,12 +40,25 @@ export default workflow("wf.example.id")
 ## Item rules
 - workflow data flows as items
-- items usually carry `json` data and optional `binary` data
+- items usually carry `json` data and optional `binary` data (**storage-backed attachments** via node **`ctx.binary.attach`**, not huge base64 strings in **`json`** — base64 in **`json`** inflates the persisted run payload in the DB; binaries stay as **references**)
 - runtime nodes receive batches of items, not just one record
 - author workflow steps with batching in mind
 - fluent `.map(...)`, `.if(...)`, and `.switch({ resolveCaseKey })` callbacks receive `(item, ctx)`
 - read row fields from `item.json` and earlier completed outputs from `ctx.data`
+## Node id assignment
+When no `id:` is provided, the builder slugifies the node's `name` label: lowercase, non-alphanumeric runs replaced with `-`, leading/trailing `-` stripped. Two nodes with the same effective label produce the same slug and `.build()` throws `WorkflowDefinitionError`. Fix: provide a unique `id:` on the colliding node configs.
+Credential bindings are stored as `(workflowId, nodeId, slotKey)`. Changing a node's label changes its slug-derived id and the binding appears unbound. For credential-using nodes, either keep the label stable or set an explicit `id:`:
+```ts
+.node("Send email", SendEmailNodeConfig, {
+  id: "send-email", // stable even after a label rename
+  credentials: { smtp: mySmtpCredential },
+})
+```
 ## When to move beyond callbacks
 Promote inline callbacks into custom nodes when:

package/skills/codemation-workflow-dsl/references/workflow-testing.md ADDED

@@ -0,0 +1,194 @@
+# Workflow Testing
+## Use this reference when
+You are authoring or reviewing a workflow that needs **end-to-end tests**: validate agent behavior, regression-test branching, score LLM outputs over time, or assert that a workflow produces the expected output for a known set of inputs.
+This is **not** for unit-testing individual nodes — use `WorkflowTestKit` from `@codemation/core/testing` for that.
+## Three building blocks
+1. **`TestTrigger`** — drops on the canvas alongside live triggers (Webhook / Cron / Gmail / etc.). Authored callback yields one item per test case.
+2. **`IsTestRun`** — per-item router with `true` / `false` ports. Branches based on whether the run was started by the test orchestrator.
+3. **`Assertion`** — generic per-item assertion node; returns one or more `AssertionResult`s per input item, one persisted `TestAssertion` row per result.
+## Typical workflow shape
+```
+[GmailTrigger: new email] ──┐
+                            │
+[TestTrigger: 10 fixtures]──┴─→ [ClassifyAgent]
+                                      │
+                                [IsTestRun?]
+                                  │   │
+                              true│   │false
+                                  ↓   ↓
+                          [Assertion]  [SendReply] (real side effect — skipped in tests)
+```
+## Authoring a TestTrigger
+```ts
+import { TestTrigger } from "@codemation/core-nodes";
+import { gmailCredentialType, type GmailSession } from "@codemation/core-nodes-gmail";
+export const fixtureMailsTrigger = new TestTrigger<{ subject: string; body: string }>({
+  name: "Email fixtures",
+  credentialRequirements: [
+    { slotKey: "gmail", label: "Gmail", acceptedTypes: [gmailCredentialType.definition.typeId] },
+  ],
+  async *generateItems(ctx) {
+    const gmail = await ctx.getCredential<GmailSession>("gmail");
+    const messages = await gmail.listMessages({ labelIds: ["Label_test_mails"] });
+    for (const message of messages) {
+      if (ctx.signal.aborted) break;
+      yield { json: { subject: message.subject, body: message.body } };
+    }
+  },
+  concurrency: 8, // optional; default 4
+  caseLabel: (item) => item.json.subject, // optional; rows fall back to runId
+});
+```
+Notes:
+- `triggerKind: "test"` is set automatically — `TriggerRuntimeService` skips it during live activation.
+- `ctx.signal` is an `AbortSignal` raised when the suite is cancelled; long pulls should bail out.
+- For hardcoded fixtures, just `yield { json: { ... } }` — no need to use credentials.
+- Set `caseLabel` so the Tests-tab tree-table shows something readable instead of opaque runIds.
+## Branching in the workflow
+```ts
+import { IsTestRun } from "@codemation/core-nodes";
+const isTestRun = new IsTestRun("Skip side effects in tests");
+```
+Or read `ctx.testContext` directly from a custom node:
+```ts
+async execute({ item, ctx }) {
+  if (ctx.testContext) {
+    return { json: { result: "synthetic-test-output" } };
+  }
+  return { json: await this.realApi.send(item.json) };
+}
+```
+## Authoring assertions
+```ts
+import { Assertion } from "@codemation/core-nodes";
+const checkClassification = new Assertion<{ label: string; confidence: number }>({
+  name: "Classification checks",
+  assertions: (item) => [
+    {
+      // Boolean-style: 1 = pass, 0 = fail. Default threshold (0.5) handles this.
+      name: "label is spam",
+      score: item.json.label === "spam" ? 1 : 0,
+      expected: "spam",
+      actual: item.json.label,
+    },
+    {
+      // Continuous-score: declare the threshold explicitly.
+      name: "confidence ≥ 0.8",
+      score: item.json.confidence,
+      passThreshold: 0.8,
+      expected: "≥ 0.8",
+      actual: item.json.confidence,
+    },
+  ],
+});
+```
+The `AssertionResult` shape (stable; persister + chart UIs key off these fields):
+```ts
+interface AssertionResult {
+  readonly name: string;
+  /** 0..1 score. Source of truth for pass/fail (compared against `passThreshold`). */
+  readonly score: number;
+  /** 0..1 threshold for "passed". When omitted, consumers default to 0.5. */
+  readonly passThreshold?: number;
+  /** True when evaluating the assertion threw — treated as fail regardless of `score`. */
+  readonly errored?: true;
+  readonly expected?: JsonValue;
+  readonly actual?: JsonValue;
+  readonly message?: string;
+  readonly details?: Readonly<Record<string, JsonValue>>;
+}
+```
+Pass/fail derivation (canonical, in `@codemation/core`):
+```ts
+import { deriveAssertionPassed } from "@codemation/core";
+// errored ? false : score >= (passThreshold ?? 0.5)
+```
+`errored: true` is for the assertion code itself crashing (judge agent crashed, JSON parse failed) — use it to separate "broken evaluator" from "wrong workflow output" in dashboards:
+```ts
+assertions: async (item, ctx) => {
+  try {
+    const j = await runJudge(item, ctx);
+    return [{ name: "polite reply", score: j.score, passThreshold: 0.7, message: j.reason }];
+  } catch (err) {
+    return [{ name: "polite reply", score: 0, errored: true, message: String(err) }];
+  }
+};
+```
+## Judge-by-Agent
+A judge-by-agent is just an AI agent step feeding into an Assertion callback. Run an agent that returns a structured judgment, then map its output to an `AssertionResult` (`score: 0..1`, set `passThreshold`).
+## Running tests
+- **From the UI**: open the workflow → **Tests** tab. Pick a TestTrigger from the dropdown (the picker lists every `triggerKind === "test"` node), click **Run tests**. Use the metric selector on the trend chart to plot pass-rate, per-assertion average scores, or case counts. Click two historical runs to compare them side-by-side.
+- **From code**: instantiate `TestSuiteOrchestrator` from `@codemation/core/bootstrap`, call `runSuite({ workflow, triggerNodeId })`.
+- **From HTTP**: `POST /api/workflows/:workflowId/test-suite-runs` with `{ triggerNodeId, concurrency? }`.
+## Status
+### Per case (`Run.testCaseStatus`)
+| Status      | Meaning                                                                               |
+| ----------- | ------------------------------------------------------------------------------------- |
+| `running`   | Workflow run dispatched, not yet finished.                                            |
+| `succeeded` | Workflow completed AND every assertion passed.                                        |
+| `failed`    | Assertion-rollup downgrade OR the workflow itself reported failure.                   |
+| `errored`   | Workflow run threw before reaching a terminal state (engine error, not an assertion). |
+| `cancelled` | Suite's `AbortSignal` fired before this case completed.                               |
+### Suite
+| Status      | Meaning                                                             |
+| ----------- | ------------------------------------------------------------------- |
+| `succeeded` | All cases passed (or zero cases yielded).                           |
+| `failed`    | Every case failed.                                                  |
+| `partial`   | Some passed, some failed — **the normal "1 of 10 failed" outcome**. |
+| `cancelled` | Suite was aborted before all cases finished.                        |
+| `errored`   | The `generateItems` callback itself threw.                          |
+The suite counters and status are re-derived from the final per-case statuses, so an "all workflows completed cleanly but assertions caught regressions" suite reports `partial` rather than `succeeded`.
+## Best practices
+- **Don't `throw` from `execute` to fail a case.** Throwing skips downstream nodes — including the Assertion node — so you lose all assertion data and only get a run-level error. Instead, let the workflow complete and assert on the (wrong) output. The assertion-rollup downgrades the case to `failed`.
+- Use `score: 1`/`score: 0` for boolean checks (equality, contains, regex). The default `passThreshold = 0.5` handles them.
+- Use `passThreshold` for continuous metrics (confidence, judge ratings, similarity).
+- Reserve `errored: true` for assertion-code crashes, not low scores.
+- Keep TestTriggers as source-controlled fixtures so historical chart comparisons are apples-to-apples.
+## What's deferred (Phase 2)
+- **Test-input snapshots** — Phase 1 fetches inputs live every run (rolling-input). Snapshotting will land in Phase 2 for stable judge-score charts.
+- **Declarative assertion shorthands** — `StringEqualsAssertion`, `JudgeByAgentAssertion`, etc. compose on top of the generic `Assertion` shipping today.
+- **CLI / cron / GitHub PR integration** — currently triggered manually via UI or HTTP only.
+## Read more
+- Top-level walkthrough: [`docs/workflow-testing.md`](../../../../docs/workflow-testing.md)
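
The following is not part of the published diff. It is a minimal sketch of the pass/fail rule quoted in the new reference (`errored ? false : score >= (passThreshold ?? 0.5)`) and of one way the suite status table could be derived from per-case statuses. `passedSketch` and `rollUpSuiteSketch` are hypothetical local functions, not the package's `deriveAssertionPassed` or orchestrator. Counting `errored` cases as not passed, and the precedence of `errored` over `cancelled`, are assumptions the diff does not confirm.

```typescript
// Local stand-in for the fields of AssertionResult that drive pass/fail.
interface AssertionResultLike {
  score: number;
  passThreshold?: number;
  errored?: true;
}

// Mirrors the documented rule: errored ? false : score >= (passThreshold ?? 0.5)
function passedSketch(result: AssertionResultLike): boolean {
  return result.errored ? false : result.score >= (result.passThreshold ?? 0.5);
}

type FinishedCaseStatus = "succeeded" | "failed" | "errored" | "cancelled";
type SuiteStatus = "succeeded" | "failed" | "partial" | "cancelled" | "errored";

function rollUpSuiteSketch(
  cases: readonly FinishedCaseStatus[],
  generateItemsThrew = false,
): SuiteStatus {
  // Assumption: a throwing generateItems callback takes precedence over everything else.
  if (generateItemsThrew) return "errored";
  // Assumption: any cancelled case means the suite was aborted before finishing.
  if (cases.some((status) => status === "cancelled")) return "cancelled";
  const passedCount = cases.filter((status) => status === "succeeded").length;
  if (passedCount === cases.length) return "succeeded"; // also covers zero cases
  if (passedCount === 0) return "failed"; // assumption: errored cases count as not passed
  return "partial";
}
```

With these rules a boolean check scored `1` passes under the default threshold, a confidence of `0.79` fails a `passThreshold` of `0.8`, and a suite with one failing case out of several reports `partial`.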