@codemation/agent-skills 0.1.8 → 0.1.10

package/CHANGELOG.md CHANGED
@@ -1,5 +1,23 @@
  # @codemation/agent-skills
 
+ ## 0.1.10
+
+ ### Patch Changes
+
+ - [#114](https://github.com/MadeRelevant/codemation/pull/114) [`ec985a3`](https://github.com/MadeRelevant/codemation/commit/ec985a3264696b421e8be7c84c7cead6a85cbe6c) Thanks [@cblokland90](https://github.com/cblokland90)! - Fix `pnpm create codemation <name>` failing with `ENOENT … node_modules/agent-skills/skills` when dlx'd from npm.
+
+   `@codemation/agent-skills`'s `exports` field only declared `.`, so `require.resolve("@codemation/agent-skills/package.json")` was blocked by Node's exports gate. `create-codemation`'s resolver fell back to a workspace-only relative path that doesn't exist outside the monorepo. Adds `./package.json` and `./skills/*` to the exports map so subpath access works for consumers — and bumps `create-codemation` patch so the next release pins the fixed agent-skills version.
+
+ - [#110](https://github.com/MadeRelevant/codemation/pull/110) [`4902978`](https://github.com/MadeRelevant/codemation/commit/49029782243ece59ab6aa5bb46396db445cad47c) Thanks [@cblokland90](https://github.com/cblokland90)! - Add per-package `test:unit` scripts so Turbo can address each package individually for affected-only filtering. No runtime changes — dev-tooling only.
+
+ ## 0.1.9
+
+ ### Patch Changes
+
+ - [#87](https://github.com/MadeRelevant/codemation/pull/87) [`4c50f29`](https://github.com/MadeRelevant/codemation/commit/4c50f29763ad7bc1e39723a6711ca3cf9add5014) Thanks [@cblokland90](https://github.com/cblokland90)! - Disable automatic packaged skill refreshes inside the Codemation framework monorepo so framework-author workflows stop dirtying the local worktree.
+   - keep `codemation skills sync` as the explicit refresh path after upgrading `@codemation/cli` or `@codemation/agent-skills`
+   - document the monorepo behavior in the packaged CLI skill and agent-skills README
+
  ## 0.1.8
 
  ### Patch Changes
@@ -32,7 +50,7 @@
 
  `WorkflowAgentOptions` now takes `messages` (the same `AgentMessageConfig` as `AIAgent`) instead of
  `prompt`. The workflow helper passes `messages` through unchanged. Docs, workflow DSL skills, and the
- test-dev sample use `itemValue(...)` for per-item prompts; execution docs note `itemValue` on agent
+ test-dev sample use `itemExpr(...)` for per-item prompts; execution docs note `itemExpr` on agent
  `messages`.
 
  ## Unreleased
package/README.md CHANGED
@@ -19,7 +19,13 @@ The starter templates call the extractor automatically after `pnpm install`.
 
  ## Framework-managed copy
 
- The directory `.agents/skills/extracted` is **framework-managed**: Codemation overwrites packaged `codemation-*` skill folders there and removes stale packaged skill directories when you run `codemation dev`, `codemation build`, `codemation serve web`, or `codemation dev:plugin` (or `codemation skills sync`). Put project-local skills in sibling folders under `.agents/skills`, not inside `extracted`, unless you accept them being replaced.
+ The directory `.agents/skills/extracted` is **framework-managed**:
+
+ - In consumer projects, Codemation overwrites packaged `codemation-*` skill folders there and removes stale packaged skill directories when you run `codemation dev`, `codemation build`, `codemation serve web`, `codemation dev:plugin`, or `codemation skills sync`.
+ - Inside the Codemation framework monorepo, the automatic refresh path is disabled to avoid polluting the local git worktree during framework development.
+ - After upgrading `@codemation/cli` or `@codemation/agent-skills` while working in the monorepo, run `codemation skills sync` intentionally if you want the extracted copy refreshed.
+
+ Put project-local skills in sibling folders under `.agents/skills`, not inside `extracted`, unless you accept them being replaced.
 
  ## Programmatic use
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@codemation/agent-skills",
-   "version": "0.1.8",
+   "version": "0.1.10",
    "description": "Reusable agent skills for Codemation projects and plugin development.",
    "publishConfig": {
      "access": "public"
@@ -19,7 +19,9 @@
        "types": "./lib/agent-skills-extractor.d.ts",
        "import": "./lib/agent-skills-extractor.mjs",
        "default": "./lib/agent-skills-extractor.mjs"
-     }
+     },
+     "./package.json": "./package.json",
+     "./skills/*": "./skills/*"
    },
    "bin": {
      "codemation-agent-skills": "./bin/codemation-agent-skills.mjs"
@@ -46,6 +48,7 @@
    ],
    "scripts": {
      "changeset:verify": "pnpm --workspace-root run changeset:verify",
-     "test": "vitest run"
+     "test": "vitest run",
+     "test:unit": "vitest run"
    }
  }
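Why the 0.1.8 map blocked `require.resolve("@codemation/agent-skills/package.json")` is easier to see with a toy resolver. The sketch below is a simplified model of Node's `exports` matching, assuming only string targets, exact keys, and a single `*` per pattern key, with conditional objects reduced to their `default` entry. It is not Node's full algorithm, and `resolveExportedSubpath` is a hypothetical name.

```typescript
// Simplified model of how a package.json "exports" map gates subpath access.
// Assumptions: exact keys are checked first, then pattern keys with a single "*";
// conditional objects ({ types, import, default }) are reduced to "default".
type ExportTarget = string | Record<string, string>;
type ExportsMap = Record<string, ExportTarget>;

function resolveExportedSubpath(exportsMap: ExportsMap, subpath: string): string | null {
  const exact = exportsMap[subpath];
  if (exact !== undefined) {
    return typeof exact === "string" ? exact : exact["default"] ?? null;
  }
  for (const [key, target] of Object.entries(exportsMap)) {
    const star = key.indexOf("*");
    if (star === -1 || typeof target !== "string") continue;
    const prefix = key.slice(0, star);
    const suffix = key.slice(star + 1);
    const fits =
      subpath.length >= prefix.length + suffix.length &&
      subpath.startsWith(prefix) &&
      subpath.endsWith(suffix);
    if (fits) {
      // Substitute the text matched by "*" into the target pattern.
      const matched = subpath.slice(prefix.length, subpath.length - suffix.length);
      return target.replace("*", matched);
    }
  }
  // Real Node throws ERR_PACKAGE_PATH_NOT_EXPORTED at this point.
  return null;
}

// 0.1.8: only "." was declared.
const exportsBefore: ExportsMap = {
  ".": {
    types: "./lib/agent-skills-extractor.d.ts",
    import: "./lib/agent-skills-extractor.mjs",
    default: "./lib/agent-skills-extractor.mjs",
  },
};

// 0.1.10: the two subpaths added in this diff.
const exportsAfter: ExportsMap = {
  ...exportsBefore,
  "./package.json": "./package.json",
  "./skills/*": "./skills/*",
};
```

With the new map, `./package.json` and any `./skills/...` path resolve, while undeclared paths such as `./lib/internal.mjs` stay blocked, which is the encapsulation the `exports` field is meant to provide.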
@@ -36,6 +36,8 @@ Do not use this skill for workflow graph design, custom node implementation, or
  4. When Redis-backed execution is involved, mention the shared PostgreSQL requirement instead of assuming local SQLite still fits.
  5. In consumer mode, discovered plugins are loaded from the built JavaScript path declared in `package.json#codemation.plugin`, not from TypeScript source under `node_modules`.
  6. In plugin mode, the CLI TypeScript-loads only the current plugin repo through the generated `.codemation/plugin-dev/codemation.config.ts`.
+ 7. In the Codemation framework monorepo, automatic refresh of `.agents/skills/extracted` is intentionally disabled to keep the worktree clean.
+ 8. After `@codemation/cli` or `@codemation/agent-skills` package upgrades in monorepo work, remind the user to run `codemation skills sync` if they want the extracted packaged skills refreshed.
 
  ## Read next when needed
 
@@ -12,6 +12,12 @@ Use this skill for defining new credential types, wiring them into apps or plugi
 
  Do not use this skill for general workflow authoring unless credential slots or runtime sessions are the core problem.
 
+ ## Credential binding stability
+
+ Credentials bind to a node via `(workflowId, nodeId, slotKey)`. The `nodeId` defaults to a slug of the node's `name` label (lowercase, non-alphanumeric runs replaced with `-`). Renaming a credential-using node's label silently changes its id and the binding appears unbound in the UI — the operator must re-attach manually.
+
+ To prevent this: either keep the node's label stable across edits, or set an explicit `id:` on the node config so the id is decoupled from the label.
+
  ## Core mental model
 
  1. A credential type defines public config, secret material, session creation, and health testing.
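The rename problem can be reproduced in a few lines. This is a sketch that assumes an ASCII-only slug (the exact regex Codemation uses is not shown in this diff) and a hypothetical in-memory binding store keyed by `(workflowId, nodeId, slotKey)`; `slugifyNodeLabel`, `effectiveNodeId`, and `bindingKey` are illustrative names, not framework APIs.

```typescript
// Slug rule as described: lowercase, runs of non-alphanumerics become "-",
// leading/trailing "-" trimmed. ASCII-only here; the real rule may differ.
function slugifyNodeLabel(label: string): string {
  return label
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// An explicit id wins; otherwise the id is derived from the label.
function effectiveNodeId(label: string, explicitId?: string): string {
  return explicitId ?? slugifyNodeLabel(label);
}

// Hypothetical key for a binding store. JSON.stringify avoids delimiter collisions.
function bindingKey(workflowId: string, nodeId: string, slotKey: string): string {
  return JSON.stringify([workflowId, nodeId, slotKey]);
}

const bindings = new Map<string, string>();
bindings.set(bindingKey("wf.example.id", effectiveNodeId("Fetch from API"), "apiKey"), "credential-1");

// Renaming the label changes the derived id, so the stored binding no longer matches.
const foundAfterRename = bindings.has(
  bindingKey("wf.example.id", effectiveNodeId("Fetch from REST API"), "apiKey"),
);

// With an explicit id, the same rename leaves the key untouched.
const foundAfterRenameWithId = bindings.has(
  bindingKey("wf.example.id", effectiveNodeId("Fetch from REST API", "fetch-from-api"), "apiKey"),
);
```

`foundAfterRename` is `false` and `foundAfterRenameWithId` is `true`, which is the behavior the two mitigation options above rely on.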
@@ -1,5 +1,20 @@
  # Credential Patterns
 
+ ## Node id and binding stability
+
+ A credential binding is stored as `(workflowId, nodeId, slotKey)`. The `nodeId` for each workflow node defaults to a slug of its `name` label. Changing the label changes the id, and the previously configured binding appears unbound.
+
+ For production workflows with credential-using nodes, prefer an explicit `id:` on the node config:
+
+ ```ts
+ .node("Fetch from API", MyApiNodeConfig, {
+   id: "fetch-from-api", // stable across label renames
+   credentials: { apiKey: myApiCredential },
+ })
+ ```
+
+ Without an explicit `id:`, keep the node's label constant or plan to re-bind after a rename.
+
  ## Standard shape
 
  Use `defineCredential(...)` to declare:
@@ -40,6 +55,34 @@ Optional or power-user fields (for example custom OAuth scopes) can be tucked be
 
  See **`packages/core/docs/credential-ui-fields.md`** in the repository root layout.
 
+ ## OAuth2 credentials (URL-template variant)
+
+ For credentials that go through the OAuth2 redirect flow (Microsoft Graph, Slack, GitHub, Notion, etc.), declare the authorize and token URLs directly on the credential's `auth` definition. The host's `OAuth2ProviderRegistry` substitutes `{publicFieldKey}` placeholders from the credential's public config at connect time (URL-encoded).
+
+ ```ts
+ auth: {
+   kind: "oauth2",
+   // providerId is a free-form label for telemetry / DB rows / Better Auth provider naming.
+   // It is NOT used for any registry lookup — URLs come from the fields below.
+   providerId: "microsoft",
+   authorizeUrl: "https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/authorize",
+   tokenUrl: "https://login.microsoftonline.com/{tenantId}/oauth2/v2.0/token",
+   scopes: ["openid", "offline_access", "User.Read", "Mail.Read"],
+ },
+ ```
+
+ Three `auth` variants exist:
+
+ 1. **URL-template (preferred for new plugins).** Carries `authorizeUrl` / `tokenUrl` / optional `userInfoUrl` directly with `{fieldKey}` substitution. Self-contained — adding a new provider needs no core or host edits.
+ 2. **Built-in `providerId` shortcut.** Only `google` is recognized; kept for backwards compatibility. Do not add new providers here.
+ 3. **`providerFromPublicConfig`.** URLs read verbatim from public field values at runtime. Rare; the template variant covers almost every real case more ergonomically.
+
+ Notes for plugin authors:
+
+ - Host stores post-callback OAuth material with snake_case keys (`access_token`, `refresh_token`, `expiry`, `scope`, `token_type`). Read those keys inside `createSession` / `test`, NOT camelCase.
+ - The redirect URI returned to providers rewrites loopback IPs (`127.0.0.1`, `[::1]`) to `localhost` so Azure AD (AADSTS50011) and other providers with the same restriction accept it.
+ - The default `Mail.Read` (and similar single-mailbox) Microsoft scopes only cover the credential owner. To monitor a shared mailbox via `/users/{upn}/...`, request `Mail.Read.Shared` (delegated) or admin-consented application permissions.
+
  ## Health and activation
 
  - deploy the workflow and credential type
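Two of the host behaviors described above, `{publicFieldKey}` substitution with URL encoding and the loopback rewrite of redirect URIs, can be sketched in isolation. Both functions are hypothetical illustrations rather than the `OAuth2ProviderRegistry` API, and throwing on a missing public field is an assumption about error handling.

```typescript
// Replace {fieldKey} placeholders with URL-encoded values from the public config.
// Throwing on a missing key is an assumption of this sketch.
function fillUrlTemplate(template: string, publicConfig: Record<string, string>): string {
  return template.replace(/\{([A-Za-z0-9_]+)\}/g, (_match: string, key: string) => {
    const value = publicConfig[key];
    if (value === undefined) {
      throw new Error(`Missing public config field "${key}" for URL template`);
    }
    return encodeURIComponent(value);
  });
}

// Rewrite loopback IP hosts to "localhost", keeping port, path, and query intact.
// WHATWG URL exposes IPv6 hosts with brackets, e.g. "[::1]".
function normalizeRedirectUri(uri: string): string {
  const url = new URL(uri);
  if (url.hostname === "127.0.0.1" || url.hostname === "[::1]") {
    url.hostname = "localhost";
  }
  return url.toString();
}
```

For example, the Microsoft template above with `{ tenantId: "contoso.onmicrosoft.com" }` yields `https://login.microsoftonline.com/contoso.onmicrosoft.com/oauth2/v2.0/authorize`, and a value like `"a/b c"` would be encoded as `a%2Fb%20c` so it cannot alter the URL's path structure.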
@@ -26,13 +26,21 @@ Do not use this skill for pure workflow chaining questions unless the node imple
  1. Prefer helper-based nodes first.
  2. Keep nodes deterministic and focused.
  3. Request credentials through named slots instead of hard-coded secrets.
- 4. Put **static** options (credentials, retry policy, labels) on **config**; put **per-item** behavior in **inputs** / wire JSON and optional **`itemValue`** on config fields (consistent with built-in nodes).
- 5. Drop to class-based node APIs only when you need constructor-injected collaborators, decorators, or deeper runtime metadata.
+ 4. Put **static** options (credentials, retry policy, labels) on **config**; put **per-item** behavior in **inputs** / wire JSON and optional **`itemExpr`** on config fields (consistent with built-in nodes).
+ 5. **Emit files with `ctx.binary`, not base64 in `json`:** use **`attach`** + **`withAttachment`** on **`args.ctx.binary`** (`defineNode`) or **`ctx.binary`** (class nodes). Base64 in **`item.json`** bloats persisted run JSON in the database; binaries use **storage + references** only. See `references/node-patterns.md` and repo docs **Concepts Execution model** / **Custom nodes**.
+ 6. Drop to class-based node APIs only when you need constructor-injected collaborators, decorators, or deeper runtime metadata.
 
  ## Testing with `WorkflowTestKit`
 
  For engine-backed tests without the host, use **`WorkflowTestKit`** from **`@codemation/core/testing`**: **`registerDefinedNodes([...])`**, then **`runNode`** or **`run`**. See the plugin development doc and `@codemation/core` tests for examples.
 
+ ## Custom assertion + test nodes
+
+ When building **assertion** nodes that should record results into the framework's TestSuiteRun infrastructure, set **`emitsAssertions: true`** on the node config. The host's `TestSuiteRunTracker` listens for `nodeCompleted` events from runs with `ctx.testContext` set and persists each emitted item (matching the `AssertionResult` shape) as a `TestAssertion` row. Drop in a `defineNode` with a per-item `execute` that returns `AssertionResult[]` and you're done — no service injection required.
+
+ Custom **per-item nodes** can also read **`ctx.testContext?.{testSuiteRunId, testCaseIndex}`** to branch on test mode without an `IsTestRun` upstream — useful for synthetic outputs or skipping irreversible side effects when running tests.
+
  ## Read next when needed
 
  - Read `references/node-patterns.md` for `defineNode(...)` patterns and packaging guidance.
+ - Use the `codemation-workflow-dsl` skill's `references/workflow-testing.md` for the full TestTrigger / IsTestRun / Assertion authoring story.
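A per-item body that branches on `ctx.testContext` might follow the pattern below. Only the `testContext?.{testSuiteRunId, testCaseIndex}` shape comes from the text above; `NodeCtxSketch`, `SendReply`, and `replyOrSimulate` are hypothetical stand-ins, so read this as the branching pattern rather than the `defineNode` signature.

```typescript
// Hypothetical stand-ins; only the testContext shape is taken from the docs above.
interface TestContextSketch {
  testSuiteRunId: string;
  testCaseIndex: number;
}
interface NodeCtxSketch {
  testContext?: TestContextSketch;
}
type SendReply = (to: string, body: string) => Promise<string>;

// Per-item logic: perform the irreversible side effect only outside test runs.
async function replyOrSimulate(
  input: { to: string; body: string },
  ctx: NodeCtxSketch,
  sendReply: SendReply,
): Promise<{ messageId: string; simulated: boolean }> {
  if (ctx.testContext) {
    // Synthetic, deterministic output so downstream assertions can still inspect it.
    const { testSuiteRunId, testCaseIndex } = ctx.testContext;
    return { messageId: `test-${testSuiteRunId}-${testCaseIndex}`, simulated: true };
  }
  return { messageId: await sendReply(input.to, input.body), simulated: false };
}
```

Keeping the side effect behind an injected function also makes the live branch easy to exercise with a fake sender.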
@@ -52,6 +52,29 @@ Reach for class-based node APIs when:
 
  ## Runtime reminder
 
- - **`defineNode`** runs **`execute` once per item** (with optional **`inputSchema`** and **`itemValue`** on config fields before **`execute`**)
+ - **`defineNode`** runs **`execute` once per item** (with optional **`inputSchema`** and **`itemExpr`** on config fields before **`execute`**)
  - **`defineBatchNode`** runs **`run`** once per activation batch
  - keep nodes deterministic and testable; prefer real code paths or in-memory collaborators over heavy mocking
+
+ ## Emitting items, fan-out, and binaries (for AI codegen)
+
+ **Return shapes**
+
+ - Return **plain JSON** → one output item with that **`json`** (unless the value is a **top-level array**, which **fans out** to one item per element).
+ - Return **`emitPorts({ portName: [...] })`** for multi-port routing.
+ - Return an **item-shaped** `{ json, binary?, meta?, paired? }` when you need explicit **`binary`** / **`meta`** / **`paired`** control.
+
+ **Never put bulk file content in `item.json`**
+
+ - Fields like `contentBase64`, `data`, or multi-megabyte strings are stored **inside persisted run / step JSON** in the database. That **scales poorly** (base64 is larger than raw bytes) and hurts snapshots and tooling.
+ - **Correct:** `const attachment = await args.ctx.binary.attach({ name: "file", body: bytesOrStream, mimeType, filename })` then `return args.ctx.binary.withAttachment({ json: { ok: true } }, "file", attachment)` (or build `{ json, binary }` by hand).
+ - **`body`** types match **`BinaryBody`**: `Uint8Array`, `ArrayBuffer`, `ReadableStream`, or async iterable of chunks (same idea as **`HttpRequest`** downloading a body).
+ - **`keepBinaries: true`** only **preserves existing** **`item.binary`** through a plain JSON return; it does **not** convert base64 strings in **`json`** into attachments.
+
+ **Triggers**
+
+ - Emit **one `Item` per external record**; use **`item.binary`** per record for files—not one item whose **`json`** contains an array of embedded files.
+ - For polling triggers that fetch records carrying file payloads (mail attachments, message media, etc.), do this in two phases:
+   1. In `runCycle` (the polling step), fetch only the **metadata** (id, name, contentType, size). The result is persisted into the trigger's setup state and into emitted item JSON, so it must stay small.
+   2. In `execute(items, ctx)`, when the cfg opts into downloads, fetch each blob's bytes from the source API and register them via `ctx.binary.attach(...)`. Then return items via `ctx.binary.withAttachment(item, slot, stored)`.
+ - **Do not** request the full payload in the polling fetch (e.g. Microsoft Graph `$expand=attachments` returns base64 `contentBytes` inline; use `$expand=attachments($select=id,name,contentType,size)` to keep the response light). Large polling responses bloat the run state on every cycle, even when no item is emitted.
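Two of the rules above can be made concrete: how a plain return value becomes output items, and how one might catch base64 payloads before they land in `item.json`. Both helpers are hypothetical illustrations. The engine's own normalization also handles `emitPorts(...)` and item-shaped returns, which are not modeled here, and the 1024-character threshold is an arbitrary assumption.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
interface SketchItem {
  json: Json;
}

// Plain JSON return: one item; a top-level array fans out to one item per element.
function toOutputItems(returned: Json): SketchItem[] {
  return Array.isArray(returned) ? returned.map((json) => ({ json })) : [{ json: returned }];
}

// Hypothetical lint: report paths of long, base64-looking strings inside item JSON.
const BASE64_PATTERN = /^[A-Za-z0-9+/]+={0,2}$/;

function findSuspectBase64(json: Json, minLength = 1024, path = "$"): string[] {
  if (typeof json === "string") {
    const looksLikeBase64 =
      json.length >= minLength && json.length % 4 === 0 && BASE64_PATTERN.test(json);
    return looksLikeBase64 ? [path] : [];
  }
  if (Array.isArray(json)) {
    return json.flatMap((value, index) => findSuspectBase64(value, minLength, `${path}[${index}]`));
  }
  if (json !== null && typeof json === "object") {
    return Object.entries(json).flatMap(([key, value]) =>
      findSuspectBase64(value, minLength, `${path}.${key}`),
    );
  }
  return [];
}
```

A check like `findSuspectBase64(item.json)` in a unit test flags a field such as `$.file.contentBase64`, which per the guidance above should instead be stored with `ctx.binary.attach(...)` and referenced from `item.binary`.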
@@ -1,6 +1,6 @@
  ---
  name: codemation-framework-concepts
- description: Explains Codemation package boundaries, runtime concepts, and the normal consumer mental model. Use when the user asks where code belongs across `@codemation/core`, `@codemation/host`, `@codemation/next-host`, `@codemation/cli`, workflows, plugins, credentials, activation, or runtime modes.
+ description: Explains Codemation package boundaries, runtime concepts, observability shape, and the normal consumer mental model. Use when the user asks where code belongs across `@codemation/core`, `@codemation/host`, `@codemation/next-host`, `@codemation/cli`, workflows, plugins, credentials, activation, telemetry, or runtime modes.
  compatibility: Designed for Codemation apps, plugins, and framework contributors.
  ---
 
@@ -8,7 +8,7 @@ compatibility: Designed for Codemation apps, plugins, and framework contributors
 
  ## Use this skill when
 
- Use this skill to explain package ownership, runtime shape, and the boundary between consumer code and framework code.
+ Use this skill to explain package ownership, runtime shape, observability boundaries, and the boundary between consumer code and framework code.
 
  Do not use this skill as a substitute for detailed CLI, workflow DSL, or plugin implementation guidance when the user already knows the concept they need.
 
@@ -28,13 +28,18 @@ Do not use this skill as a substitute for detailed CLI, workflow DSL, or plugin
  - items carry workflow data
  - credentials provide typed runtime resources
  - activation is framework-managed and happens in the UI
+ - telemetry is observability-first: traces, spans, artifacts, and metric points are framework-owned runtime data
+ - run retention and telemetry retention can differ, so trend data can outlive raw run state
+ - **workflow testing** is a first-class primitive: a `TestTrigger` node yields one item per test case, the orchestrator dispatches a workflow run per case with `executionOptions.testContext` set, and `Assertion` nodes (`emitsAssertions: true`) record per-run results into `TestAssertion` rows; the canvas exposes a Tests tab parallel to Live and Executions
 
  ## Runtime rule of thumb
 
  1. Start with the minimum setup.
  2. Move to shared PostgreSQL and Redis when execution needs separate worker infrastructure.
  3. Keep workflow code stable while the runtime shape grows around it.
+ 4. Treat telemetry as part of the runtime contract, not as ad-hoc node-local logging.
 
  ## Read next when needed
 
  - Read `references/architecture-map.md` for package ownership and runtime-mode guidance.
+ - Use the `codemation-workflow-dsl` skill (and its `references/workflow-testing.md`) for hands-on test authoring with TestTrigger / IsTestRun / Assertion.
@@ -40,6 +40,16 @@ That usually means:
  - Redis for queue-backed execution
  - BullMQ-backed scheduling
 
+ ## Observability data
+
+ Codemation stores observability data in the host runtime alongside other application state:
+
+ - traces and spans for execution correlation
+ - artifacts for drill-down payloads such as AI messages or tool I/O
+ - metric points for node-specific measurements without widening the span schema
+
+ Run retention and telemetry retention are intentionally separate so operators can keep trend data longer than raw run state.
+
  ## Activation flow
 
  - deploy the workflow definition and any supporting plugin changes
@@ -23,7 +23,7 @@ That file is the plugin repository's source composition root. Consumers should d
 
  ## Node guidance
 
- - start with `defineNode(...)` and **`execute(...)`** for simple reusable nodes (per-item pipeline; optional **`inputSchema`** and **`itemValue`** on config fields)
+ - start with `defineNode(...)` and **`execute(...)`** for simple reusable nodes (per-item pipeline; optional **`inputSchema`** and **`itemExpr`** on config fields)
  - use `defineBatchNode(...)` only when the node must process the **whole activation batch** in one **`run(items, ...)`**
  - keep runtime logic close to the node definition
  - move to class-based node APIs when you need constructor-injected collaborators or deeper runtime metadata
@@ -33,6 +33,37 @@ That file is the plugin repository's source composition root. Consumers should d
  - start with `defineCredential(...)`
  - build typed sessions in `createSession(...)`
  - implement `test(...)` so operators can validate configuration before activation
+ - for OAuth2 redirect flows, use the URL-template variant (`auth: { kind: "oauth2", providerId, authorizeUrl, tokenUrl, scopes }`) with `{publicFieldKey}` placeholders — no core or host edits needed per provider. See the credential-development skill for details.
+
+ ## Binary payloads — never put bytes on the item JSON
+
+ **Rule:** if a node produces or fetches binary content (file attachments, image bytes, audio, PDFs, downloads, etc.), the bytes go through the framework's binary storage via `ctx.binary.attach(...)`. They MUST NOT be placed on the item's JSON payload.
+
+ The runtime persists each item's JSON into the runs table for telemetry, replay, and debugging. Putting megabyte-scale base64 strings in there bloats the database, slows queries, and makes telemetry unreadable. The binary system exists exactly for this: blobs live in object storage; the item JSON only carries a `BinaryAttachment` reference (`{ id, storageKey, mimeType, size, ... }`) under `item.binary[<slot-name>]`.
+
+ ```ts
+ // Inside execute(items, ctx) on a node that has fetched a file:
+ const stored = await ctx.binary.attach({
+   name: "report.pdf", // slot name (also the key under item.binary)
+   body: Buffer.from(bytes), // Buffer / Uint8Array / Readable
+   mimeType: "application/pdf",
+   filename: "report.pdf", // hint for downloads
+ });
+ const enriched = ctx.binary.withAttachment(item, "report.pdf", stored);
+ ```
+
+ Notes:
+
+ - Attachment **metadata** (id, name, contentType, size) belongs on the item JSON — it is small and useful for branching. Only the **bytes** must go through `ctx.binary`.
+ - For triggers, fetch metadata cheaply in `runCycle` (e.g. Graph's `$expand=attachments($select=id,name,contentType,size)`) and defer the byte download to `execute()` so persisted run state stays tiny on every poll.
+ - Two attachments with the same filename within one item collide on `item.binary[name]`; suffix the slot name (`report-2.pdf`) to keep both.
+
+ ## Polling-trigger guidance
+
+ - the engine ships a generic polling-trigger runtime in `@codemation/core` exposed via `ctx.polling` on the trigger setup context
+ - call `ctx.polling.start({ intervalMs, runCycle })` from your trigger node's `setup()` — the runtime handles the loop, overlap guard, dedup window (`ctx.polling.dedup.merge(...)`), state persistence, and cleanup
+ - on the first cycle, baseline-skip (record current ids, emit nothing) so the workflow does not flood with the existing backlog when the trigger is first set up
+ - implement `TestableTriggerNode.getTestItems(ctx)` to power the workflow UI's **Test** button — return the most recent N items without consulting or mutating polling state, so users can preview live data without waiting
 
  ## Publishability
 
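The baseline-skip and dedup-window rules in the polling guidance can be illustrated as a pure function over persisted state. This is a sketch of the logic only, assuming a hypothetical `PollingStateSketch` and a window of the last 500 ids; it is not the `ctx.polling` API, which owns the loop, persistence, and `ctx.polling.dedup.merge(...)` in the real runtime.

```typescript
// Hypothetical persisted polling state; the real runtime owns this via ctx.polling.
interface PollingStateSketch {
  initialized: boolean;
  seenIds: string[];
}

// Small metadata only: this is what gets persisted and emitted, never file bytes.
interface RecordMeta {
  id: string;
  name: string;
  contentType: string;
  size: number;
}

interface CycleResult {
  state: PollingStateSketch;
  emitted: RecordMeta[];
}

// One cycle: dedup against a bounded window of seen ids, and baseline-skip on the
// first cycle so an existing backlog is recorded but not emitted.
function pollingCycleSketch(
  state: PollingStateSketch,
  fetched: RecordMeta[],
  windowSize = 500,
): CycleResult {
  const seen = new Set(state.seenIds);
  const fresh: RecordMeta[] = [];
  for (const record of fetched) {
    if (seen.has(record.id)) continue; // already emitted, or repeated within this fetch
    seen.add(record.id);
    fresh.push(record);
  }
  // Keep only the most recent ids so persisted state stays small.
  const seenIds = [...state.seenIds, ...fresh.map((record) => record.id)].slice(-windowSize);
  return {
    state: { initialized: true, seenIds },
    emitted: state.initialized ? fresh : [],
  };
}
```

Bounding the window trades memory for a small risk of re-emitting a very old record that reappears after scrolling out of the window; the right size depends on how far back the source API returns results.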
@@ -17,7 +17,7 @@ Do not use this skill for CLI-only troubleshooting or deep host architecture que
  1. A workflow definition describes how items move from a trigger through downstream steps.
  2. The fluent authoring chain is the normal starting point for Codemation apps.
  3. Finish fluent workflow definitions with `.build()`.
- 4. Activations are **batch-shaped** (`Items`); many steps use **per-item** execution (`execute`, including helper **`defineNode`**) with optional **`inputSchema`** and **`itemValue`** on config fields. Batch reshape steps (split/filter/aggregate, **`defineBatchNode`**) work on the whole batch.
+ 4. Activations are **batch-shaped** (`Items`); many steps use **per-item** execution (`execute`, including helper **`defineNode`**) with optional **`inputSchema`** and **`itemExpr`** on config fields. Batch reshape steps (split/filter/aggregate, **`defineBatchNode`**) work on the whole batch.
  5. Fluent callback helpers follow the runtime item contract: `.map(...)`, `.if(...)`, and `.switch({ resolveCaseKey })` receive `(item, ctx)`, so row fields live under `item.json` and earlier completed outputs are available through `ctx.data`.
 
  ## Authoring rules
@@ -27,14 +27,35 @@ Do not use this skill for CLI-only troubleshooting or deep host architecture que
  3. Use custom nodes when a callback grows into reusable product logic.
  4. Distinguish **batch activations** from **per-item node bodies**: custom nodes from **`defineNode`** implement **`execute`** per item unless you chose **`defineBatchNode`** for batch **`run`**.
 
+ ## Node ids and stability
+
+ Every node in a workflow definition has an `id`. When no explicit `id:` is given, `WorkflowBuilder` derives one by slugifying the node's `name` label: lowercase, non-alphanumeric runs replaced with `-`, trimmed. `"Send Email"` becomes `"send-email"`.
+
+ `.build()` throws `WorkflowDefinitionError` if any node ends up with an empty id (blank label and no explicit `id`) or if two nodes share the same id. The check covers agent connection children (model + tools) as well.
+
+ For nodes that hold credential bindings, the binding is keyed by `(workflowId, nodeId, slotKey)`. Renaming a node's label changes its slug-derived id and orphans the binding — the operator must re-attach the credential in the UI. Prefer stable labels or set an explicit `id:` on credential-using nodes:
+
+ ```ts
+ .node("Send notification", SendEmailNodeConfig, {
+   id: "send-notification", // stable even if the label is later renamed
+   // ...
+ })
+ ```
+
  ## Typical flow
 
  1. Start with `workflow("wf.example.id")`.
  2. Name the workflow with `.name(...)`.
- 3. Add a trigger such as `.manualTrigger(...)`.
+ 3. Add a trigger such as `.manualTrigger(...)` or `builder.trigger(new CronTrigger(...))`.
  4. Add transformations or nodes in execution order.
  5. End with `.build()`.
 
+ ## Built-in triggers
+
+ - **`ManualTrigger`** — one-shot manual run, optionally seeded with default items. Use `.manualTrigger(name, items?)` on the fluent builder.
+ - **`WebhookTrigger`** — fires on an incoming HTTP request. Construct with `new WebhookTrigger(name, { endpointKey, methods })` and attach with `builder.trigger(...)`.
+ - **`CronTrigger`** — fires on a cron schedule. Construct with `new CronTrigger(name, { schedule, timezone? })` and attach with `builder.trigger(...)`. The expression is validated at workflow build time. Each tick emits one item: `{ firedAt: string, scheduledFor: string }` (both ISO-8601). Defaults to UTC — always supply `timezone` for DST-sensitive schedules.
+
  ## Agent tools (callable helpers)
 
  - For **inline** agent tools in workflow files (no separate `@tool()` class), use **`callableTool(...)`** from `@codemation/core`: supply `name`, Zod `inputSchema` / `outputSchema`, and `execute({ input, item, ctx, ... })`. **`CallableToolFactory.callableTool(...)`** is the same implementation if you prefer the factory style.
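The two `.build()` checks described under node ids (no empty ids, no duplicates) amount to a single pass over the effective ids. The sketch below assumes the slug rule as stated, uses a hypothetical `WorkflowDefinitionErrorSketch` in place of the real `WorkflowDefinitionError`, and does not model agent connection children.

```typescript
// Hypothetical stand-in for WorkflowDefinitionError.
class WorkflowDefinitionErrorSketch extends Error {}

interface NodeSpecSketch {
  name: string;
  id?: string;
}

// Slug rule as described: lowercase, non-alphanumeric runs -> "-", trimmed.
function slugifyNodeLabel(label: string): string {
  return label
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Mirrors the documented build-time checks: no empty ids, no duplicate ids.
function assignNodeIds(nodes: NodeSpecSketch[]): string[] {
  const seen = new Set<string>();
  const ids: string[] = [];
  nodes.forEach((node, index) => {
    const id = node.id ?? slugifyNodeLabel(node.name);
    if (id === "") {
      throw new WorkflowDefinitionErrorSketch(`Node #${index} ("${node.name}") has an empty id`);
    }
    if (seen.has(id)) {
      throw new WorkflowDefinitionErrorSketch(`Duplicate node id "${id}"`);
    }
    seen.add(id);
    ids.push(id);
  });
  return ids;
}
```

Under this rule, `"Send email"` and `"Send  Email!"` collide on `send-email`, and giving either node an explicit `id:` resolves the error without renaming anything.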
@@ -45,10 +66,23 @@ Do not use this skill for CLI-only troubleshooting or deep host architecture que
  - Use `.agent(...)` for fluent workflow-defined agent steps.
  - Define agent messages with `messages`, not a workflow-specific prompt shortcut.
  - Use a static `messages` array for fixed prompts.
- - Use `itemValue(...)` when agent messages depend on the current item.
+ - Use `itemExpr(...)` when agent messages depend on the current item.
  - Use fluent `.map((item, ctx) => ...)` when workflow data itself needs reshaping before the agent step.
  - `model` may be a provider string such as `"openai:gpt-4o-mini"` or a `ChatModelConfig`.
 
+ ## Workflow testing nodes
+
+ Codemation ships first-class **workflow tests**: each test case is one full workflow run, persisted with assertion records. Three nodes from `@codemation/core-nodes`:
+
+ 1. **`TestTrigger`** — drop alongside live triggers. Author callback `generateItems(ctx)` returns an `AsyncIterable<Item>`; the orchestrator dispatches one workflow run per yielded item with `executionOptions.testContext` set. `triggerKind: "test"` is set automatically — live activation skips it.
+ 2. **`IsTestRun`** — per-item router with `true` / `false` ports. Routes `true` iff `ctx.testContext` is set. Use it to skip side-effects in tests (don't actually send a real reply).
+ 3. **`Assertion`** — generic callback emitter; returns `AssertionResult[]`. Each result is `{ name, score: 0..1, passThreshold?, errored?, expected?, actual?, message?, details? }` — pass/fail derives from `score >= (passThreshold ?? 0.5)` (use `score: 1`/`0` for boolean checks, set `passThreshold` for continuous metrics, `errored: true` for assertion-code crashes). Each result becomes one emitted item on `main` and one persisted `TestAssertion` row when running inside a test. Sets `emitsAssertions: true` so the host persister identifies it.
+
+ Authors invoke a TestSuiteRun from the canvas **Tests tab** or via `POST /api/workflows/:id/test-suite-runs`. The orchestrator caps concurrency (default 4, configurable per trigger) and aggregates results into `succeeded | failed | partial | cancelled | errored`.
+
+ Custom nodes can also read `ctx.testContext?.{testSuiteRunId, testCaseIndex}` directly — useful for synthetic outputs in test mode without `IsTestRun` branching.
+
  ## Read next when needed
 
  - Read `references/builder-patterns.md` for item-flow rules and fluent authoring patterns.
+ - Read `references/workflow-testing.md` for TestTrigger / IsTestRun / Assertion authoring with full examples.
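The pass/fail rule for `AssertionResult` can be written down directly. The interface below copies the documented fields; `isPassing`, `exactMatch`, and `metricAtLeast` are hypothetical helpers, and treating `errored: true` as a failure regardless of score is an assumption of this sketch rather than documented behavior.

```typescript
// Field list taken from the AssertionResult description above.
interface AssertionResultSketch {
  name: string;
  score: number; // expected range 0..1
  passThreshold?: number;
  errored?: boolean;
  expected?: unknown;
  actual?: unknown;
  message?: string;
  details?: unknown;
}

// Documented rule: pass iff score >= (passThreshold ?? 0.5).
// Treating errored results as failures is an assumption of this sketch.
function isPassing(result: AssertionResultSketch): boolean {
  if (result.errored) return false;
  return result.score >= (result.passThreshold ?? 0.5);
}

// Boolean check: score 1 or 0, default threshold applies.
function exactMatch(name: string, expected: string, actual: string): AssertionResultSketch {
  return { name, score: expected === actual ? 1 : 0, expected, actual };
}

// Continuous metric: carry an explicit threshold.
function metricAtLeast(name: string, score: number, passThreshold: number): AssertionResultSketch {
  return { name, score, passThreshold };
}
```

An `Assertion` callback could return `[exactMatch("subject", expected, actual), metricAtLeast("relevance", score, 0.8)]`, producing one emitted item (and, inside a test run, one `TestAssertion` row) per result.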
@@ -15,6 +15,22 @@ export default workflow("wf.example.id")
    .build();
  ```
 
+ ## Cron-triggered workflow
+
+ ```ts
+ import { CronTrigger } from "@codemation/core-nodes";
+
+ export default workflow("wf.nightly.id")
+   .name("Nightly job")
+   .trigger(new CronTrigger("Nightly", { schedule: "0 3 * * *", timezone: "Europe/Amsterdam" }))
+   .map("Process tick", (item, _ctx) => ({
+     firedAt: (item.json as { firedAt: string }).firedAt,
+   }))
+   .build();
+ ```
+
+ The cron expression is validated at workflow build time. Each tick emits one item with `{ firedAt, scheduledFor }` ISO-8601 strings. Always supply `timezone` for DST-sensitive schedules — defaults to UTC.
+
  ## Use the fluent DSL by default
 
  - import `workflow` from `@codemation/host`
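Why `timezone` matters for a schedule such as `0 3 * * *` can be checked with the standard `Intl.DateTimeFormat` API alone: 03:00 in Europe/Amsterdam is 02:00 UTC in winter (CET) but 01:00 UTC in summer (CEST), so a UTC-evaluated `0 3 * * *` would run at 04:00 local time in winter and 05:00 in summer. The sketch does not model how `CronTrigger` evaluates schedules internally; `wallClockHour` is an illustrative helper.

```typescript
// Wall-clock hour (0-23) of an instant in the given IANA time zone.
function wallClockHour(timeZone: string, instant: Date): number {
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    hour: "numeric",
    hour12: false,
  }).formatToParts(instant);
  const hour = parts.find((part) => part.type === "hour");
  if (!hour) throw new Error(`No hour part for ${timeZone}`);
  // Some engines render midnight as "24" with hour12: false; normalize to 0.
  return Number(hour.value) % 24;
}

// Two instants that are both 03:00 in Amsterdam, on either side of the DST switch.
const winterTick = new Date("2024-01-15T02:00:00Z"); // CET, UTC+1
const summerTick = new Date("2024-07-15T01:00:00Z"); // CEST, UTC+2
```

Both ticks read as hour 3 in `Europe/Amsterdam` but as hours 2 and 1 in `UTC`, which is the shift an unqualified UTC schedule would silently absorb.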
@@ -24,12 +40,25 @@ export default workflow("wf.example.id")
24
40
  ## Item rules
25
41
 
26
42
  - workflow data flows as items
27
- - items usually carry `json` data and optional `binary` data
43
+ - items usually carry `json` data and optional `binary` data (**storage-backed attachments** via node **`ctx.binary.attach`**, not huge base64 strings in **`json`** — base64 in **`json`** inflates the persisted run payload in the DB; binaries stay as **references**)
28
44
  - runtime nodes receive batches of items, not just one record
29
45
  - author workflow steps with batching in mind
30
46
  - fluent `.map(...)`, `.if(...)`, and `.switch({ resolveCaseKey })` callbacks receive `(item, ctx)`
31
47
  - read row fields from `item.json` and earlier completed outputs from `ctx.data`
32
48
 
49
+ ## Node id assignment
50
+
51
+ When no `id:` is provided, the builder slugifies the node's `name` label: lowercase, non-alphanumeric runs replaced with `-`, leading/trailing `-` stripped. Two nodes with the same effective label produce the same slug and `.build()` throws `WorkflowDefinitionError`. Fix: provide a unique `id:` on the colliding node configs.
52
+
53
+ Credential bindings are stored as `(workflowId, nodeId, slotKey)`. Changing a node's label changes its slug-derived id, so the existing binding no longer matches and the node appears unbound. For credential-using nodes, either keep the label stable or set an explicit `id:`:
54
+
55
+ ```ts
56
+ .node("Send email", SendEmailNodeConfig, {
57
+ id: "send-email", // stable even after a label rename
58
+ credentials: { smtp: mySmtpCredential },
59
+ })
60
+ ```
61
+
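The id rules above can be sketched as follows, assuming "alphanumeric" means ASCII letters and digits and that an explicit `id` takes precedence over the label. `slugifyLabel`, `effectiveNodeId`, `findDuplicateIds`, and `bindingKey` are hypothetical helpers for illustration; the builder's actual implementation and binding storage format may differ.

```typescript
// Hypothetical re-implementation of the slug rule described above (ASCII letters/digits only).
export function slugifyLabel(label: string): string {
  return label
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each non-alphanumeric run into a single "-"
    .replace(/^-+|-+$/g, ""); // strip leading/trailing "-"
}

// Effective id: an explicit `id` wins, otherwise the slugified label.
export function effectiveNodeId(node: { name: string; id?: string }): string {
  return node.id ?? slugifyLabel(node.name);
}

// Ids that more than one node resolves to (the situation in which .build() throws).
export function findDuplicateIds(nodes: ReadonlyArray<{ name: string; id?: string }>): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const node of nodes) {
    const id = effectiveNodeId(node);
    if (seen.has(id)) duplicates.add(id);
    seen.add(id);
  }
  return [...duplicates];
}

// Hypothetical lookup key mirroring the (workflowId, nodeId, slotKey) triple.
export function bindingKey(workflowId: string, nodeId: string, slotKey: string): string {
  return JSON.stringify([workflowId, nodeId, slotKey]);
}
```

Because the binding key embeds the node id, renaming "Send email" to "Send mail" changes the key and the lookup misses, whereas pinning `id: "send-email"` keeps the key stable.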
33
62
  ## When to move beyond callbacks
34
63
 
35
64
  Promote inline callbacks into custom nodes when:
@@ -55,5 +84,5 @@ Promote inline callbacks into custom nodes when:
55
84
 
56
85
  - use `.agent(...)` for agent steps in fluent workflow definitions
57
86
  - define agent prompts with `messages`
58
- - use `itemValue(...)` when message content depends on `item.json`
87
+ - use `itemExpr(...)` when message content depends on `item.json`
59
88
  - use `outputSchema` when the workflow should expose typed structured agent output
@@ -0,0 +1,194 @@
1
+ # Workflow Testing
2
+
3
+ ## Use this reference when
4
+
5
+ You are authoring or reviewing a workflow that needs **end-to-end tests**: validate agent behavior, regression-test branching, score LLM outputs over time, or assert that a workflow produces the expected output for a known set of inputs.
6
+
7
+ This is **not** for unit-testing individual nodes — use `WorkflowTestKit` from `@codemation/core/testing` for that.
8
+
9
+ ## Three building blocks
10
+
11
+ 1. **`TestTrigger`** — drops on the canvas alongside live triggers (Webhook / Cron / Gmail / etc.). Authored callback yields one item per test case.
12
+ 2. **`IsTestRun`** — per-item router with `true` / `false` ports. Branches based on whether the run was started by the test orchestrator.
13
+ 3. **`Assertion`** — generic per-item assertion node; returns one or more `AssertionResult`s per input item, one persisted `TestAssertion` row per result.
14
+
15
+ ## Typical workflow shape
16
+
17
+ ```
18
+ [GmailTrigger: new email] ──┐
19
+                             │
20
+ [TestTrigger: 10 fixtures]──┴─→ [ClassifyAgent]
21
+                                        │
22
+                                  [IsTestRun?]
23
+                                   │         │
24
+                               true│         │false
25
+                                   ↓         ↓
26
+                              [Assertion] [SendReply]  (real side effect, skipped in tests)
27
+ ```
28
+
29
+ ## Authoring a TestTrigger
30
+
31
+ ```ts
32
+ import { TestTrigger } from "@codemation/core-nodes";
33
+ import { gmailCredentialType, type GmailSession } from "@codemation/core-nodes-gmail";
34
+
35
+ export const fixtureMailsTrigger = new TestTrigger<{ subject: string; body: string }>({
36
+ name: "Email fixtures",
37
+ credentialRequirements: [
38
+ { slotKey: "gmail", label: "Gmail", acceptedTypes: [gmailCredentialType.definition.typeId] },
39
+ ],
40
+ async *generateItems(ctx) {
41
+ const gmail = await ctx.getCredential<GmailSession>("gmail");
42
+ const messages = await gmail.listMessages({ labelIds: ["Label_test_mails"] });
43
+ for (const message of messages) {
44
+ if (ctx.signal.aborted) break;
45
+ yield { json: { subject: message.subject, body: message.body } };
46
+ }
47
+ },
48
+ concurrency: 8, // optional; default 4
49
+ caseLabel: (item) => item.json.subject, // optional; rows fall back to runId
50
+ });
51
+ ```
52
+
53
+ Notes:
54
+
55
+ - `triggerKind: "test"` is set automatically — `TriggerRuntimeService` skips it during live activation.
56
+ - `ctx.signal` is an `AbortSignal` that is aborted when the suite is cancelled; long-running pulls should check `ctx.signal.aborted` and stop early.
57
+ - For hardcoded fixtures, yield `{ json: { ... } }` directly; no credentials are needed.
58
+ - Set `caseLabel` so the Tests-tab tree-table shows something readable instead of opaque runIds.
59
+
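A sketch of the hardcoded-fixture variant of `generateItems`, written against hypothetical local types so it runs on its own; in a workflow the same generator body would go inside `new TestTrigger({ ... })`. It assumes only what the notes above state: items are `{ json }` objects and `ctx.signal` exposes `aborted`. The fixture data is made up for illustration.

```typescript
// Minimal local stand-ins; the real context type comes from the TestTrigger API.
interface FixtureCtx {
  readonly signal: { readonly aborted: boolean };
}

interface EmailFixture {
  readonly subject: string;
  readonly body: string;
}

const EMAIL_FIXTURES: readonly EmailFixture[] = [
  { subject: "You won a prize", body: "Click here to claim." },
  { subject: "Invoice #1042", body: "Please find the invoice attached." },
  { subject: "Lunch tomorrow?", body: "Are you free at noon?" },
];

// One yielded item per test case; stops early once the suite is cancelled.
export async function* generateEmailFixtures(ctx: FixtureCtx): AsyncGenerator<{ json: EmailFixture }> {
  for (const fixture of EMAIL_FIXTURES) {
    if (ctx.signal.aborted) return;
    yield { json: fixture };
  }
}
```

Keeping fixtures in source control like this also gives the trend charts a stable input set across runs.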
60
+ ## Branching in the workflow
61
+
62
+ ```ts
63
+ import { IsTestRun } from "@codemation/core-nodes";
64
+
65
+ const isTestRun = new IsTestRun("Skip side effects in tests");
66
+ ```
67
+
68
+ Or read `ctx.testContext` directly from a custom node:
69
+
70
+ ```ts
71
+ async execute({ item, ctx }) {
72
+ if (ctx.testContext) {
73
+ return { json: { result: "synthetic-test-output" } };
74
+ }
75
+ return { json: await this.realApi.send(item.json) };
76
+ }
77
+ ```
78
+
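Both approaches key off the same signal. As a sketch of the routing described for `IsTestRun`, assuming "started by the test orchestrator" is equivalent to `ctx.testContext` being set, every item in a batch goes to the `true` port during a test run and to `false` otherwise. `routeByTestRun` and its types are hypothetical, not the node's implementation.

```typescript
interface RunCtxLike {
  readonly testContext?: { readonly testSuiteRunId: string; readonly testCaseIndex: number };
}

interface PortOutputs<T> {
  readonly true: T[];
  readonly false: T[];
}

// Routes each item to the "true" port in test runs and to "false" in live runs.
export function routeByTestRun<T>(items: readonly T[], ctx: RunCtxLike): PortOutputs<T> {
  const out: PortOutputs<T> = { true: [], false: [] };
  const isTest = ctx.testContext !== undefined;
  for (const item of items) {
    (isTest ? out.true : out.false).push(item);
  }
  return out;
}
```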
79
+ ## Authoring assertions
80
+
81
+ ```ts
82
+ import { Assertion } from "@codemation/core-nodes";
83
+
84
+ const checkClassification = new Assertion<{ label: string; confidence: number }>({
85
+ name: "Classification checks",
86
+ assertions: (item) => [
87
+ {
88
+ // Boolean-style: 1 = pass, 0 = fail. Default threshold (0.5) handles this.
89
+ name: "label is spam",
90
+ score: item.json.label === "spam" ? 1 : 0,
91
+ expected: "spam",
92
+ actual: item.json.label,
93
+ },
94
+ {
95
+ // Continuous-score: declare the threshold explicitly.
96
+ name: "confidence ≥ 0.8",
97
+ score: item.json.confidence,
98
+ passThreshold: 0.8,
99
+ expected: "≥ 0.8",
100
+ actual: item.json.confidence,
101
+ },
102
+ ],
103
+ });
104
+ ```
105
+
106
+ The `AssertionResult` shape (stable; persister + chart UIs key off these fields):
107
+
108
+ ```ts
109
+ interface AssertionResult {
110
+ readonly name: string;
111
+ /** 0..1 score. Source of truth for pass/fail (compared against `passThreshold`). */
112
+ readonly score: number;
113
+ /** 0..1 threshold for "passed". When omitted, consumers default to 0.5. */
114
+ readonly passThreshold?: number;
115
+ /** True when evaluating the assertion threw — treated as fail regardless of `score`. */
116
+ readonly errored?: true;
117
+ readonly expected?: JsonValue;
118
+ readonly actual?: JsonValue;
119
+ readonly message?: string;
120
+ readonly details?: Readonly<Record<string, JsonValue>>;
121
+ }
122
+ ```
123
+
124
+ Pass/fail derivation (canonical, in `@codemation/core`):
125
+
126
+ ```ts
127
+ import { deriveAssertionPassed } from "@codemation/core";
128
+ // errored ? false : score >= (passThreshold ?? 0.5)
129
+ ```
130
+
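For reasoning about thresholds, or for dashboards outside Codemation, the documented rule can be mirrored in a few lines. This sketch restates the comment above; it is not a copy of `deriveAssertionPassed`, so prefer importing the canonical function wherever `@codemation/core` is available.

```typescript
// Subset of AssertionResult needed for pass/fail.
interface ScoredResult {
  readonly score: number;
  readonly passThreshold?: number;
  readonly errored?: true;
}

const DEFAULT_PASS_THRESHOLD = 0.5;

// errored ? false : score >= (passThreshold ?? 0.5)
export function assertionPassed(result: ScoredResult): boolean {
  if (result.errored) return false;
  return result.score >= (result.passThreshold ?? DEFAULT_PASS_THRESHOLD);
}
```

Note that the comparison is inclusive: a confidence of exactly `0.8` against `passThreshold: 0.8` passes.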
131
+ `errored: true` is for the assertion code itself crashing (judge agent crashed, JSON parse failed) — use it to separate "broken evaluator" from "wrong workflow output" in dashboards:
132
+
133
+ ```ts
134
+ assertions: async (item, ctx) => {
135
+ try {
136
+ const j = await runJudge(item, ctx); // runJudge: placeholder for your own judge-agent call
137
+ return [{ name: "polite reply", score: j.score, passThreshold: 0.7, message: j.reason }];
138
+ } catch (err) {
139
+ return [{ name: "polite reply", score: 0, errored: true, message: String(err) }];
140
+ }
141
+ },
142
+ ```
143
+
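The try/catch above generalizes into a small wrapper. `guardAssertion` is a hypothetical helper, not a Codemation export: it turns any thrown error, including an out-of-range score, into an `errored` result, so evaluator bugs never show up as low scores.

```typescript
interface GuardedResult {
  readonly name: string;
  readonly score: number;
  readonly passThreshold?: number;
  readonly errored?: true;
  readonly message?: string;
}

export async function guardAssertion(
  name: string,
  passThreshold: number,
  evaluate: () => Promise<{ score: number; message?: string }>,
): Promise<GuardedResult> {
  try {
    const { score, message } = await evaluate();
    // A score outside 0..1 is an evaluator bug, not a low score.
    if (!Number.isFinite(score) || score < 0 || score > 1) {
      throw new RangeError(`score must be in 0..1, got ${score}`);
    }
    return { name, score, passThreshold, message };
  } catch (err) {
    return { name, score: 0, errored: true, message: err instanceof Error ? err.message : String(err) };
  }
}
```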
144
+ ## Judge-by-Agent
145
+
146
+ A judge-by-agent is just an AI agent step feeding into an Assertion callback. Run an agent that returns a structured judgment, then map its output to an `AssertionResult` (`score: 0..1`, set `passThreshold`).
147
+
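One way to do that mapping: assume the judge agent's `outputSchema` yields `{ rating, reason }` with an integer rating from 1 to 5 (a hypothetical shape). A linear rescale to 0..1 with `passThreshold: 0.75` then means "rating 4 or better passes". `verdictToAssertion` is a hypothetical helper.

```typescript
// Hypothetical structured output from a judge agent.
interface JudgeVerdict {
  readonly rating: number; // integer 1 (worst) .. 5 (best)
  readonly reason: string;
}

interface JudgeAssertionResult {
  readonly name: string;
  readonly score: number;
  readonly passThreshold: number;
  readonly actual: number;
  readonly message: string;
}

// Linear rescale: 1 -> 0, 3 -> 0.5, 4 -> 0.75, 5 -> 1.
export function verdictToAssertion(name: string, verdict: JudgeVerdict): JudgeAssertionResult {
  const { rating, reason } = verdict;
  if (!Number.isInteger(rating) || rating < 1 || rating > 5) {
    throw new RangeError(`rating must be an integer 1..5, got ${rating}`);
  }
  return {
    name,
    score: (rating - 1) / 4,
    passThreshold: 0.75, // rating >= 4 passes
    actual: rating,
    message: reason,
  };
}
```

An invalid rating throws, which the `errored: true` pattern shown earlier converts into a "broken evaluator" result rather than a failing score.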
148
+ ## Running tests
149
+
150
+ - **From the UI**: open the workflow → **Tests** tab. Pick a TestTrigger from the dropdown (the picker lists every `triggerKind === "test"` node), then click **Run tests**. Use the metric selector on the trend chart to plot pass rate, per-assertion average scores, or case counts. Click two historical runs to compare them side by side.
151
+ - **From code**: instantiate `TestSuiteOrchestrator` from `@codemation/core/bootstrap`, call `runSuite({ workflow, triggerNodeId })`.
152
+ - **From HTTP**: `POST /api/workflows/:workflowId/test-suite-runs` with `{ triggerNodeId, concurrency? }`.
153
+
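A sketch of building that HTTP request, assuming a JSON body and no extra authentication headers (add whatever your deployment requires). `buildTestSuiteRunRequest` is hypothetical, and since the response shape is not documented here, the sketch stops at the request.

```typescript
interface TestSuiteRunBody {
  readonly triggerNodeId: string;
  readonly concurrency?: number;
}

interface HttpRequestSpec {
  readonly url: string;
  readonly method: "POST";
  readonly headers: Record<string, string>;
  readonly body: string;
}

export function buildTestSuiteRunRequest(baseUrl: string, workflowId: string, body: TestSuiteRunBody): HttpRequestSpec {
  if (body.concurrency !== undefined && (!Number.isInteger(body.concurrency) || body.concurrency < 1)) {
    throw new RangeError(`concurrency must be a positive integer, got ${body.concurrency}`);
  }
  const path = `/api/workflows/${encodeURIComponent(workflowId)}/test-suite-runs`;
  return {
    url: new URL(path, baseUrl).toString(),
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body), // JSON.stringify drops `concurrency` when it is undefined
  };
}
```

The result can be passed straight to `fetch(req.url, req)`; the trigger node id used in the checks (`"email-fixtures"`) is a made-up example.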
154
+ ## Status
155
+
156
+ ### Per case (`Run.testCaseStatus`)
157
+
158
+ | Status | Meaning |
159
+ | ----------- | ------------------------------------------------------------------------------------- |
160
+ | `running` | Workflow run dispatched, not yet finished. |
161
+ | `succeeded` | Workflow completed AND every assertion passed. |
162
+ | `failed` | Assertion-rollup downgrade OR the workflow itself reported failure. |
163
+ | `errored` | Workflow run threw before reaching a terminal state (engine error, not an assertion). |
164
+ | `cancelled` | Suite's `AbortSignal` fired before this case completed. |
165
+
166
+ ### Suite
167
+
168
+ | Status | Meaning |
169
+ | ----------- | ------------------------------------------------------------------- |
170
+ | `succeeded` | All cases passed (or zero cases yielded). |
171
+ | `failed` | Every case failed. |
172
+ | `partial` | Some passed, some failed — **the normal "1 of 10 failed" outcome**. |
173
+ | `cancelled` | Suite was aborted before all cases finished. |
174
+ | `errored` | The `generateItems` callback itself threw. |
175
+
176
+ The suite counters and status are re-derived from the final per-case statuses, so an "all workflows completed cleanly but assertions caught regressions" suite reports `partial` rather than `succeeded`.
177
+
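The tables leave some precedence questions open (for example, a suite with both cancelled and failed cases). The sketch below picks one plausible ordering, labeled as an assumption in the comments; `caseStatus` and `suiteStatus` are hypothetical and not the orchestrator's code.

```typescript
type CaseStatus = "running" | "succeeded" | "failed" | "errored" | "cancelled";
type SuiteStatus = "succeeded" | "failed" | "partial" | "cancelled" | "errored";

interface CaseOutcome {
  readonly threw: boolean; // engine error before a terminal state
  readonly aborted: boolean; // suite AbortSignal fired first
  readonly workflowFailed: boolean; // workflow itself reported failure
  readonly assertionsPassed: readonly boolean[]; // one entry per AssertionResult
}

// Assumed precedence: errored > cancelled > failed > succeeded.
export function caseStatus(o: CaseOutcome): CaseStatus {
  if (o.threw) return "errored";
  if (o.aborted) return "cancelled";
  if (o.workflowFailed || o.assertionsPassed.some((passed) => !passed)) return "failed";
  return "succeeded";
}

// Assumed: generator errors and cancellation take precedence; otherwise count passing cases.
export function suiteStatus(cases: readonly CaseStatus[], generatorThrew: boolean): SuiteStatus {
  if (generatorThrew) return "errored";
  if (cases.includes("cancelled")) return "cancelled";
  const passed = cases.filter((c) => c === "succeeded").length;
  if (passed === cases.length) return "succeeded"; // also covers zero cases
  if (passed === 0) return "failed";
  return "partial";
}
```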
178
+ ## Best practices
179
+
180
+ - **Don't `throw` from `execute` to fail a case.** Throwing skips downstream nodes — including the Assertion node — so you lose all assertion data and only get a run-level error. Instead, let the workflow complete and assert on the (wrong) output. The assertion-rollup downgrades the case to `failed`.
181
+ - Use `score: 1`/`score: 0` for boolean checks (equality, contains, regex). The default `passThreshold = 0.5` handles them.
182
+ - Use `passThreshold` for continuous metrics (confidence, judge ratings, similarity).
183
+ - Reserve `errored: true` for assertion-code crashes, not low scores.
184
+ - Keep TestTriggers as source-controlled fixtures so historical chart comparisons are apples-to-apples.
185
+
186
+ ## What's deferred (Phase 2)
187
+
188
+ - **Test-input snapshots** — Phase 1 fetches inputs live every run (rolling-input). Snapshotting will land in Phase 2 for stable judge-score charts.
189
+ - **Declarative assertion shorthands**: planned helpers such as `StringEqualsAssertion` and `JudgeByAgentAssertion` that will compose on top of the generic `Assertion` shipping today.
190
+ - **CLI / cron / GitHub PR integration** — currently triggered manually via UI or HTTP only.
191
+
192
+ ## Read more
193
+
194
+ - Top-level walkthrough: [`docs/workflow-testing.md`](../../../../docs/workflow-testing.md)