npm - @codemation/agent-skills - Versions diffs - 0.3.0 → 0.4.0 - Mend

@codemation/agent-skills 0.3.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/CHANGELOG.md +17 -0
package/dist/metadata.json +42 -5
package/package.json +1 -1
package/skills/codemation-document-scanner/SKILL.md +136 -0
package/skills/codemation-workspace-files/SKILL.md +142 -0

package/CHANGELOG.md CHANGED

@@ -1,5 +1,22 @@
 # @codemation/agent-skills
+## 0.4.0
+### Minor Changes
+- [#185](https://github.com/MadeRelevant/codemation/pull/185) [`be520d2`](https://github.com/MadeRelevant/codemation/commit/be520d2755144a3709ecc109019b84e2c502337e) Thanks [@cblokland90](https://github.com/cblokland90)! - feat(workspace-files): examples + skill (story 04)
+  Two CI-verified examples for workspace file nodes:
+  - node-workspace-files-read.example.ts — read a workspace CSV by filename (latest-wins) + parse
+  - concierge-digest-to-workflow.example.ts — concierge digests PDF → workflow reads derived JSON
+  New codemation-workspace-files skill documenting: read-by-filename vs pinned fileId, binary slot
+  handoff, read-only boundary, and the raw-upload → concierge-digests → workflow-reads pattern.
+### Patch Changes
+- [#184](https://github.com/MadeRelevant/codemation/pull/184) [`8c5b213`](https://github.com/MadeRelevant/codemation/commit/8c5b213092501bb48aa0c575b0489bbd36b46c79) Thanks [@cblokland90](https://github.com/cblokland90)! - Add `codemation-document-scanner` skill entry covering analyzerType routing table, confidence opt-in (LD6), output shape, and when to choose the managed node vs standalone OCR nodes.
 ## 0.3.0
 ### Minor Changes
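
Note on the 0.4.0 patch entry: the new `codemation-document-scanner` skill describes an `"auto"` routing rule (`image/*` goes to the image analyzer, everything else to the document analyzer) and a per-field `confidence` that stays `null` unless `includeConfidence: true` is set. The sketch below models both rules as plain functions so the behaviour is easy to check. The types are local mirrors of the shapes quoted in the skill text, and `resolveAnalyzer` and `fieldsBelowConfidence` are hypothetical helper names used for illustration; they are not part of `@codemation/core-nodes`, and the real routing happens inside the managed service.

```typescript
// Local mirrors of the output shape quoted in the skill text (assumption:
// the real types are imported from "@codemation/core-nodes", not redefined).
type DocScannerField = { value: unknown; confidence: number | null };
type DocScannerOutput = { markdown: string; fields: Record<string, DocScannerField> };

type RoutedAnalyzer = "document" | "invoice" | "image";
type AnalyzerType = RoutedAnalyzer | "auto";

// Hypothetical helper: models the documented "auto" rule.
// An explicit analyzerType is returned unchanged; "auto" routes on the MIME type.
function resolveAnalyzer(analyzerType: AnalyzerType, contentType: string): RoutedAnalyzer {
  if (analyzerType !== "auto") return analyzerType;
  return contentType.toLowerCase().startsWith("image/") ? "image" : "document";
}

// Hypothetical helper: lists field names whose confidence is below a threshold.
// Fields with confidence === null (the default when includeConfidence is off)
// are skipped, because they carry no score to compare.
function fieldsBelowConfidence(output: DocScannerOutput, threshold: number): string[] {
  return Object.entries(output.fields)
    .filter(([, field]) => field.confidence !== null && field.confidence < threshold)
    .map(([name]) => name);
}
```

A downstream `Callback` could apply a check like `fieldsBelowConfidence` to `item.json` to flag low-confidence invoice fields for manual review, assuming the scanner node was created with `includeConfidence: true`.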

package/dist/metadata.json CHANGED

@@ -1,7 +1,7 @@
 {
   "schemaVersion": 1,
   "packageName": "@codemation/agent-skills",
-  "packageVersion": "0.3.0",
+  "packageVersion": "0.4.0",
   "description": "Reusable agent skills for Codemation projects and plugin development.",
   "kind": "skills",
   "skills": [
@@ -15,7 +15,7 @@
       ],
       "sourcePath": "skills/codemation-ai-agent-node/SKILL.md",
       "dependencies": {
-        "@codemation/core-nodes": "0.9.0"
+        "@codemation/core-nodes": "0.10.0"
       },
       "code": "---\nname: codemation-ai-agent-node\ndescription: AIAgent constructor, message shape, managed and BYOK chatModel configs, outputSchema, mcpServers. Read before writing any workflow step that calls an LLM.\ncompatibility: Codemation core-nodes. Requires @codemation/core-nodes import.\ntags: agent, llm, ai\nuses: \"@codemation/core-nodes\"\n---\n\n# Codemation AI Agent Node\n\n## Mental model\n\n`AIAgent` is the single building block for any LLM step in a workflow. It receives items, runs a chat completion per item using the configured model and messages, and emits `{ output: string }` (or a parsed object when `outputSchema` is set) on its `main` port. The `chatModel` field determines whether the run consumes Codemation-managed quota (no credential needed) or a BYOK key the operator supplies. Every AIAgent emits exactly one output item per input item — it never fans out or filters.\n\n## When to use / when NOT\n\nUse `AIAgent` when a workflow step needs an LLM call: classification, extraction, summarisation, drafting, or decision.\nUse a plain `Callback` instead when the logic is deterministic code — no LLM needed.\nUse `mcpServers` (see `codemation-mcp-capabilities`) when the agent needs tool access to external services.\nRead `codemation-workflow-dsl` for the surrounding workflow structure.\n\n## Quickstart\n\n```ts\nimport { AIAgent, CodemationChatModelConfig } from \"@codemation/core-nodes\";\n\nnew AIAgent({\n  name: \"Classify email\",\n  messages: [\n    { role: \"system\", content: \"Classify the email as spam or not-spam.\" },\n    { role: \"user\", content: (args) => args.item.json.body as string },\n  ],\n  chatModel: new CodemationChatModelConfig(\"Claude Haiku\", \"anthropic/claude-haiku-4-5-20251001\"),\n});\n```\n\nFor full patterns — BYOK (`OpenAIChatModelConfig`), `outputSchema`, tools, multi-step pipelines, and gmail classification — use your harness's example-discovery tool: `find_examples({ query: \"AIAgent\" })`.\n\n## Decision 
branches & gotchas\n\n**Managed mode (default — no API key needed):** use `CodemationChatModelConfig(label, modelId)`. In managed mode the LLM broker **auto-authenticates via the workspace HMAC pairing** — no API key, no credential slot, no user setup required. This is the correct default for all managed-mode workflows. Do NOT tell managed users to \"get an API key\" — the broker handles authentication transparently.\n\n```ts\nchatModel: new CodemationChatModelConfig(\"Claude Haiku\", \"anthropic/claude-haiku-4-5-20251001\");\n// No credential slot created. Discover live model ids:\n// GET <CONTROL_PLANE_URL>/api/llm/managed-models\n```\n\n**BYOK (self-hosted / non-managed only):** use `OpenAIChatModelConfig(label, modelId, slotKey)` — it creates a credential slot the operator must bind with an API key. Only use this in self-hosted deployments where no managed broker is available.\n\n**Messages:** `content` is a plain string or a function `(args: { item, itemIndex, items, ctx }) => string`. Put instructions in the `system` message, per-item data in the `user` message. Use `\"assistant\"` role only for few-shot examples.\n\n**Structured output:** add `outputSchema: z.object({...})` to validate and parse the response. Without it, `item.json.output` is always a plain string.\n\n**Stable node id:** if the node has a credential binding (BYOK), set an explicit `id:` on the constructor. Without it the id derives from the `name` label — renaming the label orphans the binding. See `codemation-workflow-dsl` for the full id-stability rule.\n\n**Downstream access:** the next node sees `item.json.output` as the agent's text response. 
Cast it via a typed `Callback<{ output: string }>`.\n\n## Anti-patterns\n\n- Do not tell managed users to get an API key — use `CodemationChatModelConfig`; the broker authenticates automatically.\n- Do not use `OpenAIChatModelConfig` in managed mode — it creates an unnecessary credential slot and will prompt the user to bind a key they don't need.\n- Do not use `AIAgent` for deterministic logic; use `Callback` instead (cheaper, faster, no LLM billing).\n- Do not attempt to return multiple items from a single `AIAgent` step — it emits exactly one output per input.\n\nSee `references/anti-patterns.md` for version-specific gotchas (managed model id churn, chatModel string shorthand trap).\n"
     },
@@ -52,10 +52,29 @@
       ],
       "sourcePath": "skills/codemation-custom-node-development/SKILL.md",
       "dependencies": {
-        "@codemation/core": "0.12.0"
+        "@codemation/core": "0.13.0"
       },
       "code": "---\nname: codemation-custom-node-development\ndescription: Guides Codemation custom node development with `defineNode(...)` (`execute` per item), `defineBatchNode(...)` (batch `run`), reusable node modules, credential-aware nodes, and the class-based node fallback for advanced cases. Use when creating or updating custom nodes for apps or plugin packages.\ncompatibility: Designed for Codemation apps and plugin packages that define reusable nodes.\ntags: node, custom, plugin\nuses: \"@codemation/core\"\n---\n\n# Codemation Custom Node Development\n\n## Mental model\n\nCustom nodes are the extension point for reusable business logic that doesn't belong inline in a workflow callback. `defineNode(...)` wraps a per-item `execute` function with a typed contract (input schema, credential slots, output shape); the engine calls it once per item. `defineBatchNode(...)` is the batch variant for logic that must see all items at once. Nodes compose into workflows via config class instances — the node definition is separate from the config class used to wire it into a workflow.\n\n## Use this skill when\n\nUse this skill for reusable custom node work, whether the node lives inside an app or a published plugin package.\n\nDo not use this skill for pure workflow chaining questions unless the node implementation itself is changing.\n\n## Per-item vs batch\n\n**`defineNode(...)` (per-item)** — the engine calls `execute(args, context)` once per item. This is the right default for the vast majority of nodes: straightforward logic, credential slots, input schema, optional fan-out.\n\n**`defineBatchNode(...)` (batch)** — the engine calls `run(items, context)` with the full activation batch. Use only when the node genuinely needs to see all items at once (aggregation, bulk API calls, cross-item correlation).\n\nWhen in doubt, start with `defineNode`.\n\n## Node rules\n\n1. Keep nodes deterministic and focused.\n2. 
Request credentials through named slots — never hard-code secrets.\n3. Put **static** options (credentials, retry policy, labels) on **config**; put **per-item** behavior in **inputs** / wire JSON and optional `itemExpr` on config fields.\n4. **Emit files with `ctx.binary`, not base64 in `json`** — base64 in `item.json` bloats persisted run data. See `references/node-patterns.md`.\n5. Drop to class-based node APIs only when you need constructor-injected collaborators, decorators, or deeper runtime metadata.\n\n## Minimal `defineNode` example\n\n```ts\nimport { defineNode } from \"@codemation/core\";\nimport { z } from \"zod\";\n\nexport const uppercaseNode = defineNode({\n  key: \"example.uppercase\",\n  title: \"Uppercase field\",\n  icon: \"lucide:languages\",\n  inputSchema: z.object({ field: z.string() }),\n  async execute({ input }) {\n    return { ...input, field: input.field.toUpperCase() };\n  },\n});\n```\n\nFor full patterns — credential-slotted nodes, batch nodes, fan-out, binary payloads, and test kit usage — use your harness's example-discovery tool: `find_examples({ query: \"defineNode\" })` or `find_examples({ query: \"defineBatchNode\" })`.\n\n## Read next\n\n- `references/define-node-per-item.md` — full `defineNode(...)` contract, `inputSchema`, `itemExpr`, fan-out, assertion nodes, and `WorkflowTestKit` usage. Load this when writing or debugging a per-item node.\n- `references/define-batch-node.md` — `defineBatchNode(...)` contract and when to choose batch over per-item. Load this when the node must see the entire batch at once.\n- `references/credential-aware-nodes.md` — credential slots, typed sessions, and how to test credential-aware nodes. Load this when your node needs a credential.\n- `references/node-patterns.md` — binary payloads (`ctx.binary`, `attach`, `withAttachment`), fan-out return shapes, polling-trigger binary patterns, MS Graph attachment download, and HTTP binary round-trips. 
Load this when working with file data or HTTP binaries.\n"
     },
+    {
+      "name": "codemation-document-scanner",
+      "description": "CodemationDocumentScanner node — managed document/invoice/image extraction via the Codemation doc-scanner service. No Azure credentials required. Read before writing any workflow that scans documents, invoices, or images.",
+      "tags": [
+        "ocr",
+        "document",
+        "invoice",
+        "image",
+        "scan",
+        "extract",
+        "managed",
+        "confidence"
+      ],
+      "sourcePath": "skills/codemation-document-scanner/SKILL.md",
+      "dependencies": {
+        "@codemation/core-nodes": "0.10.0"
+      },
+      "code": "---\nname: codemation-document-scanner\ndescription: CodemationDocumentScanner node — managed document/invoice/image extraction via the Codemation doc-scanner service. No Azure credentials required. Read before writing any workflow that scans documents, invoices, or images.\ncompatibility: Codemation core-nodes. Requires @codemation/core-nodes import.\ntags: ocr, document, invoice, image, scan, extract, managed, confidence\nuses: \"@codemation/core-nodes\"\n---\n\n# Codemation Document Scanner Node\n\n> **Start here: call `find_examples` before reading further.**\n>\n> - `find_examples({ query: \"CodemationDocumentScanner\" })` — node-level usage and all analyzerType variants\n> - `find_examples({ query: \"invoice scan post accounting\" })` — end-to-end invoice extraction scenario\n> - `find_examples({ query: \"document scanner confidence fields\" })` — how to enable per-field confidence scores\n\n## Use this skill when\n\nWriting a workflow that extracts text and/or structured fields from documents, invoices, or images\nusing the Codemation managed scanning service. 
No Azure credentials are required — the service is\npre-wired to the platform.\n\nUse `codemation-workflow-dsl` for surrounding workflow structure.\nUse `codemation-ai-agent-node` if you need to pass the extracted markdown to an LLM for further processing.\n\n## When to use CodemationDocumentScanner vs the standalone OCR nodes\n\n| Situation                                        | Use                                                                                                 |\n| ------------------------------------------------ | --------------------------------------------------------------------------------------------------- |\n| Managed platform deployment, no Azure credential | `codemationDocumentScannerNode` (this skill)                                                        |\n| Self-hosted / BYOK Azure Content Understanding   | `analyzeDocumentNode` / `analyzeInvoiceNode` / `analyzeImageNode` from `@codemation/core-nodes-ocr` |\n\n`codemationDocumentScannerNode` calls the internal `doc-scanner` service via HMAC — the workspace holds\nno Azure key. 
The standalone OCR nodes call Azure directly using a per-workspace credential.\n\n## Choosing `analyzerType`\n\n| `analyzerType` | When to use                                     | Azure analyzer                                                          | Field extraction       | Confidence opt-in supported             |\n| -------------- | ----------------------------------------------- | ----------------------------------------------------------------------- | ---------------------- | --------------------------------------- |\n| `\"document\"`   | General PDFs, Word docs, HTML, text-heavy files | `prebuilt-document`                                                     | Yes                    | Yes                                     |\n| `\"invoice\"`    | Invoices, receipts — always prebuilt-invoice    | `prebuilt-invoice`                                                      | Yes                    | Yes                                     |\n| `\"image\"`      | Photos, screenshots, diagrams                   | `prebuilt-imageAnalyzer`                                                | No (markdown only)     | No — image carries no extraction charge |\n| `\"auto\"`       | Unknown mime type at author time                | Routes on `Content-Type`: `image/*` → image, everything else → document | Depends on routed type | Depends on routed type                  |\n\n**Default is `\"auto\"`.** Set an explicit type whenever you know the content class — it avoids\nunnecessary re-routing and makes the workflow self-documenting.\n\n## Output shape\n\n```ts\n{\n  markdown: string; // full text content\n  fields: Record<\n    string,\n    {\n      value: unknown; // extracted scalar, date ISO string, nested object, or array\n      confidence: number | null; // 0–1 when includeConfidence:true; null otherwise\n    }\n  >;\n}\n```\n\n`item.json.markdown` is the Markdown rendering of the document.\n`item.json.fields` is a flat-or-nested map of structured fields found by the 
analyzer.\nFor `analyzerType: \"image\"`, `fields` is always `{}`.\nFields may be sparse or absent for generic documents — extraction is best-effort.\n\n## WARNING: How to enable per-field confidence scores (LD6)\n\n**By default, `confidence` is `null` on every field.** This keeps token cost low for the majority\nof workflows that only need `value`.\n\nTo get a populated `confidence` (0–1 float) on each field, set `includeConfidence: true`:\n\n```ts\ncodemationDocumentScannerNode.create(\n  {\n    binaryField: \"data\",\n    analyzerType: \"invoice\",\n    includeConfidence: true, // ← opt-in: fields carry confidence 0–1\n  },\n  \"Scan invoice\",\n  \"scan-invoice\",\n);\n```\n\n**Cost implication:** enabling confidence routes the request to a confidence-enabled analyzer variant,\nwhich roughly doubles the contextualization token count for document/invoice analyzers.\nOnly enable it when your downstream logic actually reads `field.confidence`.\n\nImages (`analyzerType: \"image\"`) and auto-routed-to-image requests carry no field-extraction charge\nregardless of this flag — they silently ignore `includeConfidence` (confidence stays `null`, never a 400).\n\n## API usage\n\n```ts\nimport { codemationDocumentScannerNode } from \"@codemation/core-nodes\";\n\ncodemationDocumentScannerNode.create(\n  {\n    binaryField?: string;         // key on item.binary — default \"data\"\n    analyzerType?: \"document\" | \"invoice\" | \"image\" | \"auto\";  // default \"auto\"\n    contentType?: string;         // MIME type override — falls back to attachment.mimeType\n    includeConfidence?: boolean;  // default false — see cost note above\n    maxBytes?: number;            // size cap before reading; default 50 MiB (LD10)\n  },\n  label?: string,   // node label on the canvas\n  nodeId?: string,  // explicit stable id — set when output is used downstream\n)\n```\n\nSet an explicit `nodeId` whenever downstream nodes reference this node's output by id, or when\nthe node may be 
renamed later (avoids credential-binding orphaning).\n\n## Consuming fields downstream\n\n```ts\nimport type { DocScannerOutput, DocScannerField } from \"@codemation/core-nodes\";\n\n// item.json after codemationDocumentScannerNode:\n// DocScannerOutput = { markdown: string; fields: Record<string, DocScannerField> }\n// DocScannerField  = { value: unknown; confidence: number | null }\n\nnew Callback<DocScannerOutput>(\"Use fields\", (items, _ctx) =>\n  items.map((item) => {\n    const vendorName = item.json.fields[\"VendorName\"]?.value as string | undefined;\n    const vendorConf = item.json.fields[\"VendorName\"]?.confidence; // null unless includeConfidence:true\n    return { ...item, json: { ...item.json, vendorName, vendorConf } };\n  }),\n);\n```\n\n## Read next when needed\n\n- `codemation-workflow-dsl` — workflow builder, trigger types, fluent vs low-level API.\n- `codemation-ai-agent-node` — pass `item.json.markdown` to an LLM for summarisation or extraction.\n"
+    },
     {
       "name": "codemation-framework-concepts",
       "description": "Explains Codemation package boundaries, runtime concepts, observability shape, and the normal consumer mental model. Use when the user asks where code belongs across `@codemation/core`, `@codemation/host`, `@codemation/next-host`, `@codemation/cli`, workflows, plugins, credentials, activation, telemetry, or runtime modes. Read this first when starting any Codemation task — it points at the right skill for the work.",
@@ -101,10 +120,28 @@
       ],
       "sourcePath": "skills/codemation-workflow-dsl/SKILL.md",
       "dependencies": {
-        "@codemation/core-nodes": "0.9.0",
-        "@codemation/host": "0.9.1"
+        "@codemation/core-nodes": "0.10.0",
+        "@codemation/host": "0.10.0"
       },
       "code": "---\nname: codemation-workflow-dsl\ndescription: Guides Codemation workflow authoring. Use when creating or updating workflow definitions in `src/workflows` — manual-trigger flows via `workflow(\"...\").manualTrigger(...)`, or cron/webhook/other triggers via `createWorkflowBuilder({id, name}).trigger(...)`.\ncompatibility: Designed for Codemation apps and plugins that author workflows.\ntags: workflow, dsl, authoring\nuses: \"@codemation/core-nodes, @codemation/host\"\n---\n\n# Codemation Workflow DSL\n\n## Mental model\n\nA workflow definition describes how items move from a trigger through downstream node steps. Items carry data in `item.json`; earlier outputs are available through `ctx.data`. Activations are batch-shaped but most node steps execute per-item. Every workflow definition finishes with `.build()`, which validates node ids and emits a `WorkflowDefinitionError` on collision or empty id.\n\n## When to use / when NOT\n\nUse this skill when authoring or reviewing workflow definitions under `src/workflows/`.\nDo not use for CLI-only troubleshooting or deep host architecture questions unless they directly affect workflow authoring.\n\n## Quickstart — pick API by trigger type\n\n```ts\n// Manual trigger — full fluent sugar (.map, .if, .switch, .agent, .node, .then)\nimport { workflow } from \"@codemation/host\";\nexport default workflow(\"wf.example\")\n  .manualTrigger(\"Start\", {\n    /* seed items */\n  })\n  .map(/* ... */)\n  .build();\n\n// Cron / webhook / any other trigger — low-level .then(new NodeConfig(...)) only\nimport { createWorkflowBuilder, CronTrigger } from \"@codemation/core-nodes\";\nexport default createWorkflowBuilder({ id: \"wf.example\", name: \"Example\" })\n  .trigger(new CronTrigger(\"Daily\", { schedule: \"0 9 * * *\", timezone: \"UTC\" }))\n  .then(/* new SomeNodeConfig(...) 
*/)\n  .build();\n```\n\nFor full patterns — multi-step pipelines, branching, SubWorkflow, binary, agent tools, TestTrigger, and complete working examples — use your harness's example-discovery tool: `find_examples({ query: \"...\" })`. Useful queries: `\"CronTrigger\"`, `\"if branch\"`, `\"AIAgent multi-step\"`, `\"SubWorkflow binary\"`, `\"TestTrigger assertion\"`.\n\n## Decision branches & gotchas\n\n**Two authoring APIs — pick by trigger type.** `workflow(\"id\").manualTrigger(...)` returns a `WorkflowChain` with full fluent helpers (`.map`, `.if`, `.switch`, `.split`, `.agent`, `.node`). `createWorkflowBuilder({id, name}).trigger(new XxxTrigger(...))` returns a `ChainCursor` whose only chain method is `.then(new NodeConfig(...))`. Do NOT call `.trigger(...)` on the `workflow(...)` builder — it doesn't exist there.\n\n**Node ids and stability.** When no explicit `id:` is given, the engine slugifies the node's `name` label (lowercase, non-alphanumeric → `-`). `\"Send Email\"` → `\"send-email\"`. Nodes sharing credential bindings use `(workflowId, nodeId, slotKey)` as the binding key — renaming a label orphans the binding. **Set explicit `id:` on every credential-using node.** `.build()` throws `WorkflowDefinitionError` on empty or duplicate ids.\n\n**Id collision pitfall.** A manual-trigger label and a downstream agent label that share the same string both slugify to the same id — `.build()` throws. Fix: add `id: \"...-agent\"` to disambiguate.\n\n**Collection nodes** use `.then(node.create(...))` instead of `.node(label, node, opts)` — TypeScript can't infer the `ParamDeep` constraint via the fluent helper. See `find_examples({ query: \"collection crud\" })`.\n\n**Install state in example results.** Every `find_examples` result includes `installed: boolean` and `requiresInstall: string[]`. 
If `installed` is `false` or `requiresInstall` is non-empty, call `install_package` for each missing package before writing any workflow code that imports them.\n\n**When no example matches — self-solving fallback chain.**\n\n1. Retry with intent variations (different verb, more generic term).\n2. For HTTP APIs: `find_examples({ query: \"defineRestNode\" })` — covers basic and credential-slotted REST.\n3. For one-shot inline HTTP: `find_examples({ query: \"HttpRequest\" })`.\n4. For non-HTTP custom logic: `find_examples({ query: \"defineNode template\" })`.\n   Do NOT ask the user to pick between primitives — they can't help; use the chain. Do NOT grep `node_modules/@codemation/*` for node implementations — examples are authoritative. Surface the technique used in your reply.\n\n**Workflow testing.** Three built-in nodes from `@codemation/core-nodes`: `TestTrigger` (yields one item per test case), `IsTestRun` (routes `true`/`false` by `ctx.testContext`), `Assertion` (emits `AssertionResult[]`, sets `emitsAssertions: true`). See `references/workflow-testing.md` for authoring details.\n\n**SubWorkflow binary.** `item.binary` slots pass transparently through SubWorkflow boundaries in both directions — no special config needed. Both runs share the same `BinaryStorage` singleton.\n\n**Verify your workflow.** Call `verify_workflow({ path: \"src/workflows/my-workflow.ts\" })` instead of running `pnpm typecheck` yourself. Returns `{ ok, data: { typecheck, lint, build, structure }, hint? 
}`.\n\n## Anti-patterns\n\n- Do not call `.trigger(...)` on the `workflow(...)` manual builder — use `createWorkflowBuilder(...)` for non-manual triggers.\n- Do not rely on slug-derived node ids for production workflows with credential bindings — always set an explicit `id:`.\n- Do not improvise from memory when `find_examples` returns zero hits — use the fallback chain above.\n\n## Read next when needed\n\n- `references/builder-patterns.md` — item-flow rules and fluent authoring patterns.\n- `references/workflow-testing.md` — TestTrigger / IsTestRun / Assertion with full examples.\n- `references/complete-example.md` — dense end-to-end example covering most authoring features.\n"
+    },
+    {
+      "name": "codemation-workspace-files",
+      "description": "ListWorkspaceFiles + ReadWorkspaceFile nodes — read files from the shared workspace pool. Covers read-by-filename (latest-wins), pinned fileId, binary slot handoff, and the raw-upload → concierge-digests → workflow-reads-derived-file pattern. Read before building any workflow that reads workspace files.",
+      "tags": [
+        "workspace",
+        "files",
+        "binary",
+        "storage",
+        "read",
+        "csv",
+        "json"
+      ],
+      "sourcePath": "skills/codemation-workspace-files/SKILL.md",
+      "dependencies": {
+        "@codemation/core-nodes-workspace-files": "0.2.0"
+      },
+      "code": "---\nname: codemation-workspace-files\ndescription: ListWorkspaceFiles + ReadWorkspaceFile nodes — read files from the shared workspace pool. Covers read-by-filename (latest-wins), pinned fileId, binary slot handoff, and the raw-upload → concierge-digests → workflow-reads-derived-file pattern. Read before building any workflow that reads workspace files.\ncompatibility: Codemation core-nodes-workspace-files. Requires WORKSPACE_ID and BLOB_STORAGE_* env vars.\ntags: workspace, files, binary, storage, read, csv, json\nuses: \"@codemation/core-nodes-workspace-files\"\n---\n\n# Codemation Workspace Files\n\n## Mental model\n\nWorkflows **read** the shared workspace file pool; they do **not** write to it. Files are\ncreated and managed on the control-plane side (the Files UI, the concierge, the\nDocumentScanner). The framework's role is to provide `ListWorkspaceFiles` and\n`ReadWorkspaceFile` as pure read nodes.\n\nThe **headline scenario** is: a user uploads a raw PDF; the concierge digests it into a\nstructured JSON; the workflow reads the _derived JSON_, not the raw bytes. Workflows\nnever touch raw uploads directly.\n\n## When to use / when NOT\n\nUse `ReadWorkspaceFile` when a workflow needs data that lives in the workspace pool\n(pricing sheets, config JSON, concierge-derived documents, CSV exports).\n\nUse `ListWorkspaceFiles` to discover what files exist or to drive a fan-out (one item per file).\n\nDo NOT use these nodes to write files — writing is CP-mediated and deferred to v2.\n\nDo NOT base64-encode bytes onto `item.json`. 
Binary payloads always flow through\n`item.binary` via `ctx.binary`.\n\n## Quickstart\n\n```ts\nimport { readWorkspaceFileNode } from \"@codemation/core-nodes-workspace-files\";\n\n// Read the latest \"pricing.csv\" by name — picks up the newest upload automatically.\nreadWorkspaceFileNode.create({ filename: \"pricing.csv\", binarySlot: \"data\" }, \"Read pricing CSV\", \"read-pricing-csv\");\n```\n\n```ts\n// Pin to an exact version — a later upload never changes what this reads.\nreadWorkspaceFileNode.create(\n  { fileId: \"abc123def456\", binarySlot: \"data\" },\n  \"Read pinned pricing CSV\",\n  \"read-pricing-pinned\",\n);\n```\n\nFor full patterns (parse the bytes, scenario walkthrough, list + filter), use your\nharness's example-discovery tool: `find_examples({ query: \"workspace files\" })`.\n\n## Resolution modes\n\n| Mode                      | Config                    | Behaviour                                                                                          |\n| ------------------------- | ------------------------- | -------------------------------------------------------------------------------------------------- |\n| **latest-wins** (default) | `filename: \"pricing.csv\"` | Reads the **newest** file with that name. Next upload of the same name is what the next run reads. |\n| **pinned fileId**         | `fileId: \"abc123...\"`     | Reads that exact, immutable version forever. A new upload never changes this ref.                  |\n\nUse **latest-wins** for \"always use the current sheet\" patterns.\nUse **pinned fileId** for reproducible/auditable runs (e.g., regression tests, compliance audits).\n\n## Binary slot handoff\n\n`ReadWorkspaceFile` streams the file's bytes into `item.binary[binarySlot]` (default `\"data\"`).\nThe node emits:\n\n```ts\n{\n  fileId: string;\n  filename: string;\n  contentType: string;\n  size: number; // bytes\n  lastModified: string; // ISO 8601\n  binarySlot: string; // e.g. 
\"data\"\n}\n```\n\nDownstream nodes read the bytes via `ctx.binary.openReadStream(item.binary[\"data\"])`.\nThe bytes are **never** base64-encoded on `item.json`.\n\n## Concierge → digest → workflow pattern\n\nThis is the intended headline flow:\n\n```\nUser uploads PDF  →  CP Files UI stores it in the workspace pool\nConcierge sees upload  →  DocumentScanner digests it  →  writes \"report-digested.json\" back\nWorkflow runs (schedule/webhook)  →  ReadWorkspaceFile(\"report-digested.json\")  →  acts\n```\n\nThe workflow is **decoupled** from the upload event. It reads the _derived_ file that the\nconcierge produced, not the raw upload. The concierge's job is to bridge the raw-upload world\nand the structured-data world.\n\nKey boundaries:\n\n- **CP side (write)**: raw file ingest, concierge digest, derived file write, Files UI.\n- **Workflow side (read)**: `ReadWorkspaceFile` + `ListWorkspaceFiles` only.\n\n## Anti-patterns\n\n- Do NOT tell users to read the raw PDF upload in a workflow — point at the concierge-derived JSON.\n- Do NOT base64-encode file bytes onto `item.json` — use `item.binary[slot]` + `ctx.binary`.\n- Do NOT attempt to write a file from a workflow node — there is no write surface in v1.\n- Do NOT assume `WORKSPACE_ID` is always set — in local dev without CP integration, the storage\n  token resolves to `undefined`. Add a guard if your workflow runs in dev mode.\n\n## Node reference\n\n### `listWorkspaceFilesNode`\n\n```ts\nlistWorkspaceFilesNode.create(\n  {\n    filenameFilter?: string; // optional substring match (case-insensitive)\n  },\n  \"List files\",\n  \"list-files\",\n)\n```\n\nOutput per item: `{ fileId, filename, contentType, size, lastModified }`. 
Sorted newest-first.\n\n### `readWorkspaceFileNode`\n\n```ts\nreadWorkspaceFileNode.create(\n  {\n    filename?: string;    // latest-wins resolution\n    fileId?: string;      // pinned resolution (takes precedence over filename)\n    binarySlot?: string;  // default: \"data\"\n    maxBytes?: number;    // default: 100 MiB — raise for large files\n  },\n  \"Read file\",\n  \"read-file\",\n)\n```\n\nEither `filename` or `fileId` must be set. Output: metadata JSON + bytes in `item.binary[binarySlot]`.\n"
     }
   ]
 }
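
Note on the new `codemation-workspace-files` entry: its "Resolution modes" table distinguishes latest-wins reads by `filename` from pinned reads by `fileId`, with `fileId` taking precedence when both are set. The sketch below models that selection over a list of file metadata records. It is an illustration under stated assumptions only: `WorkspaceFileMeta`, `ReadRef`, and `resolveFile` are hypothetical names, and the actual `readWorkspaceFileNode` resolves files against workspace storage rather than an in-memory array.

```typescript
// Hypothetical metadata record, a subset of the fields the skill lists
// for each workspace file.
type WorkspaceFileMeta = { fileId: string; filename: string; lastModified: string };

// Hypothetical reference shape mirroring the `filename` / `fileId` config options.
type ReadRef = { filename?: string; fileId?: string };

// Models the documented resolution rules:
// - a pinned fileId wins over filename and selects that exact file;
// - otherwise the newest file with the given filename is chosen (latest-wins);
// - at least one of the two must be set.
function resolveFile(files: WorkspaceFileMeta[], ref: ReadRef): WorkspaceFileMeta | undefined {
  if (ref.fileId !== undefined) {
    return files.find((file) => file.fileId === ref.fileId);
  }
  if (ref.filename === undefined) {
    throw new Error("Either filename or fileId must be set");
  }
  return files
    .filter((file) => file.filename === ref.filename) // filter copies, so sort does not mutate the input
    .sort((a, b) => Date.parse(b.lastModified) - Date.parse(a.lastModified))[0];
}
```

The design point this illustrates is the trade-off named in the skill: a `filename` reference follows the newest upload on each run, while a `fileId` reference keeps returning the same version, which suits reproducible or auditable runs.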

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@codemation/agent-skills",
-  "version": "0.3.0",
+  "version": "0.4.0",
   "description": "Reusable agent skills for Codemation projects and plugin development.",
   "publishConfig": {
     "access": "public"

package/skills/codemation-document-scanner/SKILL.md ADDED

@@ -0,0 +1,136 @@
+---
+name: codemation-document-scanner
+description: CodemationDocumentScanner node — managed document/invoice/image extraction via the Codemation doc-scanner service. No Azure credentials required. Read before writing any workflow that scans documents, invoices, or images.
+compatibility: Codemation core-nodes. Requires @codemation/core-nodes import.
+tags: ocr, document, invoice, image, scan, extract, managed, confidence
+uses: "@codemation/core-nodes"
+---
+# Codemation Document Scanner Node
+> **Start here: call `find_examples` before reading further.**
+>
+> - `find_examples({ query: "CodemationDocumentScanner" })` — node-level usage and all analyzerType variants
+> - `find_examples({ query: "invoice scan post accounting" })` — end-to-end invoice extraction scenario
+> - `find_examples({ query: "document scanner confidence fields" })` — how to enable per-field confidence scores
+## Use this skill when
+Writing a workflow that extracts text and/or structured fields from documents, invoices, or images
+using the Codemation managed scanning service. No Azure credentials are required — the service is
+pre-wired to the platform.
+Use `codemation-workflow-dsl` for surrounding workflow structure.
+Use `codemation-ai-agent-node` if you need to pass the extracted markdown to an LLM for further processing.
+## When to use CodemationDocumentScanner vs the standalone OCR nodes
+| Situation                                        | Use                                                                                                 |
+| ------------------------------------------------ | --------------------------------------------------------------------------------------------------- |
+| Managed platform deployment, no Azure credential | `codemationDocumentScannerNode` (this skill)                                                        |
+| Self-hosted / BYOK Azure Content Understanding   | `analyzeDocumentNode` / `analyzeInvoiceNode` / `analyzeImageNode` from `@codemation/core-nodes-ocr` |
+`codemationDocumentScannerNode` calls the internal `doc-scanner` service using HMAC-authenticated
+requests, so the workspace holds no Azure key. The standalone OCR nodes call Azure directly using a
+per-workspace credential.
+## Choosing `analyzerType`
+| `analyzerType` | When to use                                     | Azure analyzer                                                          | Field extraction       | Confidence opt-in supported             |
+| -------------- | ----------------------------------------------- | ----------------------------------------------------------------------- | ---------------------- | --------------------------------------- |
+| `"document"`   | General PDFs, Word docs, HTML, text-heavy files | `prebuilt-document`                                                     | Yes                    | Yes                                     |
+| `"invoice"`    | Invoices, receipts — always prebuilt-invoice    | `prebuilt-invoice`                                                      | Yes                    | Yes                                     |
+| `"image"`      | Photos, screenshots, diagrams                   | `prebuilt-imageAnalyzer`                                                | No (markdown only)     | No — image carries no extraction charge |
+| `"auto"`       | Unknown mime type at author time                | Routes on `Content-Type`: `image/*` → image, everything else → document | Depends on routed type | Depends on routed type                  |
+**Default is `"auto"`.** Set an explicit type whenever you know the content class — it avoids
+unnecessary re-routing and makes the workflow self-documenting.
+## Output shape
+```ts
+{
+  markdown: string; // full text content
+  fields: Record<
+    string,
+    {
+      value: unknown; // extracted scalar, date ISO string, nested object, or array
+      confidence: number | null; // 0–1 when includeConfidence:true; null otherwise
+    }
+  >;
+}
+```
+`item.json.markdown` is the Markdown rendering of the document.
+`item.json.fields` is a flat-or-nested map of structured fields found by the analyzer.
+For `analyzerType: "image"`, `fields` is always `{}`.
+Fields may be sparse or absent for generic documents — extraction is best-effort.
+## WARNING: How to enable per-field confidence scores (LD6)
+**By default, `confidence` is `null` on every field.** This keeps token cost low for the majority
+of workflows that only need `value`.
+To get a populated `confidence` (0–1 float) on each field, set `includeConfidence: true`:
+```ts
+codemationDocumentScannerNode.create(
+  {
+    binaryField: "data",
+    analyzerType: "invoice",
+    includeConfidence: true, // ← opt-in: fields carry confidence 0–1
+  },
+  "Scan invoice",
+  "scan-invoice",
+);
+```
+**Cost implication:** enabling confidence routes the request to a confidence-enabled analyzer variant,
+which roughly doubles the contextualization token count for document/invoice analyzers.
+Only enable it when your downstream logic actually reads `field.confidence`.
+Images (`analyzerType: "image"`) and auto-routed-to-image requests carry no field-extraction charge
+regardless of this flag. They silently ignore `includeConfidence`: confidence stays `null`, and the
+request is never rejected with a 400.
+## API usage
+```ts
+import { codemationDocumentScannerNode } from "@codemation/core-nodes";
+codemationDocumentScannerNode.create(
+  {
+    binaryField?: string;         // key on item.binary — default "data"
+    analyzerType?: "document" | "invoice" | "image" | "auto";  // default "auto"
+    contentType?: string;         // MIME type override — falls back to attachment.mimeType
+    includeConfidence?: boolean;  // default false — see cost note above
+    maxBytes?: number;            // size cap before reading; default 50 MiB (LD10)
+  },
+  label?: string,   // node label on the canvas
+  nodeId?: string,  // explicit stable id — set when output is used downstream
+)
+```
+Set an explicit `nodeId` whenever downstream nodes reference this node's output by id, or when
+the node may be renamed later (avoids credential-binding orphaning).
+## Consuming fields downstream
+```ts
+import type { DocScannerOutput, DocScannerField } from "@codemation/core-nodes";
+// item.json after codemationDocumentScannerNode:
+// DocScannerOutput = { markdown: string; fields: Record<string, DocScannerField> }
+// DocScannerField  = { value: unknown; confidence: number | null }
+new Callback<DocScannerOutput>("Use fields", (items, _ctx) =>
+  items.map((item) => {
+    const vendorName = item.json.fields["VendorName"]?.value as string | undefined;
+    const vendorConf = item.json.fields["VendorName"]?.confidence; // null unless includeConfidence:true
+    return { ...item, json: { ...item.json, vendorName, vendorConf } };
+  }),
+);
+```
+## Read next when needed
+- `codemation-workflow-dsl` — workflow builder, trigger types, fluent vs low-level API.
+- `codemation-ai-agent-node` — pass `item.json.markdown` to an LLM for summarisation or extraction.
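
The document-scanner skill above says `confidence` is `null` unless `includeConfidence: true` is set, and that it should only be enabled when downstream logic reads it. Below is a minimal sketch of such downstream logic. The two types are local mirrors of the documented output shape, not imports from the package. `partitionByConfidence`, the threshold, and the sample data are hypothetical.

```typescript
// Local mirrors of the shapes documented in the skill; not imported from the package.
type DocScannerField = { value: unknown; confidence: number | null };
type DocScannerOutput = { markdown: string; fields: Record<string, DocScannerField> };

// Hypothetical helper: accept fields at or above the threshold, flag the rest.
// A null confidence (flag not set, or image input) is treated as "needs review".
function partitionByConfidence(
  output: DocScannerOutput,
  threshold: number,
): { accepted: Record<string, unknown>; needsReview: string[] } {
  const accepted: Record<string, unknown> = {};
  const needsReview: string[] = [];
  for (const [name, field] of Object.entries(output.fields)) {
    if (field.confidence !== null && field.confidence >= threshold) {
      accepted[name] = field.value;
    } else {
      needsReview.push(name);
    }
  }
  return { accepted, needsReview };
}

// Illustrative sample only; real field names depend on the analyzer.
const sample: DocScannerOutput = {
  markdown: "# Invoice",
  fields: {
    VendorName: { value: "Acme BV", confidence: 0.97 },
    InvoiceTotal: { value: 120.5, confidence: 0.42 },
    DueDate: { value: "2025-01-31", confidence: null },
  },
};

const result = partitionByConfidence(sample, 0.8);
console.log(result.accepted);
console.log(result.needsReview);
```

Treating `null` as "needs review" is one possible policy. A workflow that never sets `includeConfidence` would flag every field under it, so the check only makes sense when the flag is on.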

package/skills/codemation-workspace-files/SKILL.md ADDED

@@ -0,0 +1,142 @@
+---
+name: codemation-workspace-files
+description: ListWorkspaceFiles + ReadWorkspaceFile nodes — read files from the shared workspace pool. Covers read-by-filename (latest-wins), pinned fileId, binary slot handoff, and the raw-upload → concierge-digests → workflow-reads-derived-file pattern. Read before building any workflow that reads workspace files.
+compatibility: Codemation core-nodes-workspace-files. Requires WORKSPACE_ID and BLOB_STORAGE_* env vars.
+tags: workspace, files, binary, storage, read, csv, json
+uses: "@codemation/core-nodes-workspace-files"
+---
+# Codemation Workspace Files
+## Mental model
+Workflows **read** the shared workspace file pool; they do **not** write to it. Files are
+created and managed on the control-plane side (the Files UI, the concierge, the
+DocumentScanner). The framework's role is to provide `ListWorkspaceFiles` and
+`ReadWorkspaceFile` as pure read nodes.
+The **headline scenario** is: a user uploads a raw PDF; the concierge digests it into a
+structured JSON; the workflow reads the _derived JSON_, not the raw bytes. Workflows
+never touch raw uploads directly.
+## When to use / when NOT
+Use `ReadWorkspaceFile` when a workflow needs data that lives in the workspace pool
+(pricing sheets, config JSON, concierge-derived documents, CSV exports).
+Use `ListWorkspaceFiles` to discover what files exist or to drive a fan-out (one item per file).
+Do NOT use these nodes to write files: writing is mediated by the control plane (CP) and deferred to v2.
+Do NOT base64-encode bytes onto `item.json`. Binary payloads always flow through
+`item.binary` via `ctx.binary`.
+## Quickstart
+```ts
+import { readWorkspaceFileNode } from "@codemation/core-nodes-workspace-files";
+// Read the latest "pricing.csv" by name — picks up the newest upload automatically.
+readWorkspaceFileNode.create({ filename: "pricing.csv", binarySlot: "data" }, "Read pricing CSV", "read-pricing-csv");
+```
+```ts
+// Pin to an exact version — a later upload never changes what this reads.
+readWorkspaceFileNode.create(
+  { fileId: "abc123def456", binarySlot: "data" },
+  "Read pinned pricing CSV",
+  "read-pricing-pinned",
+);
+```
+For full patterns (parse the bytes, scenario walkthrough, list + filter), use your
+harness's example-discovery tool: `find_examples({ query: "workspace files" })`.
+## Resolution modes
+| Mode                      | Config                    | Behaviour                                                                                          |
+| ------------------------- | ------------------------- | -------------------------------------------------------------------------------------------------- |
+| **latest-wins** (default) | `filename: "pricing.csv"` | Reads the **newest** file with that name. Next upload of the same name is what the next run reads. |
+| **pinned fileId**         | `fileId: "abc123..."`     | Reads that exact, immutable version forever. A new upload never changes this ref.                  |
+Use **latest-wins** for "always use the current sheet" patterns.
+Use **pinned fileId** for reproducible/auditable runs (e.g., regression tests, compliance audits).
+## Binary slot handoff
+`ReadWorkspaceFile` streams the file's bytes into `item.binary[binarySlot]` (default `"data"`).
+The node emits:
+```ts
+{
+  fileId: string;
+  filename: string;
+  contentType: string;
+  size: number; // bytes
+  lastModified: string; // ISO 8601
+  binarySlot: string; // e.g. "data"
+}
+```
+Downstream nodes read the bytes via `ctx.binary.openReadStream(item.binary["data"])`.
+The bytes are **never** base64-encoded on `item.json`.
+## Concierge → digest → workflow pattern
+This is the intended headline flow:
+```
+User uploads PDF  →  CP Files UI stores it in the workspace pool
+Concierge sees upload  →  DocumentScanner digests it  →  writes "report-digested.json" back
+Workflow runs (schedule/webhook)  →  ReadWorkspaceFile("report-digested.json")  →  acts
+```
+The workflow is **decoupled** from the upload event. It reads the _derived_ file that the
+concierge produced, not the raw upload. The concierge's job is to bridge the raw-upload world
+and the structured-data world.
+Key boundaries:
+- **CP side (write)**: raw file ingest, concierge digest, derived file write, Files UI.
+- **Workflow side (read)**: `ReadWorkspaceFile` + `ListWorkspaceFiles` only.
+## Anti-patterns
+- Do NOT tell users to read the raw PDF upload in a workflow — point at the concierge-derived JSON.
+- Do NOT base64-encode file bytes onto `item.json` — use `item.binary[slot]` + `ctx.binary`.
+- Do NOT attempt to write a file from a workflow node — there is no write surface in v1.
+- Do NOT assume `WORKSPACE_ID` is always set — in local dev without CP integration, the storage
+  token resolves to `undefined`. Add a guard if your workflow runs in dev mode.
+## Node reference
+### `listWorkspaceFilesNode`
+```ts
+listWorkspaceFilesNode.create(
+  {
+    filenameFilter?: string; // optional substring match (case-insensitive)
+  },
+  "List files",
+  "list-files",
+)
+```
+Output per item: `{ fileId, filename, contentType, size, lastModified }`. Sorted newest-first.
+### `readWorkspaceFileNode`
+```ts
+readWorkspaceFileNode.create(
+  {
+    filename?: string;    // latest-wins resolution
+    fileId?: string;      // pinned resolution (takes precedence over filename)
+    binarySlot?: string;  // default: "data"
+    maxBytes?: number;    // default: 100 MiB — raise for large files
+  },
+  "Read file",
+  "read-file",
+)
+```
+At least one of `filename` or `fileId` must be set; if both are given, `fileId` takes precedence. Output: metadata JSON + bytes in `item.binary[binarySlot]`.
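
The two resolution modes documented above (latest-wins by `filename`, pinned by `fileId`, with `fileId` taking precedence) can be sketched as a pure function over file metadata. This is an illustration under stated assumptions, not the node's implementation: `resolveFile` is hypothetical, exact case-sensitive filename matching is assumed, and the sample pool is made up.

```typescript
// Local mirror of the per-item metadata listed in the node reference.
type WorkspaceFileMeta = {
  fileId: string;
  filename: string;
  contentType: string;
  size: number; // bytes
  lastModified: string; // ISO 8601
};

type ReadConfig = { filename?: string; fileId?: string };

// Hypothetical resolver illustrating the documented rules; the real node may differ.
function resolveFile(files: WorkspaceFileMeta[], config: ReadConfig): WorkspaceFileMeta | undefined {
  if (config.fileId !== undefined) {
    // Pinned: exact id, takes precedence over filename.
    return files.find((f) => f.fileId === config.fileId);
  }
  if (config.filename === undefined) {
    throw new Error("Either filename or fileId must be set");
  }
  // Latest-wins: newest lastModified among files with that name.
  // filter() returns a copy, so sorting does not mutate the caller's array.
  return files
    .filter((f) => f.filename === config.filename)
    .sort((a, b) => Date.parse(b.lastModified) - Date.parse(a.lastModified))[0];
}

// Illustrative pool: two uploads of the same name plus one derived file.
const pool: WorkspaceFileMeta[] = [
  { fileId: "v1", filename: "pricing.csv", contentType: "text/csv", size: 120, lastModified: "2025-01-01T00:00:00Z" },
  { fileId: "v2", filename: "pricing.csv", contentType: "text/csv", size: 130, lastModified: "2025-02-01T00:00:00Z" },
  { fileId: "r1", filename: "report-digested.json", contentType: "application/json", size: 900, lastModified: "2025-01-15T00:00:00Z" },
];

const latest = resolveFile(pool, { filename: "pricing.csv" });
const pinned = resolveFile(pool, { fileId: "v1", filename: "pricing.csv" });
console.log(latest?.fileId, pinned?.fileId);
```

Here `latest` follows the newest upload while `pinned` keeps pointing at the first version, which is the trade-off between "always use the current sheet" and reproducible runs.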