@elevasis/sdk 1.24.0 → 1.26.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (66)
  1. package/dist/cli.cjs +875 -834
  2. package/dist/index.d.ts +4857 -4547
  3. package/dist/index.js +564 -2338
  4. package/dist/node/index.d.ts +693 -1356
  5. package/dist/node/index.js +1 -1
  6. package/dist/test-utils/index.d.ts +4186 -4139
  7. package/dist/test-utils/index.js +694 -2769
  8. package/dist/types/worker/adapters/clickup.d.ts +22 -0
  9. package/dist/types/worker/adapters/index.d.ts +1 -0
  10. package/dist/types/worker/index.d.ts +3 -2
  11. package/dist/types/worker/platform.d.ts +2 -2
  12. package/dist/worker/index.js +427 -2803
  13. package/package.json +2 -2
  14. package/reference/_navigation.md +11 -1
  15. package/reference/_reference-manifest.json +70 -0
  16. package/reference/claude-config/rules/organization-model.md +12 -1
  17. package/reference/claude-config/rules/organization-os.md +12 -1
  18. package/reference/claude-config/skills/om/SKILL.md +13 -5
  19. package/reference/claude-config/skills/om/operations/codify-level-a.md +109 -100
  20. package/reference/claude-config/skills/om/operations/customers.md +10 -6
  21. package/reference/claude-config/skills/om/operations/features.md +7 -3
  22. package/reference/claude-config/skills/om/operations/goals.md +10 -6
  23. package/reference/claude-config/skills/om/operations/identity.md +8 -5
  24. package/reference/claude-config/skills/om/operations/labels.md +17 -1
  25. package/reference/claude-config/skills/om/operations/offerings.md +11 -7
  26. package/reference/claude-config/skills/om/operations/roles.md +11 -7
  27. package/reference/claude-config/skills/om/operations/techStack.md +10 -2
  28. package/reference/claude-config/sync-notes/2026-05-20-om-define-helpers.md +32 -0
  29. package/reference/claude-config/sync-notes/2026-05-22-access-model-and-right-panel.md +43 -0
  30. package/reference/claude-config/sync-notes/2026-05-22-lead-gen-tenant-config.md +40 -0
  31. package/reference/claude-config/sync-notes/2026-05-22-org-model-multi-file-split.md +61 -0
  32. package/reference/cli-management.mdx +539 -0
  33. package/reference/cli.mdx +579 -808
  34. package/reference/concepts.mdx +134 -146
  35. package/reference/deployment/api.mdx +296 -297
  36. package/reference/deployment/command-center.mdx +208 -209
  37. package/reference/deployment/index.mdx +194 -195
  38. package/reference/deployment/provided-features.mdx +110 -107
  39. package/reference/deployment/ui-execution.mdx +249 -250
  40. package/reference/framework/index.mdx +111 -195
  41. package/reference/framework/resource-documentation.mdx +90 -0
  42. package/reference/framework/tutorial-system.mdx +135 -135
  43. package/reference/getting-started.mdx +141 -142
  44. package/reference/index.mdx +95 -106
  45. package/reference/packages/ui/src/auth/README.md +6 -6
  46. package/reference/platform-tools/adapters-integration.mdx +300 -301
  47. package/reference/platform-tools/adapters-platform.mdx +552 -553
  48. package/reference/platform-tools/index.mdx +216 -217
  49. package/reference/platform-tools/type-safety.mdx +82 -82
  50. package/reference/resources/index.mdx +348 -349
  51. package/reference/resources/patterns.mdx +446 -449
  52. package/reference/resources/types.mdx +115 -116
  53. package/reference/roadmap.mdx +164 -165
  54. package/reference/rules/organization-model.md +14 -0
  55. package/reference/runtime.mdx +172 -173
  56. package/reference/scaffold/operations/propagation-pipeline.md +1 -1
  57. package/reference/scaffold/recipes/customize-crm-actions.md +45 -46
  58. package/reference/scaffold/recipes/extend-crm.md +253 -255
  59. package/reference/scaffold/recipes/extend-lead-gen.md +130 -77
  60. package/reference/scaffold/recipes/index.md +43 -44
  61. package/reference/scaffold/reference/contracts.md +1275 -1432
  62. package/reference/scaffold/reference/glossary.md +8 -6
  63. package/reference/scaffold/ui/feature-flags-and-gating.md +59 -46
  64. package/reference/scaffold/ui/feature-shell.mdx +11 -11
  65. package/reference/scaffold/ui/recipes.md +24 -24
  66. package/reference/troubleshooting.mdx +222 -223
@@ -1,116 +1,115 @@
- ---
- title: SDK Types
- description: Type reference for @elevasis/sdk package exports, resource metadata, platform types, ElevasConfig, StepHandler context, and runtime values.
- loadWhen: "Looking up type signatures or SDK exports"
- ---
-
- `@elevasis/sdk` bundles platform types into a self-contained package for external projects.
-
- ```bash
- pnpm add @elevasis/sdk zod
- ```
-
- Zod is a peer dependency.
-
- ## Package Exports
-
- ```json
- {
-   "exports": {
-     ".": { "types": "./dist/index.d.ts", "import": "./dist/index.js" },
-     "./worker": { "import": "./dist/worker/index.js" }
-   }
- }
- ```
-
- - `@elevasis/sdk` -- resource, workflow, agent, trigger, deployment, and execution types plus runtime errors.
- - `@elevasis/sdk/worker` -- worker runtime module, platform adapters, and worker helpers.
-
- ## Platform Types
-
- | Type | Description |
- | --- | --- |
- | `ResourceDefinition` | Base interface for resource definitions |
- | `ResourceType` | Resource kind such as `workflow`, `agent`, `trigger`, `integration`, `external`, or `human_checkpoint` |
- | `ResourceStatus` | Resource lifecycle status such as `dev` or `prod` |
- | `ResourceLink` | Graph link `{ nodeId, kind }` binding a resource to an Organization Model node |
- | `ResourceCategory` | Operational category: `production`, `diagnostic`, `internal`, or `testing` |
- | `DeploymentSpec` | Top-level export shape: `{ workflows, agents, triggers, integrations, humanCheckpoints, externalResources, relationships }` |
- | `TriggerDefinition` | Base for trigger configuration |
- | `IntegrationDefinition` | Integration configuration structure |
- | `WebhookProviderType` | Supported webhook providers |
- | `WebhookTriggerConfig` | Webhook trigger configuration |
- | `ScheduleTriggerConfig` | Cron/schedule trigger configuration |
- | `EventTriggerConfig` | Internal event trigger configuration |
- | `TriggerConfig` | Union of all trigger config types |
-
- Resource metadata uses graph links:
-
- ```ts
- config: {
-   resourceId: 'lead-import',
-   name: 'Lead Import',
-   type: 'workflow',
-   version: '1.0.0',
-   status: 'prod',
-   links: [{ nodeId: 'system:sales.lead-gen', kind: 'operates-on' }],
-   category: 'production'
- }
- ```
-
- ## Execution Types
-
- | Type | Description |
- | --- | --- |
- | `WorkflowDefinition` | Complete workflow definition including config, contract, steps, and entryPoint |
- | `WorkflowStep` | Individual step definition with type, handler, and next routing |
- | `WorkflowConfig` | Metadata block: name, description, status, links, category |
- | `StepHandler` | Function type: `(input: unknown, context: StepContext) => Promise<unknown>` |
- | `NextConfig` | Union of `LinearNext` and `ConditionalNext` |
- | `LinearNext` | Fixed next step routing |
- | `ConditionalNext` | Branching step routing |
- | `StepType` | Runtime enum for step routing |
- | `AgentDefinition` | Complete agent definition including config, agentConfig, and tools |
- | `ExecutionContext` | Runtime context passed to step handlers |
- | `ExecutionMetadata` | Metadata about a running execution |
- | `ExecutionInterface` | Interface for triggering and inspecting executions |
-
- ## ElevasConfig
-
- ```ts
- export interface ElevasConfig {
-   defaultStatus?: ResourceStatus
-   dev?: { port?: number }
- }
- ```
-
- | Field | Type | Default | Description |
- | --- | --- | --- | --- |
- | `defaultStatus` | `'dev' \| 'prod'` | `'prod'` | Default status applied when resources do not set `config.status` |
- | `dev.port` | `number` | `3001` | Local worker development port |
-
- ## StepHandler Context
-
- ```ts
- import type { StepHandler, ExecutionContext } from '@elevasis/sdk'
-
- const handler: StepHandler = async (input, context: ExecutionContext) => {
-   context.logger.info('Processing', {
-     executionId: context.executionId,
-     resourceId: context.resourceId
-   })
-
-   await context.store.set('checkpoint', JSON.stringify({ step: 'started' }))
-   return { done: true }
- }
- ```
-
- ## Runtime Values
-
- Runtime exports include:
-
- - `StepType`
- - `ToolingError`
- - `ExecutionError`
-
- Use `ToolingError` for platform adapter failures and `ExecutionError` for execution-level failures that should be surfaced in run history.
+ ---
+ title: SDK Types
+ description: Type reference for @elevasis/sdk package exports, resource metadata, platform types, ElevasConfig, StepHandler context, and runtime values
+ ---
+
+ `@elevasis/sdk` bundles platform types into a self-contained package for external projects.
+
+ ```bash
+ pnpm add @elevasis/sdk zod
+ ```
+
+ Zod is a peer dependency.
+
+ ## Package Exports
+
+ ```json
+ {
+   "exports": {
+     ".": { "types": "./dist/index.d.ts", "import": "./dist/index.js" },
+     "./worker": { "import": "./dist/worker/index.js" }
+   }
+ }
+ ```
+
+ - `@elevasis/sdk` -- resource, workflow, agent, trigger, deployment, and execution types plus runtime errors.
+ - `@elevasis/sdk/worker` -- worker runtime module, platform adapters, and worker helpers.
+
+ ## Platform Types
+
+ | Type | Description |
+ | ----------------------- | --------------------------------------------------------------------------------------------------------------------------- |
+ | `ResourceDefinition` | Base interface for resource definitions |
+ | `ResourceType` | Resource kind such as `workflow`, `agent`, `trigger`, `integration`, `external`, or `human_checkpoint` |
+ | `ResourceStatus` | Resource lifecycle status such as `dev` or `prod` |
+ | `ResourceLink` | Graph link `{ nodeId, kind }` binding a resource to an Organization Model node |
+ | `ResourceCategory` | Operational category: `production`, `diagnostic`, `internal`, or `testing` |
+ | `DeploymentSpec` | Top-level export shape: `{ workflows, agents, triggers, integrations, humanCheckpoints, externalResources, relationships }` |
+ | `TriggerDefinition` | Base for trigger configuration |
+ | `IntegrationDefinition` | Integration configuration structure |
+ | `WebhookProviderType` | Supported webhook providers |
+ | `WebhookTriggerConfig` | Webhook trigger configuration |
+ | `ScheduleTriggerConfig` | Cron/schedule trigger configuration |
+ | `EventTriggerConfig` | Internal event trigger configuration |
+ | `TriggerConfig` | Union of all trigger config types |
+
+ Resource metadata uses graph links:
+
+ ```ts
+ config: {
+   resourceId: 'lead-import',
+   name: 'Lead Import',
+   type: 'workflow',
+   version: '1.0.0',
+   status: 'prod',
+   links: [{ nodeId: 'system:sales.lead-gen', kind: 'operates-on' }],
+   category: 'production'
+ }
+ ```
+
+ ## Execution Types
+
+ | Type | Description |
+ | -------------------- | ------------------------------------------------------------------------------ |
+ | `WorkflowDefinition` | Complete workflow definition including config, contract, steps, and entryPoint |
+ | `WorkflowStep` | Individual step definition with type, handler, and next routing |
+ | `WorkflowConfig` | Metadata block: name, description, status, links, category |
+ | `StepHandler` | Function type: `(input: unknown, context: StepContext) => Promise<unknown>` |
+ | `NextConfig` | Union of `LinearNext` and `ConditionalNext` |
+ | `LinearNext` | Fixed next step routing |
+ | `ConditionalNext` | Branching step routing |
+ | `StepType` | Runtime enum for step routing |
+ | `AgentDefinition` | Complete agent definition including config, agentConfig, and tools |
+ | `ExecutionContext` | Runtime context passed to step handlers |
+ | `ExecutionMetadata` | Metadata about a running execution |
+ | `ExecutionInterface` | Interface for triggering and inspecting executions |
+
+ ## ElevasConfig
+
+ ```ts
+ export interface ElevasConfig {
+   defaultStatus?: ResourceStatus
+   dev?: { port?: number }
+ }
+ ```
+
+ | Field | Type | Default | Description |
+ | --------------- | ----------------- | -------- | ---------------------------------------------------------------- |
+ | `defaultStatus` | `'dev' \| 'prod'` | `'prod'` | Default status applied when resources do not set `config.status` |
+ | `dev.port` | `number` | `3001` | Local worker development port |
+
+ ## StepHandler Context
+
+ ```ts
+ import type { StepHandler, ExecutionContext } from '@elevasis/sdk'
+
+ const handler: StepHandler = async (input, context: ExecutionContext) => {
+   context.logger.info('Processing', {
+     executionId: context.executionId,
+     resourceId: context.resourceId
+   })
+
+   await context.store.set('checkpoint', JSON.stringify({ step: 'started' }))
+   return { done: true }
+ }
+ ```
+
+ ## Runtime Values
+
+ Runtime exports include:
+
+ - `StepType`
+ - `ToolingError`
+ - `ExecutionError`
+
+ Use `ToolingError` for platform adapter failures and `ExecutionError` for execution-level failures that should be surfaced in run history.
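As a reading aid for the `ElevasConfig` table in this hunk, the fallback order it describes could be sketched as follows. This is a minimal sketch, not SDK code: `resolveStatus` and `resolveDevPort` are hypothetical helper names, and the types are local stand-ins for the ones exported by `@elevasis/sdk`. Only the defaults (`'prod'` and `3001`) come from the table.

```typescript
// Local stand-ins so the sketch runs without the package installed.
type ResourceStatus = 'dev' | 'prod'

interface ElevasConfig {
  defaultStatus?: ResourceStatus
  dev?: { port?: number }
}

// Hypothetical helper (not an SDK export): a resource's own status wins,
// then the project-level defaultStatus, then the documented default 'prod'.
function resolveStatus(config: ElevasConfig, resourceStatus?: ResourceStatus): ResourceStatus {
  return resourceStatus ?? config.defaultStatus ?? 'prod'
}

// Hypothetical helper: dev.port falls back to the documented default 3001.
function resolveDevPort(config: ElevasConfig): number {
  return config.dev?.port ?? 3001
}
```

With an empty config, a resource that sets no `config.status` resolves to `'prod'` and the dev port resolves to `3001`.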
@@ -1,165 +1,164 @@
- ---
- title: Roadmap
- description: Planned SDK features -- error taxonomy, retry semantics, circuit breaker, metrics, alerting, and resource lifecycle extensions
- loadWhen: "Asking about future features or planned capabilities"
- ---
-
- **Status:** Mixed -- some features below are implemented, others remain planned. Each section notes its current status.
-
- For currently implemented behavior, see [Runtime](runtime.mdx).
-
- ---
-
- ## Structured Error Taxonomy
-
- **Status: Partially implemented.** The runtime has a structured error hierarchy (`ExecutionError`, `PlatformToolError`, `ToolingError`) with error codes and context fields. The taxonomy below describes a planned _redesign_ that is not yet implemented.
-
- The current runtime reports errors as plain strings. A future SDK version will introduce a structured error taxonomy. All SDK errors will extend `ResourceError`, the base class for errors surfaced through the execution protocol.
-
- Every error carries: `message` (string), `code` (string enum), `details` (optional structured data), and `retryable` (boolean).
-
- **Error types:**
-
- - **`ResourceError`** -- Base class for all SDK errors
- - **`ValidationError`** -- Input or output schema validation failed. Thrown automatically when Zod `.parse()` fails. Code: `VALIDATION_ERROR`. Not retryable.
- - **`StepError`** -- A workflow step handler threw. Includes `stepId` and `stepName`. Code: `STEP_ERROR`. Retryable (transient failures may succeed on retry).
- - **`ToolError`** -- A tool execution failed. Includes `toolName`. Code: `TOOL_ERROR`. Retryability depends on the underlying error.
- - **`TimeoutError`** -- Execution exceeded the deadline. Code: `TIMEOUT`. Retryable.
- - **`CancellationError`** -- Execution was cancelled by the platform or user. Code: `CANCELLED`. Not retryable.
-
- ---
-
- ## Retry Semantics
-
- **Status: Planned.**
-
- Retries are platform-side only -- workers are ephemeral and never retry internally.
-
- - **Configuration:** Per-resource via `maxRetries` (default: 0) and `backoffStrategy` (exponential with jitter)
- - **Retryable conditions:** Worker crash or timeout (worker terminated by `AbortSignal`)
- - **Non-retryable conditions:** Worker reports `status: 'failed'` (handler ran and returned an error -- application logic, not infrastructure failure), user cancellation
- - **Idempotency:** On retry, the same `executionId` is reused. Design handlers to be idempotent where possible.
-
- ---
-
- ## Workflow Step Failure
-
- **Status: Planned.** The current runtime uses fail-fast behavior: when a step handler throws, the worker logs a `step-failed` context entry and re-throws the original error unchanged. The `onError` callback, `completedSteps`, and `partialOutput` features described below are not yet implemented.
-
- Default behavior is fail-fast: when a step throws, the workflow fails immediately.
-
- - **Error handler:** Optional `onError` callback per step. The callback receives the error and can return a recovery value or rethrow to propagate the failure.
- - **Partial output:** Steps completed before the failure are included in the error response. The platform receives: `failedStepId`, `completedSteps[]` (IDs of successfully completed steps), and `partialOutput` (the last successful step's output).
-
- **Proposed step error response format** (not yet implemented -- subject to change):
-
- ```json
- {
-   "status": "failed",
-   "error": {
-     "code": "STEP_ERROR",
-     "message": "Email delivery failed: invalid address",
-     "stepId": "send-welcome",
-     "stepName": "Send Welcome Email"
-   },
-   "completedSteps": ["validate"],
-   "partialOutput": { "clientName": "Jane", "isValid": true }
- }
- ```
-
- ---
-
- ## Agent Failure Modes
-
- **Status: Planned.** Agent execution runs in ephemeral worker threads with full tool calling support via `PostMessageLLMAdapter`. The current runtime uses fail-fast behavior for all agent error paths; the richer failure handling described below is not yet implemented.
-
- **Current behavior:** Any unhandled error from the agent (including `AgentMaxIterationsError` thrown by `@repo/core` when the iteration limit is reached) propagates out of the worker and is reported as a failed execution. The worker sends: `{ type: 'result', status: 'failed', error: 'ErrorName: message', logs, metrics: { durationMs } }`. There is no graceful termination, partial output, or retry logic in the worker itself.
-
- **Planned improvements:**
-
- - **Max iterations reached:** Instead of throwing, the agent returns the best output produced so far, plus a warning flag (`maxIterationsReached: true`). This becomes a graceful termination rather than a failure.
- - **Tool crash:** Tool errors are caught by the SDK runtime, formatted as a tool result, and sent back to the LLM. The LLM decides whether to retry, try a different approach, or give up.
- - **Model refusal:** If the model refuses the prompt, the SDK retries once with an adjusted system prompt. If the retry also refuses, the agent fails with `code: 'MODEL_REFUSAL'`.
- - **Model API error:** Network errors, rate limits, or server errors from the model provider. The SDK retries with exponential backoff (3 attempts, 1s/2s/4s), then fails with `code: 'MODEL_ERROR'`.
- - **Agent error response includes:** `iterationCount` (LLM iterations completed), `toolCallHistory` (array of tool calls made), `lastModelResponse` (final response from the LLM before failure).
-
- ---
-
- ## Circuit Breaker
-
- **Status: Planned.**
-
- The platform will implement a circuit breaker to prevent runaway failures:
-
- - **Trip condition:** 5 consecutive failures on the same resource within a 10-minute window
- - **Action:** Pause executions for 60 seconds. New execution requests for that resource return `503` with: "Resource temporarily unavailable (circuit breaker tripped)"
- - **Auto-recovery:** After the 60-second pause, the next execution attempt is allowed through. If it succeeds, the circuit breaker resets. If it fails, the pause extends (120s, then 240s, capped at 5 minutes).
- - **Alerting:** You are notified via webhook callback or email when the circuit breaker trips. Configurable per organization.
-
- ---
-
- ## Metrics
-
- **Status: Planned.**
-
- ### Auto-Collected
-
- The SDK runtime and platform will automatically collect these metrics for every execution:
-
- - `execution_duration_ms` -- Total wall-clock time from request received to result sent
- - `step_duration_ms` -- Per-step timing for workflows (array of `{ stepId, durationMs }`)
- - `iteration_count` -- Number of LLM loop iterations for agents
- - `ai_token_usage` -- Token counts per model call: `{ prompt_tokens, completion_tokens, total_tokens }`
- - `ai_cost_usd` -- Calculated from model pricing multiplied by token usage
- - `tool_call_count` -- Total number of tool invocations during the execution
- - `tool_call_duration_ms` -- Per-tool timing (array of `{ toolName, durationMs }`)
- - `error_count` -- Number of errors encountered (including recovered errors)
-
- ### Cost Attribution
-
- Metrics are aggregated at multiple levels:
-
- - **Per-execution:** Total AI spend and total duration
- - **Per-resource:** Aggregated over configurable time periods (daily, weekly, monthly)
- - **Per-organization:** Total platform cost (execution time + AI spend + managed hosting compute)
- - **Visibility:** Platform dashboard and the CLI via `elevasis-sdk executions <resourceId>`
-
- ### Developer-Defined Metrics
-
- A future SDK version will support custom metrics emitted from your handlers:
-
- - `sdk.metrics.counter('custom_name', value)` -- Increment a counter
- - `sdk.metrics.gauge('queue_depth', value)` -- Set a point-in-time gauge value
- - Custom metrics are stored alongside auto-collected metrics and queryable through the same APIs
-
- ---
-
- ## Alerting
-
- **Status: Planned.**
-
- Developer-configurable alerts for production monitoring:
-
- - **Error rate threshold:** Notify when more than X% of executions fail within Y minutes
- - **Latency percentile:** Notify when p95 execution duration exceeds a threshold
- - **Cost budget:** Notify when daily or weekly AI spend exceeds a configured limit
- - **Channel:** Webhook callback to a developer-provided URL (integrates with Slack, PagerDuty, and similar services via webhook)
-
- ---
-
- ## Resource Lifecycle Extensions
-
- **Status: Planned.**
-
- ### Deprecation Status
-
- Beyond the current `dev` and `prod` statuses, two additional statuses are planned:
-
- - **`deprecated`** -- Marked via the platform UI. Existing executions continue working. New executions show a warning: "This resource is deprecated." The resource still appears in the platform and can still be triggered.
- - **`offline`** -- Set automatically when a deployment is unregistered (for example, after a failed deploy or explicit deletion). Clears automatically on the next successful deploy. No executions are accepted while a resource is offline.
-
- Deprecation requires no automatic removal -- the developer must explicitly delete the resource to remove it.
-
- ---
-
- **Last Updated:** 2026-03-08
+ ---
+ title: Roadmap
+ description: Planned SDK features -- error taxonomy, retry semantics, circuit breaker, metrics, alerting, and resource lifecycle extensions
+ ---
+
+ **Status:** Mixed -- some features below are implemented, others remain planned. Each section notes its current status.
+
+ For currently implemented behavior, see [Runtime](runtime.mdx).
+
+ ---
+
+ ## Structured Error Taxonomy
+
+ **Status: Partially implemented.** The runtime has a structured error hierarchy (`ExecutionError`, `PlatformToolError`, `ToolingError`) with error codes and context fields. The taxonomy below describes a planned _redesign_ that is not yet implemented.
+
+ The current runtime reports errors as plain strings. A future SDK version will introduce a structured error taxonomy. All SDK errors will extend `ResourceError`, the base class for errors surfaced through the execution protocol.
+
+ Every error carries: `message` (string), `code` (string enum), `details` (optional structured data), and `retryable` (boolean).
+
+ **Error types:**
+
+ - **`ResourceError`** -- Base class for all SDK errors
+ - **`ValidationError`** -- Input or output schema validation failed. Thrown automatically when Zod `.parse()` fails. Code: `VALIDATION_ERROR`. Not retryable.
+ - **`StepError`** -- A workflow step handler threw. Includes `stepId` and `stepName`. Code: `STEP_ERROR`. Retryable (transient failures may succeed on retry).
+ - **`ToolError`** -- A tool execution failed. Includes `toolName`. Code: `TOOL_ERROR`. Retryability depends on the underlying error.
+ - **`TimeoutError`** -- Execution exceeded the deadline. Code: `TIMEOUT`. Retryable.
+ - **`CancellationError`** -- Execution was cancelled by the platform or user. Code: `CANCELLED`. Not retryable.
+
+ ---
+
+ ## Retry Semantics
+
+ **Status: Planned.**
+
+ Retries are platform-side only -- workers are ephemeral and never retry internally.
+
+ - **Configuration:** Per-resource via `maxRetries` (default: 0) and `backoffStrategy` (exponential with jitter)
+ - **Retryable conditions:** Worker crash or timeout (worker terminated by `AbortSignal`)
+ - **Non-retryable conditions:** Worker reports `status: 'failed'` (handler ran and returned an error -- application logic, not infrastructure failure), user cancellation
+ - **Idempotency:** On retry, the same `executionId` is reused. Design handlers to be idempotent where possible.
+
+ ---
+
+ ## Workflow Step Failure
+
+ **Status: Planned.** The current runtime uses fail-fast behavior: when a step handler throws, the worker logs a `step-failed` context entry and re-throws the original error unchanged. The `onError` callback, `completedSteps`, and `partialOutput` features described below are not yet implemented.
+
+ Default behavior is fail-fast: when a step throws, the workflow fails immediately.
+
+ - **Error handler:** Optional `onError` callback per step. The callback receives the error and can return a recovery value or rethrow to propagate the failure.
+ - **Partial output:** Steps completed before the failure are included in the error response. The platform receives: `failedStepId`, `completedSteps[]` (IDs of successfully completed steps), and `partialOutput` (the last successful step's output).
+
+ **Proposed step error response format** (not yet implemented -- subject to change):
+
+ ```json
+ {
+   "status": "failed",
+   "error": {
+     "code": "STEP_ERROR",
+     "message": "Email delivery failed: invalid address",
+     "stepId": "send-welcome",
+     "stepName": "Send Welcome Email"
+   },
+   "completedSteps": ["validate"],
+   "partialOutput": { "clientName": "Jane", "isValid": true }
+ }
+ ```
+
+ ---
+
+ ## Agent Failure Modes
+
+ **Status: Planned.** Agent execution runs in ephemeral worker threads with full tool calling support via `PostMessageLLMAdapter`. The current runtime uses fail-fast behavior for all agent error paths; the richer failure handling described below is not yet implemented.
+
+ **Current behavior:** Any unhandled error from the agent (including `AgentMaxIterationsError` thrown by `@repo/core` when the iteration limit is reached) propagates out of the worker and is reported as a failed execution. The worker sends: `{ type: 'result', status: 'failed', error: 'ErrorName: message', logs, metrics: { durationMs } }`. There is no graceful termination, partial output, or retry logic in the worker itself.
+
+ **Planned improvements:**
+
+ - **Max iterations reached:** Instead of throwing, the agent returns the best output produced so far, plus a warning flag (`maxIterationsReached: true`). This becomes a graceful termination rather than a failure.
+ - **Tool crash:** Tool errors are caught by the SDK runtime, formatted as a tool result, and sent back to the LLM. The LLM decides whether to retry, try a different approach, or give up.
+ - **Model refusal:** If the model refuses the prompt, the SDK retries once with an adjusted system prompt. If the retry also refuses, the agent fails with `code: 'MODEL_REFUSAL'`.
+ - **Model API error:** Network errors, rate limits, or server errors from the model provider. The SDK retries with exponential backoff (3 attempts, 1s/2s/4s), then fails with `code: 'MODEL_ERROR'`.
+ - **Agent error response includes:** `iterationCount` (LLM iterations completed), `toolCallHistory` (array of tool calls made), `lastModelResponse` (final response from the LLM before failure).
+
+ ---
+
+ ## Circuit Breaker
+
+ **Status: Planned.**
+
+ The platform will implement a circuit breaker to prevent runaway failures:
+
+ - **Trip condition:** 5 consecutive failures on the same resource within a 10-minute window
+ - **Action:** Pause executions for 60 seconds. New execution requests for that resource return `503` with: "Resource temporarily unavailable (circuit breaker tripped)"
+ - **Auto-recovery:** After the 60-second pause, the next execution attempt is allowed through. If it succeeds, the circuit breaker resets. If it fails, the pause extends (120s, then 240s, capped at 5 minutes).
+ - **Alerting:** You are notified via webhook callback or email when the circuit breaker trips. Configurable per organization.
+
+ ---
+
+ ## Metrics
+
+ **Status: Planned.**
+
+ ### Auto-Collected
+
+ The SDK runtime and platform will automatically collect these metrics for every execution:
+
+ - `execution_duration_ms` -- Total wall-clock time from request received to result sent
+ - `step_duration_ms` -- Per-step timing for workflows (array of `{ stepId, durationMs }`)
+ - `iteration_count` -- Number of LLM loop iterations for agents
+ - `ai_token_usage` -- Token counts per model call: `{ prompt_tokens, completion_tokens, total_tokens }`
+ - `ai_cost_usd` -- Calculated from model pricing multiplied by token usage
+ - `tool_call_count` -- Total number of tool invocations during the execution
+ - `tool_call_duration_ms` -- Per-tool timing (array of `{ toolName, durationMs }`)
+ - `error_count` -- Number of errors encountered (including recovered errors)
+
+ ### Cost Attribution
+
+ Metrics are aggregated at multiple levels:
+
+ - **Per-execution:** Total AI spend and total duration
+ - **Per-resource:** Aggregated over configurable time periods (daily, weekly, monthly)
+ - **Per-organization:** Total platform cost (execution time + AI spend + managed hosting compute)
+ - **Visibility:** Platform dashboard and the CLI via `elevasis-sdk executions <resourceId>`
+
+ ### Developer-Defined Metrics
+
+ A future SDK version will support custom metrics emitted from your handlers:
+
+ - `sdk.metrics.counter('custom_name', value)` -- Increment a counter
+ - `sdk.metrics.gauge('queue_depth', value)` -- Set a point-in-time gauge value
+ - Custom metrics are stored alongside auto-collected metrics and queryable through the same APIs
+
+ ---
+
+ ## Alerting
+
+ **Status: Planned.**
+
+ Developer-configurable alerts for production monitoring:
+
+ - **Error rate threshold:** Notify when more than X% of executions fail within Y minutes
+ - **Latency percentile:** Notify when p95 execution duration exceeds a threshold
+ - **Cost budget:** Notify when daily or weekly AI spend exceeds a configured limit
+ - **Channel:** Webhook callback to a developer-provided URL (integrates with Slack, PagerDuty, and similar services via webhook)
+
+ ---
+
+ ## Resource Lifecycle Extensions
+
+ **Status: Planned.**
+
+ ### Deprecation Status
+
+ Beyond the current `dev` and `prod` statuses, two additional statuses are planned:
+
+ - **`deprecated`** -- Marked via the platform UI. Existing executions continue working. New executions show a warning: "This resource is deprecated." The resource still appears in the platform and can still be triggered.
+ - **`offline`** -- Set automatically when a deployment is unregistered (for example, after a failed deploy or explicit deletion). Clears automatically on the next successful deploy. No executions are accepted while a resource is offline.
+
+ Deprecation requires no automatic removal -- the developer must explicitly delete the resource to remove it.
+
+ ---
+
+ **Last Updated:** 2026-03-08
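The planned circuit-breaker pause schedule described in this hunk (60s, then 120s, then 240s, capped at 5 minutes) amounts to a capped doubling. A minimal sketch, assuming the pause doubles after every failed recovery attempt; `pauseSecondsAfter` and both constants are hypothetical names, and because the feature is marked as planned, none of this is an existing SDK or platform API.

```typescript
const BASE_PAUSE_S = 60 // initial pause once the breaker trips
const MAX_PAUSE_S = 300 // cap from the roadmap text: 5 minutes

// failedProbes counts consecutive recovery attempts that failed after the
// breaker first tripped (0 means the breaker has only just tripped).
function pauseSecondsAfter(failedProbes: number): number {
  if (!Number.isInteger(failedProbes) || failedProbes < 0) {
    throw new RangeError('failedProbes must be a non-negative integer')
  }
  // Double the base pause per failed probe, never exceeding the cap.
  return Math.min(BASE_PAUSE_S * 2 ** failedProbes, MAX_PAUSE_S)
}
```

Under that assumption the pauses run 60, 120, 240, 300, 300, ... seconds until a recovery attempt succeeds and the breaker resets.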
@@ -2,6 +2,7 @@
  description: Edits to the canonical organization model go through /om
  paths:
  - core/config/organization-model.ts
+ - core/config/organization-model/**/*.ts
  - core/config/extensions/**/*.ts
  ---
  <!-- @generated by packages/sdk/scripts/copy-reference-docs.mjs -- DO NOT EDIT -->
@@ -18,6 +19,19 @@ New semantic authoring should start in system-colocated `ontology` scopes. Top-l
  `entities` and top-level `actions` remain compatibility mirrors while published
  consumers finish moving to compiled ontology indexes. `System.content` is retired.

+ ## File Layout
+
+ Projects keep the organization model either as a single `core/config/organization-model.ts`
+ file, or split into an entry file plus a sibling directory. Both layouts are valid; the entry
+ filename never changes, so consumer imports and the SDK loader resolve identically:
+
+ - `organization-model.ts` -- ENTRY: assembles the model and re-exports every public symbol.
+ - `organization-model/profile.ts` -- the `defineOrganizationModel(...)` body (identity, customers, offerings, roles, goals). Primary `/om` codify write target.
+ - `organization-model/systems.ts` -- systems, resources, topology, entities, actions, resource governance, feature/system-ID constants.
+ - `organization-model/navigation.ts` -- sidebar navigation and surface projection; imports `systems.ts` directly to stay acyclic.
+
+ `/om` routes domain edits to the right file automatically (identity/customers/offerings/roles/goals → `profile.ts`; systems/actions/resources → `systems.ts`). Unsplit projects default every edit to the entry file.
+
  ## Preferred Entry Point: `/om`

  Direct edits to `organization-model.ts` are discouraged. Instead, use `/om` (or