@bratsos/workflow-engine 0.1.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (37)
  1. package/README.md +274 -513
  2. package/dist/{chunk-7IITBLFY.js → chunk-NYKMT46J.js} +268 -25
  3. package/dist/chunk-NYKMT46J.js.map +1 -0
  4. package/dist/chunk-SPXBCZLB.js +17 -0
  5. package/dist/chunk-SPXBCZLB.js.map +1 -0
  6. package/dist/chunk-WZ533CPU.js +1108 -0
  7. package/dist/chunk-WZ533CPU.js.map +1 -0
  8. package/dist/{client-5vz5Vv4A.d.ts → client-D4PoxADF.d.ts} +3 -143
  9. package/dist/client.d.ts +3 -2
  10. package/dist/{index-DmR3E8D7.d.ts → index-DAzCfO1R.d.ts} +20 -1
  11. package/dist/index.d.ts +234 -601
  12. package/dist/index.js +46 -2034
  13. package/dist/index.js.map +1 -1
  14. package/dist/{interface-Cv22wvLG.d.ts → interface-MMqhfQQK.d.ts} +69 -2
  15. package/dist/kernel/index.d.ts +26 -0
  16. package/dist/kernel/index.js +3 -0
  17. package/dist/kernel/index.js.map +1 -0
  18. package/dist/kernel/testing/index.d.ts +44 -0
  19. package/dist/kernel/testing/index.js +85 -0
  20. package/dist/kernel/testing/index.js.map +1 -0
  21. package/dist/persistence/index.d.ts +2 -2
  22. package/dist/persistence/index.js +2 -1
  23. package/dist/persistence/prisma/index.d.ts +2 -2
  24. package/dist/persistence/prisma/index.js +2 -1
  25. package/dist/plugins-CPC-X0rR.d.ts +421 -0
  26. package/dist/ports-tU3rzPXJ.d.ts +245 -0
  27. package/dist/stage-BPw7m9Wx.d.ts +144 -0
  28. package/dist/testing/index.d.ts +23 -1
  29. package/dist/testing/index.js +156 -13
  30. package/dist/testing/index.js.map +1 -1
  31. package/package.json +11 -1
  32. package/skills/workflow-engine/SKILL.md +234 -348
  33. package/skills/workflow-engine/references/03-runtime-setup.md +111 -426
  34. package/skills/workflow-engine/references/05-persistence-setup.md +32 -0
  35. package/skills/workflow-engine/references/07-testing-patterns.md +141 -474
  36. package/skills/workflow-engine/references/08-common-patterns.md +125 -428
  37. package/dist/chunk-7IITBLFY.js.map +0 -1
package/README.md CHANGED
@@ -13,12 +13,13 @@ A **type-safe, distributed workflow engine** for AI-orchestrated processes. Feat
  - [1. Database Setup](#1-database-setup)
  - [2. Define Your First Stage](#2-define-your-first-stage)
  - [3. Build a Workflow](#3-build-a-workflow)
- - [4. Create the Runtime](#4-create-the-runtime)
- - [5. Run a Workflow](#5-run-a-workflow)
+ - [4. Create the Kernel](#4-create-the-kernel)
+ - [5. Choose a Host](#5-choose-a-host)
  - [Core Concepts](#core-concepts)
  - [Stages](#stages)
  - [Workflows](#workflows)
- - [Runtime](#runtime)
+ - [Kernel](#kernel)
+ - [Hosts](#hosts)
  - [Persistence](#persistence)
  - [Common Patterns](#common-patterns)
  - [Accessing Previous Stage Output](#accessing-previous-stage-output)
@@ -39,24 +40,22 @@ A **type-safe, distributed workflow engine** for AI-orchestrated processes. Feat
  | **Type-Safe** | Full TypeScript inference from input to output across all stages |
  | **Async-First** | Native support for long-running operations (batch jobs that take hours/days) |
  | **AI-Native** | Built-in tracking of prompts, responses, tokens, and costs |
- | **Event-Driven** | Real-time progress updates via Server-Sent Events |
+ | **Event-Driven** | Transactional outbox pattern for reliable event delivery |
  | **Parallel Execution** | Run independent stages concurrently |
  | **Resume Capability** | Automatic state persistence and recovery from failures |
  | **Distributed** | Job queue with priority support and stale lock recovery |
+ | **Environment-Agnostic** | Pure command kernel runs on Node.js, serverless, edge, or any runtime |

  ---

  ## Requirements

- - **Node.js** >= 18.0.0
  - **TypeScript** >= 5.0.0
+ - **Zod** >= 4.0.0
  - **PostgreSQL** >= 14 (for Prisma persistence)
- - **Zod** >= 3.22.0

  ### Optional Peer Dependencies

- Install based on which AI providers you use:
-
  ```bash
  # For Google AI
  npm install @google/genai
@@ -76,19 +75,20 @@ npm install @prisma/client
  ## Installation

  ```bash
+ # Core library
  npm install @bratsos/workflow-engine zod
- # or
- pnpm add @bratsos/workflow-engine zod
- # or
- yarn add @bratsos/workflow-engine zod
+
+ # Node.js host (long-running worker processes)
+ npm install @bratsos/workflow-engine-host-node
+
+ # Serverless host (Cloudflare Workers, AWS Lambda, Vercel Edge, etc.)
+ npm install @bratsos/workflow-engine-host-serverless
  ```

  ---

  ## Getting Started

- This guide walks you through creating your first workflow from scratch.
-
  ### 1. Database Setup

  The engine requires persistence tables. Add these to your Prisma schema:
@@ -96,7 +96,6 @@ The engine requires persistence tables. Add these to your Prisma schema:
  ```prisma
  // schema.prisma

- // Unified status enum for workflows, stages, and jobs
  enum Status {
  PENDING
  RUNNING
@@ -111,20 +110,20 @@ model WorkflowRun {
  id String @id @default(cuid())
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt
- workflowId String // e.g., "document-processor"
- workflowName String // e.g., "Document Processor"
- workflowType String // For grouping/filtering
+ workflowId String
+ workflowName String
+ workflowType String
  status Status @default(PENDING)
  startedAt DateTime?
  completedAt DateTime?
- duration Int? // milliseconds
+ duration Int?
  input Json
  output Json?
  config Json @default("{}")
  totalCost Float @default(0)
  totalTokens Int @default(0)
  priority Int @default(5)
- metadata Json? // Optional domain-specific data
+ metadata Json?

  stages WorkflowStage[]
  logs WorkflowLog[]
@@ -140,23 +139,23 @@ model WorkflowStage {
  updatedAt DateTime @updatedAt
  workflowRunId String
  workflowRun WorkflowRun @relation(fields: [workflowRunId], references: [id], onDelete: Cascade)
- stageId String // e.g., "extract-text"
- stageName String // e.g., "Extract Text"
- stageNumber Int // 1-based execution order
- executionGroup Int // For parallel grouping
+ stageId String
+ stageName String
+ stageNumber Int
+ executionGroup Int
  status Status @default(PENDING)
  startedAt DateTime?
  completedAt DateTime?
- duration Int? // milliseconds
+ duration Int?
  inputData Json?
- outputData Json? // May contain { _artifactKey: "..." }
+ outputData Json?
  config Json?
- suspendedState Json? // State for async batch stages
+ suspendedState Json?
  resumeData Json?
- nextPollAt DateTime? // When to check again
- pollInterval Int? // milliseconds
+ nextPollAt DateTime?
+ pollInterval Int?
  maxWaitUntil DateTime?
- metrics Json? // { cost, tokens, custom... }
+ metrics Json?
  embeddingInfo Json?
  errorMessage String?

@@ -174,7 +173,7 @@ model WorkflowLog {
  workflowRun WorkflowRun? @relation(fields: [workflowRunId], references: [id], onDelete: Cascade)
  workflowStageId String?
  workflowStage WorkflowStage? @relation(fields: [workflowStageId], references: [id], onDelete: Cascade)
- level String // DEBUG, INFO, WARN, ERROR
+ level String
  message String
  metadata Json?

@@ -187,10 +186,10 @@ model WorkflowArtifact {
  createdAt DateTime @default(now())
  workflowRunId String
  workflowRun WorkflowRun @relation(fields: [workflowRunId], references: [id], onDelete: Cascade)
- key String // Unique key within the run
- type String // STAGE_OUTPUT, ARTIFACT, METADATA
+ key String
+ type String
  data Json
- size Int // bytes
+ size Int

  @@unique([workflowRunId, key])
  @@index([workflowRunId])
@@ -199,10 +198,10 @@ model WorkflowArtifact {
  model AICall {
  id String @id @default(cuid())
  createdAt DateTime @default(now())
- topic String // Hierarchical: "workflow.{runId}.stage.{stageId}"
- callType String // text, object, embed, stream
- modelKey String // e.g., "gemini-2.5-flash"
- modelId String // e.g., "google/gemini-2.5-flash-preview"
+ topic String
+ callType String
+ modelKey String
+ modelId String
  prompt String @db.Text
  response String @db.Text
  inputTokens Int
@@ -231,6 +230,35 @@ model JobQueue {
  @@index([status, priority])
  @@index([nextPollAt])
  }
+
+ model OutboxEvent {
+ id String @id @default(cuid())
+ createdAt DateTime @default(now())
+ workflowRunId String
+ sequence Int
+ eventType String
+ payload Json
+ causationId String
+ occurredAt DateTime
+ publishedAt DateTime?
+ retryCount Int @default(0)
+ dlqAt DateTime?
+
+ @@unique([workflowRunId, sequence])
+ @@index([publishedAt])
+ @@map("outbox_events")
+ }
+
+ model IdempotencyKey {
+ id String @id @default(cuid())
+ createdAt DateTime @default(now())
+ key String
+ commandType String
+ result Json
+
+ @@unique([key, commandType])
+ @@map("idempotency_keys")
+ }
  ```

  Run the migration:
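The new `IdempotencyKey` model above pairs a unique `[key, commandType]` constraint with a stored `result`, the usual backing for command de-duplication: a repeated dispatch carrying the same key can return the stored result instead of re-executing. A minimal in-memory sketch of that contract follows; every name in it (`IdempotencyStore`, `dispatch`, `handler`) is illustrative, not the package's internals.

```typescript
// In-memory sketch of idempotent command dispatch, mirroring the
// IdempotencyKey model's @@unique([key, commandType]) constraint.
// All names here are illustrative, not the package's internals.
type Command = { type: string; idempotencyKey: string };

class IdempotencyStore {
  private results = new Map<string, unknown>();

  // Composite key mirrors @@unique([key, commandType])
  private keyFor(cmd: Command): string {
    return `${cmd.idempotencyKey}:${cmd.type}`;
  }

  async dispatch<T>(cmd: Command, handler: () => Promise<T>): Promise<T> {
    const key = this.keyFor(cmd);
    if (this.results.has(key)) {
      // Replay: return the stored result without re-running the handler
      return this.results.get(key) as T;
    }
    const result = await handler();
    this.results.set(key, result);
    return result;
  }
}
```

Whether the real kernel replays the stored result or rejects duplicates is up to the package; the sketch only shows why `result Json` is persisted alongside the key.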
@@ -242,215 +270,121 @@ npx prisma generate

  ### 2. Define Your First Stage

- Create a file `stages/extract-text.ts`:
-
  ```typescript
  import { defineStage } from "@bratsos/workflow-engine";
  import { z } from "zod";

- // Define schemas for type safety
- const InputSchema = z.object({
- url: z.string().url().describe("URL of the document to process"),
- });
-
- const OutputSchema = z.object({
- text: z.string().describe("Extracted text content"),
- wordCount: z.number().describe("Number of words extracted"),
- metadata: z.object({
- title: z.string().optional(),
- author: z.string().optional(),
- }),
- });
-
- const ConfigSchema = z.object({
- maxLength: z.number().default(50000).describe("Maximum text length to extract"),
- includeMetadata: z.boolean().default(true),
- });
-
  export const extractTextStage = defineStage({
  id: "extract-text",
  name: "Extract Text",
- description: "Extracts text content from a document URL",
-
  schemas: {
- input: InputSchema,
- output: OutputSchema,
- config: ConfigSchema,
+ input: z.object({ url: z.string().url() }),
+ output: z.object({ text: z.string(), wordCount: z.number() }),
+ config: z.object({ maxLength: z.number().default(50000) }),
  },
-
  async execute(ctx) {
- const { url } = ctx.input;
- const { maxLength, includeMetadata } = ctx.config;
-
- // Log progress
- ctx.log("INFO", "Starting text extraction", { url });
-
- // Simulate fetching document (replace with real implementation)
- const response = await fetch(url);
- const text = await response.text();
- const truncatedText = text.slice(0, maxLength);
-
- ctx.log("INFO", "Extraction complete", {
- originalLength: text.length,
- truncatedLength: truncatedText.length,
- });
-
+ const response = await fetch(ctx.input.url);
+ const text = (await response.text()).slice(0, ctx.config.maxLength);
+ ctx.log("INFO", "Extraction complete", { length: text.length });
  return {
- output: {
- text: truncatedText,
- wordCount: truncatedText.split(/\s+/).length,
- metadata: includeMetadata ? { title: "Document Title" } : {},
- },
- // Optional: custom metrics for observability
- customMetrics: {
- bytesProcessed: text.length,
- },
+ output: { text, wordCount: text.split(/\s+/).length },
  };
  },
  });
-
- // Export types for use in other stages
- export type ExtractTextOutput = z.infer<typeof OutputSchema>;
  ```

  ### 3. Build a Workflow

- Create a file `workflows/document-processor.ts`:
-
  ```typescript
  import { WorkflowBuilder } from "@bratsos/workflow-engine";
  import { z } from "zod";
- import { extractTextStage } from "../stages/extract-text";
- import { summarizeStage } from "../stages/summarize";
- import { analyzeStage } from "../stages/analyze";
-
- const InputSchema = z.object({
- url: z.string().url(),
- options: z.object({
- generateSummary: z.boolean().default(true),
- analyzeContent: z.boolean().default(true),
- }).optional(),
- });
+ import { extractTextStage } from "./stages/extract-text";
+ import { summarizeStage } from "./stages/summarize";

  export const documentProcessorWorkflow = new WorkflowBuilder(
- "document-processor", // Unique ID
- "Document Processor", // Display name
- "Extracts, summarizes, and analyzes documents",
- InputSchema,
- InputSchema // Initial output type (will be inferred)
+ "document-processor",
+ "Document Processor",
+ "Extracts and summarizes documents",
+ z.object({ url: z.string().url() }),
+ z.object({ url: z.string().url() }),
  )
- .pipe(extractTextStage) // Stage 1: Extract
- .pipe(summarizeStage) // Stage 2: Summarize
- .pipe(analyzeStage) // Stage 3: Analyze
+ .pipe(extractTextStage)
+ .pipe(summarizeStage)
  .build();
-
- // Export input type for API consumers
- export type DocumentProcessorInput = z.infer<typeof InputSchema>;
  ```

- ### 4. Create the Runtime
+ ### 4. Create the Kernel

- Create a file `runtime.ts`:
+ The kernel is the core command dispatcher. It's environment-agnostic -- no timers, no signals, no global state.

  ```typescript
+ import { createKernel } from "@bratsos/workflow-engine/kernel";
  import {
- createWorkflowRuntime,
  createPrismaWorkflowPersistence,
  createPrismaJobQueue,
- createPrismaAICallLogger,
- type WorkflowRegistry,
  } from "@bratsos/workflow-engine";
  import { PrismaClient } from "@prisma/client";
  import { documentProcessorWorkflow } from "./workflows/document-processor";

- // Initialize Prisma client
  const prisma = new PrismaClient();

- // Create persistence implementations
- const persistence = createPrismaWorkflowPersistence(prisma);
- const jobQueue = createPrismaJobQueue(prisma);
- const aiCallLogger = createPrismaAICallLogger(prisma);
-
- // Create a workflow registry
- const registry: WorkflowRegistry = {
- getWorkflow(id: string) {
- const workflows: Record<string, any> = {
- "document-processor": documentProcessorWorkflow,
- // Add more workflows here
- };
- return workflows[id];
+ const kernel = createKernel({
+ persistence: createPrismaWorkflowPersistence(prisma),
+ blobStore: myBlobStore, // BlobStore implementation
+ jobTransport: createPrismaJobQueue(prisma),
+ eventSink: myEventSink, // EventSink implementation
+ scheduler: myScheduler, // Scheduler implementation
+ clock: { now: () => new Date() },
+ registry: {
+ getWorkflow: (id) =>
+ id === "document-processor" ? documentProcessorWorkflow : undefined,
  },
- };
-
- // Optional: Define priority function for different workflow types
- function getWorkflowPriority(workflowId: string): number {
- const priorities: Record<string, number> = {
- "document-processor": 5, // Normal priority
- "urgent-task": 10, // High priority
- "background-job": 2, // Low priority
- };
- return priorities[workflowId] ?? 5;
- }
-
- // Create the runtime
- export const runtime = createWorkflowRuntime({
- persistence,
- jobQueue,
- registry,
- aiCallLogger,
- pollIntervalMs: 10000, // Check for suspended stages every 10s
- jobPollIntervalMs: 1000, // Poll job queue every 1s
- getWorkflowPriority, // Optional priority function
  });
-
- // Convenience exports
- export { prisma, persistence, jobQueue };
  ```

- ### 5. Run a Workflow
-
- #### Option A: Worker Process (Recommended for Production)
+ ### 5. Choose a Host

- Create a file `worker.ts`:
+ #### Option A: Node.js Worker (Recommended for Production)

  ```typescript
- import { runtime } from "./runtime";
+ import { createNodeHost } from "@bratsos/workflow-engine-host-node";
+
+ const host = createNodeHost({
+ kernel,
+ jobTransport: createPrismaJobQueue(prisma),
+ workerId: "worker-1",
+ orchestrationIntervalMs: 10_000,
+ jobPollIntervalMs: 1_000,
+ });

- // Register shutdown handlers
- process.on("SIGTERM", () => runtime.stop());
- process.on("SIGINT", () => runtime.stop());
+ // Start polling loops + signal handlers
+ await host.start();

- // Start the worker
- console.log("Starting workflow worker...");
- runtime.start();
+ // Queue a workflow
+ await kernel.dispatch({
+ type: "run.create",
+ idempotencyKey: crypto.randomUUID(),
+ workflowId: "document-processor",
+ input: { url: "https://example.com/doc.pdf" },
+ });
  ```

- Run the worker:
+ #### Option B: Serverless (Cloudflare Workers, Lambda, etc.)

- ```bash
- npx tsx worker.ts
- ```
+ ```typescript
+ import { createServerlessHost } from "@bratsos/workflow-engine-host-serverless";

- #### Option B: Queue from API Endpoint
+ const host = createServerlessHost({
+ kernel,
+ jobTransport,
+ workerId: "my-worker",
+ });

- ```typescript
- import { runtime } from "./runtime";
-
- // In your API route handler
- export async function POST(request: Request) {
- const { url } = await request.json();
-
- const { workflowRunId } = await runtime.createRun({
- workflowId: "document-processor",
- input: { url },
- config: {
- "extract-text": { maxLength: 100000 },
- "summarize": { modelKey: "gemini-2.5-flash" },
- },
- });
+ // Handle a single job from a queue message
+ const result = await host.handleJob(msg);

- return Response.json({ workflowRunId });
- }
+ // Run maintenance from a cron trigger
+ const tick = await host.runMaintenanceTick();
  ```

  ---
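Version 0.2.1 replaces Server-Sent Events with a transactional outbox ("Transactional outbox pattern for reliable event delivery" in the feature table), persisted in the `OutboxEvent` table shown in the schema. The drain loop such a table implies can be sketched roughly as follows; the field names come from the model, but the logic is a generic illustration, not the package's actual publisher.

```typescript
// Illustrative outbox drain loop: publish unpublished events in sequence
// order, mark them published, and dead-letter after repeated failures.
// Field names follow the OutboxEvent model; the logic is a generic sketch.
interface OutboxEvent {
  id: string;
  workflowRunId: string;
  sequence: number;
  eventType: string;
  payload: unknown;
  publishedAt: Date | null;
  retryCount: number;
  dlqAt: Date | null;
}

async function drainOutbox(
  events: OutboxEvent[],
  publish: (e: OutboxEvent) => Promise<void>,
  maxRetries = 5,
): Promise<void> {
  const pending = events
    .filter((e) => e.publishedAt === null && e.dlqAt === null)
    .sort((a, b) => a.sequence - b.sequence); // preserve per-run ordering

  for (const event of pending) {
    try {
      await publish(event);
      event.publishedAt = new Date(); // mark delivered
    } catch {
      event.retryCount += 1;
      if (event.retryCount >= maxRetries) {
        event.dlqAt = new Date(); // dead-letter after too many failures
      }
    }
  }
}
```

A real publisher would claim rows inside a transaction (the README mentions `FOR UPDATE SKIP LOCKED` for the job queue); here a plain array stands in for the `outbox_events` table.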
@@ -459,16 +393,7 @@ export async function POST(request: Request) {

  ### Stages

- A stage is the atomic unit of work. Every stage has:
-
- | Property | Description |
- |----------|-------------|
- | `id` | Unique identifier within the workflow |
- | `name` | Human-readable name |
- | `inputSchema` | Zod schema for input validation |
- | `outputSchema` | Zod schema for output validation |
- | `configSchema` | Zod schema for stage configuration |
- | `execute(ctx)` | The function that performs the work |
+ A stage is the atomic unit of work. Every stage has typed input, output, and config schemas.

  **Stage Modes:**

@@ -490,36 +415,61 @@ new WorkflowBuilder(id, name, description, inputSchema, outputSchema)
  .build();
  ```

- ### Runtime
+ ### Kernel

- The `WorkflowRuntime` manages workflow execution:
+ The `Kernel` is a pure command dispatcher. All operations are expressed as typed commands:

  ```typescript
- const runtime = createWorkflowRuntime({
- persistence, // Database operations
- jobQueue, // Distributed job queue
- registry, // Workflow definitions
- aiCallLogger, // AI call tracking (optional)
+ // Create a run
+ const { workflowRunId } = await kernel.dispatch({
+ type: "run.create",
+ idempotencyKey: "unique-key",
+ workflowId: "my-workflow",
+ input: { data: "hello" },
  });

- // Start as a worker process
- await runtime.start();
-
- // Queue a new workflow run
- await runtime.createRun({ workflowId, input, config });
+ // Cancel a run
+ await kernel.dispatch({
+ type: "run.cancel",
+ workflowRunId,
+ reason: "User requested",
+ });

- // Create AI helper for a stage
- const ai = runtime.createAIHelper("workflow.run-123.stage.extract");
+ // Rerun from a specific stage
+ await kernel.dispatch({
+ type: "run.rerunFrom",
+ workflowRunId,
+ fromStageId: "extract-text",
+ });
  ```

- ### Persistence
+ The kernel depends on 7 port interfaces (injected at creation):

- The engine uses three persistence interfaces:
+ | Port | Purpose |
+ |------|---------|
+ | `Persistence` | Runs, stages, logs, outbox, idempotency CRUD |
+ | `BlobStore` | Large payload storage (put/get/has/delete/list) |
+ | `JobTransport` | Job queue (enqueue/dequeue/complete/suspend/fail) |
+ | `EventSink` | Async event publishing |
+ | `Scheduler` | Deferred command triggers |
+ | `Clock` | Injectable time source |
+ | `WorkflowRegistry` | Workflow definition lookup |
+
+ ### Hosts
+
+ Hosts wrap the kernel with environment-specific process management:
+
+ **Node Host** (`@bratsos/workflow-engine-host-node`): Long-running worker process with polling loops, signal handling (SIGTERM/SIGINT), and continuous job dequeuing.
+
+ **Serverless Host** (`@bratsos/workflow-engine-host-serverless`): Stateless single-invocation methods for queue-driven environments. Consumers wire platform-specific glue (ack/retry/waitUntil) around the host methods.
+
+ ### Persistence

  | Interface | Purpose |
  |-----------|---------|
- | `WorkflowPersistence` | Workflow runs, stages, logs, artifacts |
- | `JobQueue` | Distributed job queue with priority and retries |
+ | `Persistence` | Workflow runs, stages, logs, outbox, idempotency |
+ | `JobTransport` | Distributed job queue with priority and retries |
+ | `BlobStore` | Large payload storage |
  | `AICallLogger` | AI call tracking with cost aggregation |

  **Built-in implementations:**
@@ -527,26 +477,6 @@ The engine uses three persistence interfaces:
  - `createPrismaJobQueue(prisma)` - PostgreSQL with `FOR UPDATE SKIP LOCKED`
  - `createPrismaAICallLogger(prisma)` - PostgreSQL

- **Prisma version compatibility:**
-
- The persistence layer supports both Prisma 6.x and 7.x. Enum values are automatically resolved based on your Prisma version:
-
- ```typescript
- import { createPrismaWorkflowPersistence } from "@bratsos/workflow-engine/persistence/prisma";
-
- // Works with both Prisma 6.x (string enums) and Prisma 7.x (typed enums)
- const persistence = createPrismaWorkflowPersistence(prisma);
- ```
-
- For advanced use cases, you can use the enum helper directly:
-
- ```typescript
- import { createEnumHelper } from "@bratsos/workflow-engine/persistence/prisma";
-
- const enums = createEnumHelper(prisma);
- const status = enums.status("PENDING"); // Returns typed enum for Prisma 7.x, string for 6.x
- ```
-
  ---

  ## Common Patterns
@@ -559,309 +489,192 @@ Use `ctx.require()` for type-safe access to any previous stage's output:
559
489
  export const analyzeStage = defineStage({
560
490
  id: "analyze",
561
491
  name: "Analyze Content",
562
-
563
492
  schemas: {
564
- input: "none", // This stage reads from workflowContext
493
+ input: "none",
565
494
  output: AnalysisOutputSchema,
566
495
  config: ConfigSchema,
567
496
  },
568
-
569
497
  async execute(ctx) {
570
- // Type-safe access to previous stage output
571
498
  const extracted = ctx.require("extract-text"); // Throws if missing
572
499
  const summary = ctx.optional("summarize"); // Returns undefined if missing
573
-
574
- // Use the data
575
- console.log(`Analyzing ${extracted.wordCount} words`);
576
-
577
- return {
578
- output: { /* ... */ },
579
- };
500
+ return { output: { /* ... */ } };
580
501
  },
581
502
  });
582
503
  ```
583
504
 
584
- > **Important:** Use `input: "none"` for stages that read from `workflowContext` instead of receiving direct input from the previous stage.
585
-
586
505
  ### Parallel Execution
587
506
 
588
- Run independent stages concurrently:
589
-
590
507
  ```typescript
591
508
  const workflow = new WorkflowBuilder(/* ... */)
592
509
  .pipe(extractStage)
593
510
  .parallel([
594
- sentimentAnalysisStage, // These three run
595
- keywordExtractionStage, // at the same time
596
- languageDetectionStage,
511
+ sentimentAnalysisStage,
512
+ keywordExtractionStage,
513
+ languageDetectionStage,
597
514
  ])
598
- .pipe(aggregateResultsStage) // Runs after all parallel stages complete
515
+ .pipe(aggregateResultsStage)
599
516
  .build();
600
517
  ```
601
518
 
602
- In subsequent stages, access parallel outputs by stage ID:
603
-
604
- ```typescript
605
- async execute(ctx) {
606
- const sentiment = ctx.require("sentiment-analysis");
607
- const keywords = ctx.require("keyword-extraction");
608
- const language = ctx.require("language-detection");
609
- // ...
610
- }
611
- ```
612
-
613
519
  ### AI Integration
614
520
 
615
- Use the `AIHelper` for tracked AI calls:
616
-
617
521
  ```typescript
618
- import { createAIHelper, type AIHelper } from "@bratsos/workflow-engine";
522
+ import { createAIHelper } from "@bratsos/workflow-engine";
619
523
 
620
524
  async execute(ctx) {
621
- // Create AI helper with topic for cost tracking
622
- const ai = runtime.createAIHelper(`workflow.${ctx.workflowRunId}.${ctx.stageId}`);
623
-
624
- // Generate text
625
- const { text, cost, inputTokens, outputTokens } = await ai.generateText(
626
- "gemini-2.5-flash",
627
- "Summarize this document: " + ctx.input.text
525
+ const ai = createAIHelper(
526
+ `workflow.${ctx.workflowRunId}.stage.${ctx.stageId}`,
527
+ aiCallLogger,
628
528
  );
629
529
 
630
- // Generate structured object
530
+ const { text, cost } = await ai.generateText("gemini-2.5-flash", "Summarize: " + ctx.input.text);
531
+
631
532
  const { object: analysis } = await ai.generateObject(
632
533
  "gemini-2.5-flash",
633
- "Analyze this text: " + ctx.input.text,
634
- z.object({
635
- sentiment: z.enum(["positive", "negative", "neutral"]),
636
- topics: z.array(z.string()),
637
- })
638
- );
639
-
640
- // Generate embeddings
641
- const { embedding, dimensions } = await ai.embed(
642
- "gemini-embedding-001",
643
- ctx.input.text,
644
- { dimensions: 768 }
534
+ "Analyze: " + ctx.input.text,
535
+ z.object({ sentiment: z.enum(["positive", "negative", "neutral"]) })
645
536
  );
646
537
 
647
- // All calls are automatically logged with cost tracking
648
- return { output: { text, analysis, embedding } };
538
+ return { output: { text, analysis } };
649
539
  }
650
540
  ```
651
541
 
652
542
  ### Long-Running Batch Jobs
653
543
 
654
- For operations that may take hours (OpenAI Batch API, Google Vertex Batch, etc.):
655
-
656
544
  ```typescript
657
545
  import { defineAsyncBatchStage } from "@bratsos/workflow-engine";
658
546
 
659
- export const batchProcessingStage = defineAsyncBatchStage({
547
+ export const batchStage = defineAsyncBatchStage({
660
548
  id: "batch-process",
661
549
  name: "Batch Processing",
662
- mode: "async-batch", // Required for async stages
663
-
550
+ mode: "async-batch",
664
551
  schemas: { input: InputSchema, output: OutputSchema, config: ConfigSchema },
665
552
 
666
553
  async execute(ctx) {
667
- // If we're resuming from suspension, the batch is complete
668
554
  if (ctx.resumeState) {
669
- const results = await fetchBatchResults(ctx.resumeState.batchId);
670
- return { output: results };
555
+ return { output: await fetchBatchResults(ctx.resumeState.batchId) };
671
556
  }
672
557
 
673
- // First execution: submit the batch job
674
- const batch = await submitBatchToOpenAI(ctx.input.prompts);
675
-
676
- // Return suspended state - workflow will pause here
558
+ const batch = await submitBatch(ctx.input.prompts);
677
559
  return {
678
560
  suspended: true,
679
- state: {
680
- batchId: batch.id,
681
- submittedAt: new Date().toISOString(),
682
- pollInterval: 3600000, // 1 hour
683
- maxWaitTime: 86400000, // 24 hours
684
- },
685
- pollConfig: {
686
- pollInterval: 3600000,
687
- maxWaitTime: 86400000,
688
- nextPollAt: new Date(Date.now() + 3600000),
689
- },
561
+ state: { batchId: batch.id },
562
+ pollConfig: { pollInterval: 3600000, maxWaitTime: 86400000, nextPollAt: new Date(Date.now() + 3600000) },
690
563
  };
691
564
  },
692
565
 
693
- // Called by the orchestrator to check if batch is ready
694
- async checkCompletion(state, ctx) {
566
+ async checkCompletion(state) {
695
567
  const status = await checkBatchStatus(state.batchId);
696
-
697
568
  if (status === "completed") {
698
- return { ready: true }; // Workflow will resume
699
- }
700
-
701
- if (status === "failed") {
702
- return { ready: true, error: "Batch processing failed" };
569
+ const output = await fetchBatchResults(state.batchId);
570
+ return { ready: true, output };
703
571
  }
704
-
705
- // Not ready yet - check again later
706
- return {
707
- ready: false,
708
- nextCheckIn: 60000, // Override poll interval for this check
709
- };
572
+ if (status === "failed") return { ready: false, error: "Batch failed" };
573
+ return { ready: false };
710
574
  },
711
575
  });
712
576
  ```
713
577
 
714
578
  ### Config Presets
715
579
 
716
- Use built-in config presets for common patterns:
717
-
718
580
  ```typescript
719
- import {
720
- withAIConfig,
721
- withStandardConfig,
722
- withFullConfig
723
- } from "@bratsos/workflow-engine";
581
+ import { withAIConfig, withStandardConfig } from "@bratsos/workflow-engine";
724
582
  import { z } from "zod";
725
583
 
726
- // AI-focused config (model, temperature, maxTokens)
727
- const MyConfigSchema = withAIConfig(z.object({
728
- customField: z.string(),
729
- }));
730
-
731
- // Standard config (AI + concurrency + featureFlags)
732
- const StandardConfigSchema = withStandardConfig(z.object({
733
- specificOption: z.boolean().default(true),
734
- }));
735
-
736
- // Full config (Standard + debug options)
737
- const DebugConfigSchema = withFullConfig(z.object({
738
- processType: z.string(),
739
- }));
584
+ const MyConfigSchema = withAIConfig(z.object({ customField: z.string() }));
740
585
  ```
741
586
 
742
587
  ---

  ## Best Practices

- ### 1. Schema Design
+ ### Schema Design

  ```typescript
- // Good: Strict schemas with descriptions and defaults
+ // Good: Strict schemas with descriptions and defaults
  const ConfigSchema = z.object({
-   modelKey: z.string()
-     .default("gemini-2.5-flash")
-     .describe("AI model to use for processing"),
-   maxRetries: z.number()
-     .min(0)
-     .max(10)
-     .default(3)
-     .describe("Maximum retry attempts on failure"),
- });
-
- // ❌ Bad: Loose typing
- const ConfigSchema = z.object({
-   model: z.any(),
-   retries: z.number(),
- });
- ```
-
- ### 2. Stage Dependencies
-
- ```typescript
- // ✅ Good: Declare dependencies explicitly
- export const analyzeStage = defineStage({
-   id: "analyze",
-   dependencies: ["extract-text", "summarize"], // Build-time validation
-   schemas: { input: "none", /* ... */ },
-   // ...
- });
-
- // ❌ Bad: Hidden dependencies
- export const analyzeStage = defineStage({
-   id: "analyze",
-   schemas: { input: SomeSchema, /* ... */ }, // Where does input come from?
-   // ...
+   modelKey: z.string().default("gemini-2.5-flash").describe("AI model to use"),
+   maxRetries: z.number().min(0).max(10).default(3),
  });
  ```

- ### 3. Logging
+ ### Logging

  ```typescript
  async execute(ctx) {
-   // Good: Structured logging with context
-   ctx.log("INFO", "Starting processing", {
-     itemCount: items.length,
-     config: ctx.config,
-   });
+   ctx.log("INFO", "Starting processing", { itemCount: items.length });

-   // ✅ Good: Progress updates for long operations
    for (const [index, item] of items.entries()) {
      ctx.onProgress({
        progress: (index + 1) / items.length,
        message: `Processing item ${index + 1}/${items.length}`,
      });
    }
-
-   // ❌ Bad: Console.log (not persisted)
-   console.log("Processing...");
  }
  ```

- ### 4. Error Handling
+ ### Error Handling

  ```typescript
  async execute(ctx) {
    try {
-     // Validate early
-     if (!ctx.input.url.startsWith("https://")) {
-       throw new Error("Only HTTPS URLs are supported");
-     }
-
      const result = await processDocument(ctx.input);
      return { output: result };
-
    } catch (error) {
-     // Log errors with context
      ctx.log("ERROR", "Processing failed", {
        error: error instanceof Error ? error.message : String(error),
-       input: ctx.input,
      });
-
-     // Re-throw to mark stage as failed
      throw error;
    }
  }
  ```

- ### 5. Performance
-
- ```typescript
- // ✅ Good: Use parallel stages for independent work
- .parallel([
-   fetchDataFromSourceA,
-   fetchDataFromSourceB,
-   fetchDataFromSourceC,
- ])
-
- // ✅ Good: Use batch processing for many AI calls
- const results = await ai.batch("gpt-4o").submit(
-   items.map((item, i) => ({
-     id: `item-${i}`,
-     prompt: `Process: ${item}`,
-     schema: OutputSchema,
-   }))
- );
-
- // ❌ Bad: Sequential AI calls when parallel would work
- for (const item of items) {
-   await ai.generateText("gpt-4o", `Process: ${item}`);
- }
- ```
-
  ---

 
863
634
  ## API Reference
864
635
 
636
+ ### Kernel Commands
637
+
638
+ | Command | Description | Key Fields |
639
+ |---------|-------------|------------|
640
+ | `run.create` | Create a new workflow run | `idempotencyKey`, `workflowId`, `input`, `config?`, `priority?` |
641
+ | `run.claimPending` | Claim pending runs for processing | `workerId`, `maxClaims?` |
642
+ | `run.transition` | Advance to next stage group | `workflowRunId` |
643
+ | `run.cancel` | Cancel a running workflow | `workflowRunId`, `reason?` |
644
+ | `run.rerunFrom` | Rerun from a specific stage | `workflowRunId`, `fromStageId` |
645
+ | `job.execute` | Execute a single stage (multi-phase transactions) | `idempotencyKey?`, `workflowRunId`, `workflowId`, `stageId`, `config` |
646
+ | `stage.pollSuspended` | Poll suspended stages | `maxChecks?` (returns `resumedWorkflowRunIds`) |
647
+ | `lease.reapStale` | Release stale job leases | `staleThresholdMs` |
648
+ | `outbox.flush` | Publish pending events | `maxEvents?` |
649
+ | `plugin.replayDLQ` | Replay dead-letter queue events | `maxEvents?` |
650
+
651
+ Idempotency behavior:
652
+ - Replaying the same `idempotencyKey` returns cached results.
653
+ - If the same key is already executing, dispatch throws `IdempotencyInProgressError`.
654
+
655
+ Transaction behavior:
656
+ - Most commands execute inside a single database transaction (handler + outbox events).
657
+ - `job.execute` uses multi-phase transactions: Phase 1 commits `RUNNING` status immediately, Phase 2 runs `stageDef.execute()` outside any transaction, Phase 3 commits the final status. This avoids holding a database connection during long-running stage execution.
658
+
659
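As an editor's sketch of how these commands and the idempotency rules compose: the snippet below creates a run at most once per document. The kernel is typed structurally because the concrete `Kernel` surface is not shown here, and the `dispatch` method name and `type` discriminator are assumptions inferred from the notes above, not confirmed API.

```typescript
// Structural stand-in for the kernel's command entry point (the real `Kernel`
// type is exported from "@bratsos/workflow-engine/kernel"); `dispatch` is an
// assumed method name.
type DispatchLike = { dispatch(command: Record<string, unknown>): Promise<unknown> };

// Create a run exactly once per document: replaying the same idempotencyKey
// returns the cached result, and a concurrent duplicate throws
// IdempotencyInProgressError, which we treat as "retry later".
export async function createRunOnce(kernel: DispatchLike, documentId: string) {
  try {
    return await kernel.dispatch({
      type: "run.create", // command discriminator (assumed field name)
      idempotencyKey: `process-document:${documentId}`,
      workflowId: "process-document",
      input: { documentId },
      priority: 5,
    });
  } catch (err) {
    if (err instanceof Error && err.name === "IdempotencyInProgressError") {
      return null; // another worker holds this key right now; safe to retry later
    }
    throw err;
  }
}
```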
+ ### Node Host Config
+
+ | Option | Type | Default | Description |
+ |--------|------|---------|-------------|
+ | `kernel` | `Kernel` | required | Kernel instance |
+ | `jobTransport` | `JobTransport` | required | Job queue |
+ | `workerId` | `string` | required | Unique worker ID |
+ | `orchestrationIntervalMs` | `number` | 10000 | Orchestration poll interval |
+ | `jobPollIntervalMs` | `number` | 1000 | Job dequeue interval |
+ | `staleLeaseThresholdMs` | `number` | 60000 | Stale lease timeout |
+
+ ### Serverless Host
+
+ | Method | Description |
+ |--------|-------------|
+ | `handleJob(msg)` | Execute a single pre-dequeued job. Returns `{ outcome, error? }` |
+ | `processAvailableJobs(opts?)` | Dequeue and process jobs. Returns `{ processed, succeeded, failed }` |
+ | `runMaintenanceTick()` | Claim, poll, reap, flush in one call. Returns structured result |
+
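A minimal sketch of wiring these two methods into a scheduled (cron) invocation. The host is typed structurally from the method table above; the surrounding trigger plumbing and the option bag's contents are illustrative, not part of the documented API.

```typescript
// Structural stand-in for the serverless host's documented surface.
type ServerlessHostLike = {
  runMaintenanceTick(): Promise<unknown>;
  processAvailableJobs(opts?: Record<string, unknown>): Promise<{
    processed: number;
    succeeded: number;
    failed: number;
  }>;
};

// One scheduled tick: maintenance first (claim pending runs, poll suspended
// stages, reap stale leases, flush the outbox), then drain whatever jobs
// that made available.
export async function cronTick(host: ServerlessHostLike) {
  await host.runMaintenanceTick();
  const { processed, succeeded, failed } = await host.processAvailableJobs();
  if (failed > 0) {
    console.warn(`workflow tick: ${failed}/${processed} jobs failed`);
  }
  return { processed, succeeded, failed };
}
```

Running maintenance before draining jobs matters: `lease.reapStale` and `run.claimPending` are what turn crashed or newly created work back into dequeueable jobs.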
  ### Core Exports

  ```typescript
@@ -871,81 +684,47 @@ import { defineStage, defineAsyncBatchStage } from "@bratsos/workflow-engine";
  // Workflow building
  import { WorkflowBuilder, Workflow } from "@bratsos/workflow-engine";

- // Runtime
- import { createWorkflowRuntime, WorkflowRuntime } from "@bratsos/workflow-engine";
+ // Kernel
+ import { createKernel, type Kernel, type KernelConfig } from "@bratsos/workflow-engine/kernel";
+
+ // Kernel types
+ import type { KernelCommand, CommandResult, KernelEvent } from "@bratsos/workflow-engine/kernel";
+
+ // Port interfaces
+ import type { Persistence, BlobStore, JobTransport, EventSink, Scheduler, Clock } from "@bratsos/workflow-engine/kernel";
+
+ // Plugins
+ import { definePlugin, createPluginRunner } from "@bratsos/workflow-engine/kernel";

  // Persistence (Prisma)
- import {
-   createPrismaWorkflowPersistence,
-   createPrismaJobQueue,
-   createPrismaAICallLogger,
- } from "@bratsos/workflow-engine";
+ import { createPrismaWorkflowPersistence, createPrismaJobQueue, createPrismaAICallLogger } from "@bratsos/workflow-engine";

  // AI Helper
  import { createAIHelper, type AIHelper } from "@bratsos/workflow-engine";

- // Types
- import type {
-   WorkflowPersistence,
-   JobQueue,
-   AICallLogger,
-   WorkflowRunRecord,
-   WorkflowStageRecord,
- } from "@bratsos/workflow-engine";
+ // Testing
+ import { InMemoryWorkflowPersistence, InMemoryJobQueue } from "@bratsos/workflow-engine/testing";
+ import { FakeClock, InMemoryBlobStore, CollectingEventSink, NoopScheduler } from "@bratsos/workflow-engine/kernel/testing";
  ```

- ### `createWorkflowRuntime(config)`
-
- | Option | Type | Default | Description |
- |--------|------|---------|-------------|
- | `persistence` | `WorkflowPersistence` | required | Database operations |
- | `jobQueue` | `JobQueue` | required | Job queue implementation |
- | `registry` | `WorkflowRegistry` | required | Workflow definitions |
- | `aiCallLogger` | `AICallLogger` | optional | AI call tracking |
- | `pollIntervalMs` | `number` | 10000 | Interval for checking suspended stages |
- | `jobPollIntervalMs` | `number` | 1000 | Interval for polling job queue |
- | `workerId` | `string` | auto | Unique worker identifier |
- | `staleJobThresholdMs` | `number` | 60000 | Time before a job is considered stale |
- | `getWorkflowPriority` | `(id: string) => number` | optional | Priority function (1-10, higher = more important) |
-
- ### `WorkflowRuntime`
-
- | Method | Description |
- |--------|-------------|
- | `start()` | Start polling for jobs (runs forever until stopped) |
- | `stop()` | Stop the worker gracefully |
- | `createRun(options)` | Queue a new workflow run |
- | `createAIHelper(topic)` | Create an AI helper bound to the logger |
- | `transitionWorkflow(runId)` | Manually transition a workflow to next stage |
- | `pollSuspendedStages()` | Manually poll suspended stages |
-
- ### `CreateRunOptions`
-
- | Option | Type | Required | Description |
- |--------|------|----------|-------------|
- | `workflowId` | `string` | yes | ID of the workflow to run |
- | `input` | `object` | yes | Input data matching workflow's input schema |
- | `config` | `object` | no | Stage configurations keyed by stage ID |
- | `priority` | `number` | no | Priority (1-10, overrides `getWorkflowPriority`) |
- | `metadata` | `object` | no | Domain-specific metadata |
-
  ---

  ## Troubleshooting

  ### "Workflow not found in registry"

- Ensure the workflow is registered before creating runs:
+ Ensure the workflow is registered in the `registry` passed to `createKernel`:

  ```typescript
- const registry = {
-   getWorkflow(id) {
-     const workflows = {
-       "my-workflow": myWorkflow, // Add your workflow here
-     };
-     return workflows[id];
+ const kernel = createKernel({
+   // ...
+   registry: {
+     getWorkflow(id) {
+       const workflows = { "my-workflow": myWorkflow };
+       return workflows[id];
+     },
    },
- };
+ });
  ```

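When the hand-written map above grows stale as workflows are added, the registry can instead be derived from a list. A sketch: only the `getWorkflow` method comes from the example above; the assumption that a workflow exposes an `id` field is mine, so it is typed structurally here.

```typescript
// Build a registry from a list of workflows so a forgotten registration is a
// one-line fix. `WorkflowLike` stands in for the package's Workflow type.
type WorkflowLike = { id: string };

export function buildRegistry(workflows: WorkflowLike[]) {
  const byId = new Map(workflows.map((w) => [w.id, w] as const));
  return {
    getWorkflow(id: string): WorkflowLike | undefined {
      return byId.get(id); // undefined -> "Workflow not found in registry"
    },
  };
}
```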
  ### "Stage X depends on Y which was not found"
@@ -953,31 +732,13 @@ const registry = {

  Verify all dependencies are included in the workflow:

  ```typescript
- // If stage "analyze" depends on "extract":
  .pipe(extractStage) // Must be piped before
  .pipe(analyzeStage) // analyze can now access extract's output
  ```

- ### Jobs stuck in "PROCESSING"
-
- This usually means a worker crashed. The stale lock recovery will automatically release jobs after `staleJobThresholdMs` (default 60s).
+ ### Jobs stuck in "RUNNING"

- To manually release stale jobs:
-
- ```typescript
- await jobQueue.releaseStaleJobs(60000); // Release jobs locked > 60s
- ```
-
- ### AI calls not being tracked
-
- Ensure you pass `aiCallLogger` to the runtime:
-
- ```typescript
- const runtime = createWorkflowRuntime({
-   // ...
-   aiCallLogger: createPrismaAICallLogger(prisma), // Required for tracking
- });
- ```
+ A worker likely crashed. The stale lease recovery (the `lease.reapStale` command) automatically releases stale job leases. In the Node host this runs on each orchestration tick; for serverless, call `runMaintenanceTick()` from a cron trigger.

  ---