@donkeylabs/server 0.4.8 → 0.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -258,9 +258,17 @@ router.route("subscribe-job").raw({
 
  ## Wrapper Libraries
 
- ### Python Wrapper
+ After installing `@donkeylabs/server`, copy the wrapper to your project:
+
+ ```bash
+ # Python
+ cp node_modules/@donkeylabs/server/examples/external-jobs/python/donkeylabs_job.py ./workers/
+
+ # Shell
+ cp node_modules/@donkeylabs/server/examples/external-jobs/shell/donkeylabs-job.sh ./workers/
+ ```
 
- Located at `examples/external-jobs/python/donkeylabs_job.py`:
+ ### Python Wrapper
 
  ```python
  from donkeylabs_job import DonkeylabsJob, run_job
@@ -331,19 +339,131 @@ job_complete '{"result": "success"}'
 
  ## Server Restart Resilience
 
- External jobs survive server restarts:
+ External jobs automatically survive server restarts through built-in SQLite persistence.
+
+ ### Default Behavior (SQLite Persistence)
+
+ Jobs are automatically persisted to `.donkeylabs/jobs.db` by default:
+
+ ```typescript
+ import { AppServer, createDatabase } from "@donkeylabs/server";
+
+ const server = new AppServer({
+   db: createDatabase(),
+   // Jobs automatically use SQLite persistence - no config needed!
+ });
+
+ server.getCore().jobs.registerExternal("process-video", {
+   command: "python",
+   args: ["-m", "video_processor"],
+ });
+ ```
+
+ ### Configuration Options
+
+ ```typescript
+ const server = new AppServer({
+   db: createDatabase(),
+   jobs: {
+     // SQLite is used by default (persist: true)
+     persist: true, // Set to false for in-memory only
+     dbPath: ".donkeylabs/jobs.db", // Custom database path
+     external: {
+       socketDir: "/tmp/donkeylabs-jobs",
+     },
+   },
+ });
+ ```
+
+ ### Custom Adapter
+
+ For Postgres, MySQL, or other databases, provide your own adapter:
+
+ ```typescript
+ import { AppServer, SqliteJobAdapter } from "@donkeylabs/server";
+ import { MyPostgresJobAdapter } from "./adapters/postgres";
+
+ const server = new AppServer({
+   db: createDatabase(),
+   jobs: {
+     adapter: new MyPostgresJobAdapter(db), // Custom adapter
+   },
+ });
+ ```
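The adapter contract itself is never shown in this diff, so the method names below (`save`, `update`, `findRunningExternal`) are guesses; as a rough in-memory sketch of what such an adapter has to do:

```typescript
// Hypothetical sketch - the real adapter interface is not shown in this diff,
// so the method names (save / update / findRunningExternal) are assumptions.
interface JobRecord {
  id: string;
  name: string;
  data: unknown; // job payload (JSON)
  status: "pending" | "running" | "completed" | "failed";
  external?: boolean;
  pid?: number;
  socketPath?: string;
  lastHeartbeat?: number;
}

class InMemoryJobAdapter {
  private jobs = new Map<string, JobRecord>();

  save(job: JobRecord): void {
    this.jobs.set(job.id, { ...job });
  }

  update(id: string, patch: Partial<JobRecord>): void {
    const existing = this.jobs.get(id);
    if (!existing) throw new Error(`unknown job: ${id}`);
    this.jobs.set(id, { ...existing, ...patch });
  }

  // The query a server would run on restart: running external jobs
  findRunningExternal(): JobRecord[] {
    return [...this.jobs.values()].filter(
      (j) => j.status === "running" && j.external === true,
    );
  }
}

const adapter = new InMemoryJobAdapter();
adapter.save({ id: "j1", name: "process-video", data: {}, status: "pending" });
adapter.update("j1", { status: "running", external: true, pid: 1234 });
const orphanCandidates = adapter.findRunningExternal();
```

A real Postgres/MySQL adapter would back the same operations with SQL; the shape of the record it stores is listed in the next section.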
+
+ ### What Gets Persisted
+
+ The adapter must persist these fields for external jobs:
+
+ | Field | Description |
+ |-------|-------------|
+ | `id` | Unique job ID |
+ | `name` | Job name |
+ | `data` | Job payload (JSON) |
+ | `status` | pending, running, completed, failed |
+ | `pid` | External process ID |
+ | `socketPath` | Unix socket path |
+ | `tcpPort` | TCP port (Windows) |
+ | `lastHeartbeat` | Last heartbeat timestamp |
+ | `processState` | spawning, running, orphaned |
+
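Translating that table into a hypothetical TypeScript record (field names come from the table; the concrete types are assumptions, not the package's schema):

```typescript
// Hypothetical shape of a persisted row, derived from the field table above.
type JobStatus = "pending" | "running" | "completed" | "failed";
type ProcessState = "spawning" | "running" | "orphaned";

interface ExternalJobRecord {
  id: string;
  name: string;
  data: string;              // JSON-serialized payload
  status: JobStatus;
  pid: number | null;        // external process ID
  socketPath: string | null; // Unix socket path
  tcpPort: number | null;    // TCP port (Windows)
  lastHeartbeat: number | null;
  processState: ProcessState;
}

const row: ExternalJobRecord = {
  id: "job-42",
  name: "process-video",
  data: JSON.stringify({ videoId: "abc" }),
  status: "running",
  pid: 4321,
  socketPath: "/tmp/donkeylabs-jobs/job-42.sock",
  tcpPort: null,
  lastHeartbeat: Date.now(),
  processState: "running",
};

// The payload must survive a persistence round-trip as JSON
const payload = JSON.parse(row.data);
```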
+ ### How Reconnection Works
+
+ 1. **On Server Shutdown**: Job state is already persisted in the database
+ 2. **On Server Restart**:
+    - Server queries for jobs where `status = 'running'` and `external = true`
+    - Checks if the process is still alive (via PID)
+    - Checks if heartbeat hasn't expired
+    - **Reserves** the socket path/port to prevent new jobs from using it
+    - Recreates the socket server on the **same path/port**
+    - External process detects disconnection and retries connecting
+ 3. **Reconnection**: Once reconnected, the job resumes normal operation
+ 4. **Cleanup**: When the job completes, fails, or is killed, the reservation is released
+
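The restart scan in step 2 can be sketched as follows (the job shape and the `isPidAlive` helper are stand-ins for illustration, not the package's internals):

```typescript
// Sketch of the restart-recovery scan described above.
interface PersistedJob {
  id: string;
  status: string;
  external: boolean;
  pid: number;
  lastHeartbeat: number; // ms epoch
}

function recoverOrphans(
  jobs: PersistedJob[],
  isPidAlive: (pid: number) => boolean, // stand-in for a real liveness check
  heartbeatTimeoutMs: number,
  now: number,
): { reconnectable: string[]; failed: string[] } {
  const reconnectable: string[] = [];
  const failed: string[] = [];
  for (const job of jobs) {
    // Only running external jobs are candidates
    if (job.status !== "running" || !job.external) continue;
    const heartbeatFresh = now - job.lastHeartbeat < heartbeatTimeoutMs;
    if (isPidAlive(job.pid) && heartbeatFresh) {
      // A real server would also reserve the socket path/port here
      reconnectable.push(job.id);
    } else {
      failed.push(job.id);
    }
  }
  return { reconnectable, failed };
}

const now = 100_000;
const result = recoverOrphans(
  [
    { id: "a", status: "running", external: true, pid: 1, lastHeartbeat: now - 2_000 },
    { id: "b", status: "running", external: true, pid: 2, lastHeartbeat: now - 60_000 },
    { id: "c", status: "completed", external: true, pid: 3, lastHeartbeat: now },
  ],
  (pid) => pid !== 2, // pretend process 2 died
  30_000,
  now,
);
```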
+ ### Socket/Port Reservation
+
+ The server prevents new jobs from accidentally using socket paths or TCP ports that are reserved for orphaned jobs awaiting reconnection:
+
+ - When an orphaned job is detected on startup, its socket path/port is **reserved**
+ - New jobs cannot use reserved paths/ports (an error is thrown if attempted)
+ - Reservations are automatically released when:
+   - The job completes successfully
+   - The job fails
+   - The job is killed due to stale heartbeat
+   - The process is confirmed dead
+
+ This ensures that running external processes can always reconnect to their original socket path/port even if the server restarts multiple times.
+
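The reservation rule reduces to a guarded set; an illustrative sketch (method names here are made up, not the package's API):

```typescript
// Minimal sketch of the reservation rule: reserved paths reject new jobs
// until explicitly released.
class SocketReservations {
  private reserved = new Set<string>();

  reserve(path: string): void {
    this.reserved.add(path);
  }

  claimForNewJob(path: string): void {
    if (this.reserved.has(path)) {
      throw new Error(`socket path reserved for an orphaned job: ${path}`);
    }
  }

  release(path: string): void {
    this.reserved.delete(path);
  }
}

const reservations = new SocketReservations();
reservations.reserve("/tmp/donkeylabs-jobs/job-1.sock");

let rejected = false;
try {
  reservations.claimForNewJob("/tmp/donkeylabs-jobs/job-1.sock");
} catch {
  rejected = true; // a new job cannot take the reserved path
}

reservations.release("/tmp/donkeylabs-jobs/job-1.sock"); // e.g. job completed
reservations.claimForNewJob("/tmp/donkeylabs-jobs/job-1.sock"); // now fine
```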
+ ### Python Wrapper Reconnection
+
+ The Python wrapper automatically handles reconnection:
+
+ ```python
+ # Default reconnection settings
+ job = DonkeylabsJob(
+     job_id=job_id,
+     name=name,
+     data=data,
+     socket_path=socket_path,
+     heartbeat_interval=5.0,     # Heartbeat every 5 seconds
+     reconnect_interval=2.0,     # Retry every 2 seconds
+     max_reconnect_attempts=30,  # Try for up to 60 seconds
+ )
+ ```
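A quick sanity check on those defaults (just the arithmetic behind the "up to 60 seconds" comment, not the wrapper's API):

```python
# Retrying every 2 seconds for up to 30 attempts gives roughly
# a 60-second reconnection window before the wrapper gives up.
reconnect_interval = 2.0        # seconds between attempts
max_reconnect_attempts = 30

reconnect_window = reconnect_interval * max_reconnect_attempts
print(reconnect_window)  # 60.0 seconds
```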
 
- 1. **On Shutdown**: Job state (PID, socket path) is persisted in the database
- 2. **On Startup**: Server checks for orphaned jobs:
-    - If process is still alive, attempts reconnection
-    - If process died, marks job as failed
- 3. **Reconnection**: External process continues sending heartbeats; server picks them up
+ When the connection is lost:
+ 1. Heartbeat/progress messages fail to send
+ 2. Background reconnection thread starts
+ 3. Retries connecting to the same socket path
+ 4. Once reconnected, sends "started" message to server
+ 5. Normal operation resumes
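The five steps above amount to a retry loop; a self-contained sketch (this is an illustration of the flow, not the wrapper's actual code):

```python
import time

# Sketch of the reconnect loop described above - not the wrapper's real code.
def reconnect(connect, send, max_attempts=30, interval=0.0):
    """Retry `connect` until it succeeds, then re-announce ourselves."""
    for attempt in range(1, max_attempts + 1):
        try:
            sock = connect()  # retries the same socket path
        except OSError:
            time.sleep(interval)
            continue
        send(sock, {"type": "started"})  # so the server re-attaches the job
        return attempt
    raise RuntimeError("gave up reconnecting")

# Simulate a server that comes back on the third attempt
attempts = {"n": 0}
def fake_connect():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise OSError("connection refused")
    return "fake-socket"

sent = []
used = reconnect(fake_connect, lambda sock, msg: sent.append(msg))
```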
 
  ### Best Practices
 
- - External workers should handle reconnection gracefully
- - Use heartbeats to detect server restarts
- - Consider idempotent operations for potential re-execution
+ - **Always use a persistent adapter in production**
+ - External workers should be idempotent when possible
+ - Set `heartbeatTimeout` appropriately (longer = more time to reconnect)
+ - Consider longer `max_reconnect_attempts` for critical jobs
 
  ## Error Handling
 
package/docs/workflows.md CHANGED
@@ -1,6 +1,6 @@
  # Workflows
 
- Workflows provide step function / state machine orchestration for complex multi-step processes. Built on top of the Jobs service, workflows support sequential tasks, parallel execution, conditional branching, retries, and real-time progress via SSE.
+ Workflows provide step function / state machine orchestration for complex multi-step processes. Workflows support sequential tasks with inline handlers, parallel execution, conditional branching, retries, and real-time progress via SSE.
 
  ## Overview
 
@@ -17,11 +17,17 @@ Use workflows when you need to:
 
  ```typescript
  import { workflow } from "@donkeylabs/server";
+ import { z } from "zod";
 
  const orderWorkflow = workflow("process-order")
+   // First task: inputSchema validates workflow input
    .task("validate", {
-     job: "validate-order",
-     input: (ctx) => ({ orderId: ctx.input.orderId }),
+     inputSchema: z.object({ orderId: z.string() }),
+     outputSchema: z.object({ valid: z.boolean(), inStock: z.boolean(), total: z.number() }),
+     handler: async (input, ctx) => {
+       const order = await ctx.plugins.orders.validate(input.orderId);
+       return { valid: true, inStock: order.inStock, total: order.total };
+     },
    })
    .choice("check-inventory", {
      choices: [
@@ -35,16 +41,35 @@ const orderWorkflow = workflow("process-order")
    .parallel("fulfill", {
      branches: [
        workflow.branch("shipping")
-         .task("ship", { job: "create-shipment" })
+         .task("ship", {
+           // inputSchema as function: maps previous step output to this step's input
+           inputSchema: (prev) => ({ orderId: prev.orderId }),
+           handler: async (input, ctx) => {
+             return await ctx.plugins.shipping.createShipment(input.orderId);
+           },
+         })
          .build(),
        workflow.branch("notification")
-         .task("notify", { job: "send-confirmation" })
+         .task("notify", {
+           inputSchema: (prev, workflowInput) => ({
+             orderId: workflowInput.orderId,
+             total: prev.total,
+           }),
+           handler: async (input, ctx) => {
+             await ctx.plugins.email.sendConfirmation(input);
+             return { sent: true };
+           },
+         })
          .build(),
      ],
      next: "complete",
    })
+   // Subsequent tasks: inputSchema as function receives prev step output
    .task("backorder", {
-     job: "create-backorder",
+     inputSchema: (prev) => ({ orderId: prev.orderId, total: prev.total }),
+     handler: async (input, ctx) => {
+       return await ctx.plugins.orders.createBackorder(input);
+     },
      next: "complete",
    })
    .pass("complete", { end: true })
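To make the control flow of the builder above concrete, here is a toy runner that walks a task → choice → task → pass chain. It is illustrative only, not this package's engine; all names are invented:

```typescript
// Toy state machine mirroring the task/choice/pass flow the builder describes.
type State =
  | { type: "task"; run: (ctx: Ctx) => unknown; next: string }
  | { type: "choice"; pick: (ctx: Ctx) => string }
  | { type: "pass"; end: true };

interface Ctx {
  input: Record<string, unknown>;
  steps: Record<string, unknown>; // results keyed by step name
}

function runWorkflow(
  states: Record<string, State>,
  start: string,
  input: Record<string, unknown>,
): string[] {
  const ctx: Ctx = { input, steps: {} };
  const visited: string[] = [];
  let current = start;
  while (true) {
    const state = states[current];
    if (!state) throw new Error(`unknown state: ${current}`);
    visited.push(current);
    if (state.type === "pass") return visited; // end state
    if (state.type === "choice") {
      current = state.pick(ctx); // conditional branch
    } else {
      ctx.steps[current] = state.run(ctx); // record step output
      current = state.next;
    }
  }
}

const path = runWorkflow(
  {
    validate: { type: "task", run: () => ({ inStock: false }), next: "check" },
    check: {
      type: "choice",
      pick: (ctx) => ((ctx.steps.validate as any).inStock ? "fulfill" : "backorder"),
    },
    fulfill: { type: "task", run: () => ({}), next: "complete" },
    backorder: { type: "task", run: () => ({}), next: "complete" },
    complete: { type: "pass", end: true },
  },
  "validate",
  { orderId: "o-1" },
);
```

Because `validate` reports the item out of stock, the choice routes through `backorder` before reaching `complete`.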
@@ -82,25 +107,26 @@ ctx.core.events.on("workflow.progress", (data) => {
 
  ### Task
 
- Executes a job (in-process or external) and waits for completion.
+ Executes an inline handler function with typed input/output.
 
  ```typescript
  workflow("example")
    .task("step-name", {
-     // Required: job to execute
-     job: "my-job-name",
-
-     // Optional: transform workflow context to job input
-     input: (ctx) => ({
-       orderId: ctx.input.orderId,
-       previousResult: ctx.steps.previousStep,
-     }),
-
-     // Optional: transform job result to step output
-     output: (result, ctx) => ({
-       processed: true,
-       data: result.data,
-     }),
+     // Input: Zod schema (for first step) OR mapper function (for subsequent steps)
+     // First step - validates workflow input:
+     inputSchema: z.object({ orderId: z.string() }),
+     // Subsequent steps - maps previous output to this step's input:
+     // inputSchema: (prev, workflowInput) => ({ orderId: prev.orderId }),
+
+     // Optional: Zod schema for output validation
+     outputSchema: z.object({ success: z.boolean(), data: z.any() }),
+
+     // Required: inline handler function
+     handler: async (input, ctx) => {
+       // input is typed from inputSchema
+       // ctx provides access to plugins, prev, steps, etc.
+       return { success: true, data: await processOrder(input.orderId) };
+     },
 
      // Optional: retry configuration
      retry: {
@@ -119,6 +145,58 @@ workflow("example")
    })
  ```
 
+ #### Input Schema Options
+
+ **Option 1: Zod Schema (first step or when validating workflow input)**
+ ```typescript
+ .task("validate", {
+   inputSchema: z.object({ orderId: z.string(), userId: z.string() }),
+   handler: async (input, ctx) => {
+     // input: { orderId: string, userId: string } - validated from workflow input
+     return { valid: true };
+   },
+ })
+ ```
+
+ **Option 2: Mapper Function (subsequent steps)**
+ ```typescript
+ .task("charge", {
+   // prev = output from previous step, workflowInput = original workflow input
+   inputSchema: (prev, workflowInput) => ({
+     amount: prev.total,
+     userId: workflowInput.userId,
+   }),
+   handler: async (input, ctx) => {
+     // input: { amount: number, userId: string } - inferred from mapper return
+     return { chargeId: "ch_123" };
+   },
+ })
+ ```
+
+ #### Legacy API (Job-based)
+
+ For backward compatibility, you can still use job references:
+
+ ```typescript
+ workflow("example")
+   .task("step-name", {
+     // Job name to execute
+     job: "my-job-name",
+
+     // Optional: transform workflow context to job input
+     input: (ctx) => ({
+       orderId: ctx.input.orderId,
+       previousResult: ctx.steps.previousStep,
+     }),
+
+     // Optional: transform job result to step output
+     output: (result, ctx) => ({
+       processed: true,
+       data: result.data,
+     }),
+   })
+ ```
+
  ### Parallel
 
  Runs multiple workflow branches concurrently.
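Semantically, a parallel step behaves like `Promise.all` over its branches: every branch runs concurrently and the step completes when all of them finish. A standalone sketch, independent of this package:

```typescript
// Each branch is an independent async sub-workflow; the delays are arbitrary.
async function runBranch(name: string, ms: number): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, ms));
  return name;
}

async function runParallelStep(): Promise<string[]> {
  // Promise.all keeps branch order regardless of completion order
  return Promise.all([runBranch("shipping", 20), runBranch("notification", 5)]);
}

const branchResults = await runParallelStep();
```

Note that `notification` finishes first here, yet the results still come back in branch order.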
@@ -219,6 +297,9 @@ interface WorkflowContext {
    /** Results from completed steps (keyed by step name) */
    steps: Record<string, any>;
 
+   /** Output from the previous step (undefined for first step) */
+   prev?: any;
+
    /** Current workflow instance */
    instance: WorkflowInstance;
 
@@ -230,18 +311,17 @@
 
  Example usage in step configuration:
 
  ```typescript
+ // Using inputSchema mapper function (recommended)
  .task("process", {
-   job: "process-data",
-   input: (ctx) => ({
-     // Access original input
-     orderId: ctx.input.orderId,
-
-     // Access previous step output
-     validationResult: ctx.steps.validate,
-
-     // Type-safe access
-     amount: ctx.getStepResult<{ amount: number }>("calculate")?.amount,
+   inputSchema: (prev, workflowInput) => ({
+     orderId: workflowInput.orderId,
+     validationResult: prev, // prev = output from previous step
    }),
+   handler: async (input, ctx) => {
+     // Access any step's output
+     const calcResult = ctx.getStepResult<{ amount: number }>("calculate");
+     return { processed: true, amount: calcResult?.amount };
+   },
  })
  ```
 
@@ -421,24 +501,41 @@ For this to work properly:
 
  ```typescript
  import { AppServer, workflow, createDatabase } from "@donkeylabs/server";
+ import { z } from "zod";
 
- // Define workflow
+ // Define workflow with inline handlers
  const onboardingWorkflow = workflow("user-onboarding")
    .timeout(86400000) // 24 hour max
    .defaultRetry({ maxAttempts: 3 })
 
+   // First step: inputSchema validates workflow input
    .task("create-account", {
-     job: "create-user-account",
-     input: (ctx) => ctx.input,
+     inputSchema: z.object({
+       email: z.string().email(),
+       name: z.string(),
+       plan: z.enum(["free", "pro", "enterprise"]),
+     }),
+     outputSchema: z.object({ userId: z.string() }),
+     handler: async (input, ctx) => {
+       const user = await ctx.plugins.users.create({
+         email: input.email,
+         name: input.name,
+       });
+       return { userId: user.id };
+     },
    })
 
+   // Subsequent steps: inputSchema maps previous output
    .task("send-welcome-email", {
-     job: "send-email",
-     input: (ctx) => ({
-       to: ctx.input.email,
-       template: "welcome",
-       userId: ctx.steps["create-account"].userId,
+     inputSchema: (prev, workflowInput) => ({
+       to: workflowInput.email,
+       template: "welcome" as const,
+       userId: prev.userId,
      }),
+     handler: async (input, ctx) => {
+       await ctx.plugins.email.send(input);
+       return { sent: true };
+     },
    })
 
    .choice("check-plan", {
@@ -452,14 +549,24 @@
    })
 
    .task("enterprise-setup", {
-     job: "setup-enterprise",
-     input: (ctx) => ({ userId: ctx.steps["create-account"].userId }),
+     // After a choice step, use handler to access specific step outputs
+     handler: async (input, ctx) => {
+       const userId = ctx.steps["create-account"].userId;
+       await ctx.plugins.accounts.setupEnterprise({
+         userId,
+         features: ["sso", "audit-logs", "dedicated-support"],
+       });
+       return { setup: "enterprise", userId };
+     },
      next: "complete",
    })
 
    .task("standard-setup", {
-     job: "setup-standard",
-     input: (ctx) => ({ userId: ctx.steps["create-account"].userId }),
+     handler: async (input, ctx) => {
+       const userId = ctx.steps["create-account"].userId;
+       await ctx.plugins.accounts.setupStandard({ userId });
+       return { setup: "standard", userId };
+     },
      next: "complete",
    })
 
@@ -476,19 +583,6 @@ const onboardingWorkflow = workflow("user-onboarding")
  // Setup server
  const server = new AppServer({ db: createDatabase() });
 
- // Register jobs that workflows use
- server.getCore().jobs.register("create-user-account", async (data) => {
-   // ... create user
-   return { userId: "user-123" };
- });
-
- server.getCore().jobs.register("send-email", async (data) => {
-   // ... send email
-   return { sent: true };
- });
-
- // ... register other jobs
-
  // Register workflow
  server.getCore().workflows.register(onboardingWorkflow);