workerflow 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +32 -10
- package/package.json +1 -1
- package/src/definition.ts +126 -174
- package/src/migrations/0000_initial.ts +84 -285
- package/src/runtime.ts +609 -950
- package/test/runtime.spec.ts +618 -1074
- package/demo/README.md +0 -73
- package/demo/index.html +0 -13
- package/demo/package.json +0 -33
- package/demo/public/vite.svg +0 -1
- package/demo/src/App.css +0 -0
- package/demo/src/App.tsx +0 -9
- package/demo/src/assets/Cloudflare_Logo.svg +0 -51
- package/demo/src/assets/react.svg +0 -1
- package/demo/src/index.css +0 -1
- package/demo/src/main.tsx +0 -10
- package/demo/tsconfig.app.json +0 -28
- package/demo/tsconfig.json +0 -14
- package/demo/tsconfig.node.json +0 -25
- package/demo/tsconfig.worker.json +0 -13
- package/demo/vite.config.ts +0 -9
- package/demo/worker/index.ts +0 -16
- package/demo/worker-configuration.d.ts +0 -12851
- package/demo/wrangler.jsonc +0 -32
package/README.md
CHANGED
@@ -88,7 +88,22 @@ export default {
 } satisfies ExportedHandler<Env>;
 ```
 
-Workflow input is **`this.ctx.props.input`**, populated from **`create({ input })`**. Use a **stable `definitionVersion`** string per deploy you want long-running instances to keep using; add a new version in **`getDefinition`** when you ship breaking definition changes.
+Workflow input is **`this.ctx.props.input`**, populated from **`create({ input })`**. The runtime also sets **`this.ctx.props.requestId`** (a new UUID each time the run loop invokes your definition) and **`this.ctx.props.runtimeInstanceId`** (this Durable Object’s id) for logs and correlation. Use a **stable `definitionVersion`** string per deploy you want long-running instances to keep using; add a new version in **`getDefinition`** when you ship breaking definition changes.
+
+### Runtime control
+
+From the Durable Object stub you can:
+
+- **`create({ definitionVersion, input? })`** — Pins the version and optional input in SQLite the **first** time the instance is initialized, then starts execution. **No-op** if the workflow is already **completed**, **failed**, **cancelled**, or **paused**. Throws if the object was already pinned to a **different** version.
+- **`pause()`** — When status is **running**, moves to **paused**, clears alarms, and stops driving **`execute()`** until **`resume()`**. Inbound events are queued and applied when a matching **`wait`** runs again after resume.
+- **`resume()`** — When status is **paused**, moves to **running** and continues the loop. Throws if the workflow is not paused.
+- **`cancel(reason?)`** — Moves to terminal **cancelled** and clears alarms.
+
+New instances start in **`pending`** until the first transition to **`running`**.
+
+### Experimental introspection
+
+For dashboards and debugging, the runtime exposes **`getSteps_experimental()`** and **`getWorkflowEvents_experimental()`**. The optional lifecycle hook is **`onStatusChange_experimental`** (see [Keeping workflow execution separate from state projection](#keeping-workflow-execution-separate-from-state-projection)). These names are marked experimental because they may change as the API hardens.
 
 ## How it works
 
@@ -102,17 +117,21 @@ The library separates concerns into two main layers:
 
 Each time the runtime advances, it calls `next()` on your `WorkflowDefinition`, which **runs `execute()` from the beginning again**. Steps that have already completed durably (`run`, elapsed `sleep`, resolved `wait`, and so on) **replay from stored state**: their callbacks are not re-invoked, and recorded results are returned as-is. New side effects happen only when the engine reaches a step that is not yet complete and the durable state allows that transition.
 
+**Step ids must be unique** within one top-level **`execute()`** run (the same **`next()`** invocation): reuse the same id across **`run`**, **`sleep`**, or **`wait`** and the workflow fails fast.
+
+**Sibling `run` calls.** At a given nesting level, after one **`run`** finishes successfully in the same **`next()`**, the next sibling **`run`** forces the runtime to **run the loop again immediately** (you still replay from the top; completed steps stay cached). For linear workflows this is invisible; if you place several **`run`** calls back-to-back at the same depth, expect an extra loop hop per step after the first. Nested **`run`** callbacks get a fresh frame, so children do not consume the parent’s sibling budget.
+
 ### When the loop runs and when it stops
 
 The `WorkflowRuntime` Durable Object drives a **run loop** that repeatedly invokes `next()` until one of these happens:
 
-- **Terminal**: `next()` reports the workflow is **done** (`completed` or `failed`). The loop exits and the watchdog alarm is cleared.
+- **Terminal**: `next()` reports the workflow is **done** (`completed` or `failed`), or the instance is **`cancelled`** via **`cancel()`** while the loop is idle or between iterations. The loop exits and the watchdog alarm is cleared.
 - **Immediate resume**: `next()` asks to **continue immediately** (for example, so another step in the same logical “tick” can run). The loop continues without leaving the Durable Object invocation.
 - **Suspended**: `next()` asks to **suspend**—for example, a step is waiting on a **retry backoff**, a **sleep** until a future time, or a **wait** for an inbound event. The loop exits; the runtime relies on **alarms** and/or **incoming events** to call back into the run loop. A long **watchdog alarm** also exists as a safety net if progress stalls.
 
 ### Step kinds
 
-- **`run`**: A named, durable unit of work. Outcomes are persisted; failures can be **retried** with backoff up to
+- **`run`**: A named, durable unit of work. Callbacks return JSON-serializable values or `undefined`. Outcomes are persisted; failures can be **retried** with backoff up to **`maxAttempts`** (default **3** attempts per step unless you pass `{ maxAttempts: n }`).
 - **`sleep`**: Pauses until a **scheduled wake time** stored in SQLite; the Durable Object is woken by an **alarm** when that time is reached.
 - **`wait`**: Pauses until a matching **inbound event** (by name) or an optional **timeout**. Resolution is recorded in durable state so replay does not double-apply the branch that handled the event.
 
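The replay rule in the hunk above (completed steps return stored results without re-invoking callbacks; a duplicate id within one replay fails fast) can be sketched with a per-replay id set plus a persistent result map. This is a synchronous in-memory model; the real steps are async and persisted in SQLite, and `replayOnce`/`completedSteps` are hypothetical names:

```typescript
// Durable results that survive across next() invocations (SQLite in the real runtime).
const completedSteps = new Map<string, unknown>();

// One execute() replay: completed steps return their stored result without
// re-invoking the callback; repeating an id within the same replay fails fast.
function replayOnce(execute: (run: (id: string, cb: () => unknown) => unknown) => void): void {
  const seenIds = new Set<string>();
  const run = (id: string, cb: () => unknown): unknown => {
    if (seenIds.has(id)) throw new Error(`duplicate step id: ${id}`);
    seenIds.add(id);
    if (completedSteps.has(id)) return completedSteps.get(id); // replay from stored state
    const result = cb(); // first completion: the side effect runs exactly once
    completedSteps.set(id, result);
    return result;
  };
  execute(run);
}
```

Running the same workflow body through `replayOnce` twice invokes the callback only on the first pass, which is the property that makes replay-from-the-top safe.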
@@ -159,7 +178,7 @@ const payment = await this.wait<{ chargeId: string }>("capture-payment", "paymen
 
 #### The watchdog alarm
 
-In addition to these precise alarms, the runtime sets a **30-minute watchdog alarm at the start of every run-loop iteration**, before delegating to the workflow definition. When an iteration
+In addition to these precise alarms, the runtime sets a **30-minute watchdog alarm at the start of every run-loop iteration**, before delegating to the workflow definition. When an iteration ends cleanly—workflow terminal completion, suspend with a known **`wakeAt`**, or suspend waiting only on inbound events—the alarm is **cleared** or **replaced** by the next wake time when there is one. A **`wait`** with **no** `timeoutAt` has no step-specific alarm until an event arrives; the watchdog remains the backstop. The watchdog only fires if something goes wrong in the middle.
 
 The problem it guards against is a `run` step that gets stuck in the `running` state. Before the user's callback executes, the runtime durably writes `state = 'running'` to SQLite. That write is intentional: it ensures that a later replay does not try to start a second concurrent attempt for the same step. But it creates a gap:
 
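The alarm bookkeeping described above reduces to a small decision at the end of each clean iteration: clear the alarm on terminal exit, arm a precise alarm when a `wakeAt` is known, and otherwise leave the watchdog as the backstop. A sketch of that decision, under the stated assumptions (not the runtime's actual code; `alarmAfterIteration` is a hypothetical name):

```typescript
const WATCHDOG_MS = 30 * 60 * 1000; // 30-minute watchdog, per the README

type LoopExit =
  | { kind: "terminal" }                      // completed / failed / cancelled
  | { kind: "suspended"; wakeAt?: number };   // sleep, retry backoff, or event-only wait

// Which alarm to leave armed after a clean iteration; undefined means clear it.
function alarmAfterIteration(exit: LoopExit, now: number): number | undefined {
  if (exit.kind === "terminal") return undefined;     // terminal: clear the watchdog
  if (exit.wakeAt !== undefined) return exit.wakeAt;  // precise wake time replaces it
  return now + WATCHDOG_MS;                           // event-only wait: watchdog stays as backstop
}
```

Whether an event-only wait re-arms the watchdog or simply leaves the one set at iteration start is an implementation detail; the model above just captures that the watchdog is the only alarm in that case.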
@@ -176,7 +195,7 @@ There is also a guard for the case where an alarm fires while the run loop is al
 
 ### Versioning
 
-`create({ definitionVersion, input })` **pins** the definition version and input in SQLite the first time the instance is initialized. **The version cannot be changed later** for that Durable Object id; attempting a different version throws. Every subsequent `next()` resolves the worker implementation via **`getDefinition(version)`** using that pinned value, so **long-lived workflows keep running the definition lineage they started with**, while new instances can use newer version strings you add to `getDefinition`.
+`create({ definitionVersion, input })` **pins** the definition version and optional input in SQLite the first time the instance is initialized (see [Runtime control](#runtime-control) for no-op cases). **The version cannot be changed later** for that Durable Object id; attempting a different version throws. Every subsequent `next()` resolves the worker implementation via **`getDefinition(version)`** using that pinned value, so **long-lived workflows keep running the definition lineage they started with**, while new instances can use newer version strings you add to `getDefinition`.
 
 ## Why this exists
 
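One way to read the pinning rule: `getDefinition` is a pure version-to-definition lookup, and the pin itself is write-once. A hypothetical sketch of both pieces (the version strings, `registry`, and `pinVersion` are illustrative assumptions, not the package's API):

```typescript
type DefinitionFactory = () => { execute(): Promise<void> };

// Hypothetical registry: each released definition lineage keeps its version string forever.
const registry = new Map<string, DefinitionFactory>([
  ["2024-06-01", () => ({ async execute() { /* original steps */ } })],
  ["2024-09-15", () => ({ async execute() { /* breaking change: new steps */ } })],
]);

function getDefinition(version: string): DefinitionFactory {
  const factory = registry.get(version);
  if (!factory) throw new Error(`unknown definitionVersion: ${version}`);
  return factory;
}

// Write-once pin, as stored in SQLite per Durable Object id: same version is
// idempotent, a different version throws.
function pinVersion(pinned: string | undefined, requested: string): string {
  if (pinned !== undefined && pinned !== requested) {
    throw new Error(`already pinned to ${pinned}; cannot switch to ${requested}`);
  }
  return pinned ?? requested;
}
```

Because old entries are never removed from the registry, long-lived instances keep resolving the lineage they started with while new instances can pass the newer string.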
@@ -190,9 +209,9 @@ Cloudflare Workflows is a strong managed option, and for many use cases it is th
 
 ### Versioning workflow definitions
 
-One of the biggest concerns in long-running workflows is definition drift. A normal Worker request is typically bound to a single in-flight execution on one deployed version, but a Workflow is durable: it persists state and resumes across multiple executions over time. A workflow
+One of the biggest concerns in long-running workflows is definition drift. A normal Worker request is typically bound to a single in-flight execution on one deployed version, but a Workflow is durable: it persists state and resumes across multiple executions over time. A workflow may start on one version of its definition and resume later after a deploy has changed or removed a step. That means the next invocation of the workflow entry point could repeat steps unsafely or leave the runtime in an invalid state.
 
-Versioning does not eliminate these problems, but it makes the risk explicit. It forces you to think about compatibility, migration, and long-lived execution up front. Cloudflare Workflows can support version-aware workflows by passing a version token in the immutable per-instance parameters and branching in workflow code or by
+Versioning does not eliminate these problems, but it makes the risk explicit. It forces you to think about compatibility, migration, and long-lived execution up front. Cloudflare Workflows can support version-aware workflows by passing a version token in the immutable per-instance parameters and branching in workflow code or by maintaining a version mapping in an external database, but both are conventions that your application is responsible for maintaining.
 
 `workerflow` takes a different approach: the runtime pins a definition version when the instance is created and resolves future execution against that pinned version. The goal is not to make compatibility problems disappear, but to make the version boundary explicit in the runtime rather than implicit in workflow input and application code.
 
@@ -222,12 +241,15 @@ export class MyWorkflow extends WorkflowEntrypoint {
 
 This looks reasonable at first, but it creates an important failure-mode problem. If the actual business steps all succeed, but the final “sync success” step fails, then the workflow as a whole is now treated as failed. At that point, workflow execution and application-state projection have become tightly coupled, even though they are not really the same concern.
 
-I think a cleaner design is to keep synchronization logic out of workflow steps entirely. Instead, the runtime can expose lifecycle
+I think a cleaner design is to keep synchronization logic out of workflow steps entirely. Instead, the runtime can expose a lifecycle hook that fires when workflow status changes, and synchronization can happen there.
 
 ```ts
 export class MyWorkflowRuntime extends WorkflowRuntime {
-
-
+  async onStatusChange_experimental(
+    status: "running" | "paused" | "completed" | "failed" | "cancelled"
+  ) {
+    // Update your database, or push to a queue for streaming.
+    // Note: the hook is also invoked with "running" when leaving pending/paused into running.
   }
 }
 ```
package/package.json
CHANGED
package/src/definition.ts
CHANGED
@@ -54,15 +54,17 @@ export abstract class WorkflowDefinition<TInput extends Json | undefined = Json
    *
    * - { done: true; status: "completed" | "failed" }: the workflow has completed or aborted.
    * - { done: false; resume: { type: "immediate" } }: the workflow should resume immediately.
-   * - { done: false; resume: { type: "suspended" } }: the workflow should suspend itself and wait for
-   *   inbound event to resume.
+   * - { done: false; resume: { type: "suspended", wakeAt?: number } }: the workflow should suspend itself and wait for
+   *   the next alarm or inbound event to resume. The `wakeAt` property is the timestamp at which the workflow should
+   *   wake up. If the `wakeAt` property is not present, the workflow should wait for the next inbound event to
+   *   resume.
    * @internal
    */
   async next(context: WorkflowRuntimeContext): Promise<
     | { done: true; status: "completed" | "failed" }
     | {
         done: false;
-        resume: { type: "immediate" } | { type: "suspended" };
+        resume: { type: "immediate" } | { type: "suspended"; wakeAt?: number };
       }
   > {
     this.#context = context;
@@ -75,7 +77,7 @@ export abstract class WorkflowDefinition<TInput extends Json | undefined = Json
       if (error instanceof ResumeImmediatelyError) {
         return { done: false, resume: { type: "immediate" } };
       } else if (error instanceof SuspendWorkflowError) {
-        return { done: false, resume: { type: "suspended" } };
+        return { done: false, resume: { type: "suspended", wakeAt: error.wakeAt } };
       } else if (error instanceof AbortWorkflowError) {
         return { done: true, status: "failed" };
       } else if (
@@ -89,24 +91,24 @@ export abstract class WorkflowDefinition<TInput extends Json | undefined = Json
       // An exception can be thrown when calling a method on the WorkflowContext RPC target.
       // The resulting exception will have a 'remote' property set to 'True' in this case.
       if (error instanceof Error && "remote" in error && error.remote) {
-        console.info(error, { requestId: this.#requestId, runtimeInstanceId: this.#runtimeInstanceId });
         /**
          * When calling Durable Objects from a Worker, errors may include .retryable and .overloaded properties
-         * indicating whether the operation can be retried.
-         *
+         * indicating whether the operation can be retried.
+         *
+         * See: https://developers.cloudflare.com/durable-objects/best-practices/error-handling/
          */
         if ("retryable" in error && error.retryable) {
-
-
-
-
+          console.info(error, { requestId: this.#requestId, runtimeInstanceId: this.#runtimeInstanceId });
+          // If the error is retryable, we hint the workflow to suspend and retry after 5 minutes.
+          // In future, we can use a more sophisticated retry strategy.
+          return { done: false, resume: { type: "suspended", wakeAt: new Date().getTime() + 5 * 60 * 1000 } };
+        } else {
+          console.error(error, { requestId: this.#requestId, runtimeInstanceId: this.#runtimeInstanceId });
+          // All other (non-retryable) errors are considered fatal and the workflow should be aborted.
           return { done: true, status: "failed" };
         }
-        // All other remote errors are considered to be transient, so we instruct the workflow to suspend itself and wait for the next alarm to resume.
-        else {
-          return { done: false, resume: { type: "suspended" } };
-        }
       }
+
       // All other non-remote errors are considered fatal and the workflow should be aborted.
       console.error(error instanceof Error ? error : String(error), {
         requestId: this.#requestId,
@@ -137,6 +139,51 @@ export abstract class WorkflowDefinition<TInput extends Json | undefined = Json
 
   abstract execute(): Promise<void>;
 
+  async #processRunStepAttempt<T extends Json | undefined | void>(
+    stepId: RunStepId,
+    ctx: WorkflowRuntimeContext,
+    callback: () => Promise<T>
+  ): Promise<T> {
+    let _result: unknown;
+    try {
+      _result = await this.#runStepFrameContext.run(
+        { numOfSuccessfulRunCallbacks: 0, parentStepId: stepId },
+        async () => await callback()
+      );
+    } catch (error) {
+      /**
+       * A 'run' step callback can include nested steps that can throw control flow errors like 'ResumeImmediatelyError'
+       * and 'SuspendWorkflowError'. We rethrow these errors without recording a failure on this (parent) attempt.
+       */
+      if (error instanceof ResumeImmediatelyError || error instanceof SuspendWorkflowError) {
+        throw error;
+      }
+
+      const updated = await ctx.handleRunAttemptFailed(stepId, {
+        errorMessage: String(error),
+        errorName: error instanceof Error ? error.name : undefined,
+        isNonRetryableStepError: error instanceof NonRetryableStepError
+      });
+
+      if (error instanceof NonRetryableStepError) throw error;
+
+      if (updated.nextAttemptAt === undefined) {
+        const error = new MaxAttemptsExceededError();
+        Error.captureStackTrace(error, WorkflowDefinition.prototype.run);
+        throw error;
+      }
+
+      throw new SuspendWorkflowError(updated.nextAttemptAt.getTime());
+    }
+
+    // SQL NULL (resultJson === null) encodes `undefined`; otherwise raw JSON.stringify for the value.
+    const resultJson = _result === undefined ? null : JSON.stringify(_result);
+    await ctx.handleRunAttemptSucceeded(stepId, resultJson);
+
+    this.#getRunStepFrame().numOfSuccessfulRunCallbacks += 1;
+    return _result as T;
+  }
+
   protected async run<T extends Json | undefined | void>(
     id: string,
     callback: () => Promise<T>,
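The result-encoding convention in the new `#processRunStepAttempt` replaces 0.1.0's `"{}"`-means-`undefined` scheme: `undefined` is now stored as SQL NULL and everything else as plain `JSON.stringify` output. The round-trip can be shown as two tiny functions (illustrative names, same encoding as the diff):

```typescript
// undefined -> SQL NULL (resultJson === null); any other value -> raw JSON.
function encodeResult(result: unknown): string | null {
  return result === undefined ? null : JSON.stringify(result);
}

// NULL decodes back to undefined; JSON text decodes to the stored value.
function decodeResult(resultJson: string | null): unknown {
  return resultJson === null ? undefined : JSON.parse(resultJson);
}
```

Using NULL instead of a sentinel object removes the old ambiguity between "step returned nothing" and "step returned an empty object".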
@@ -156,8 +203,7 @@ export abstract class WorkflowDefinition<TInput extends Json | undefined = Json
 
     const parentStepId = this.#getRunStepFrame().parentStepId;
 
-    const step = await ctx.
-      type: "run",
+    const step = await ctx.getOrCreateRunStep(runStepId, {
       maxAttempts: config?.maxAttempts,
       parentStepId
     });
@@ -166,160 +212,49 @@ export abstract class WorkflowDefinition<TInput extends Json | undefined = Json
       throw new ResumeImmediatelyError();
     }
 
-
-
-
-    }
-
-    const attemptCount = step.attemptCount + 1; // Increment the attempt count by 1 as we're starting a new attempt
-    const maxAttempts = step.maxAttempts;
-
-    await ctx.handleRunAttemptEvent(runStepId, {
-      type: "running",
-      attemptCount: attemptCount
-    });
+    const lastAttempt = step.attempts[step.attempts.length - 1];
+    if (lastAttempt === undefined) {
+      await ctx.handleRunAttemptStarted(runStepId);
 
-
-
-
-
-
-
-
-      // 'ResumeImmediatelyError' and 'SuspendWorkflowError' are rethrown so a nested `run()` does not record a spurious failure on the parent.
-      if (error instanceof ResumeImmediatelyError || error instanceof SuspendWorkflowError) {
-        throw error;
-      }
-
-      await ctx.handleRunAttemptEvent(runStepId, {
-        type: "failed",
-        errorMessage: String(error),
-        errorName: error instanceof Error ? error.name : undefined,
-        attemptCount: attemptCount,
-        isNonRetryableStepError: error instanceof NonRetryableStepError
+      return await this.#processRunStepAttempt(runStepId, ctx, callback);
+    } else if (lastAttempt.state === "started") {
+      const hasInProgressChildSteps = await ctx.hasInProgressChildSteps(runStepId);
+      if (!hasInProgressChildSteps) {
+        const updated = await ctx.handleRunAttemptFailed(runStepId, {
+          errorMessage: STEP_EXECUTION_INTERRUPTED_ERROR_MESSAGE,
+          errorName: undefined
         });
 
-      if (
-        throw error;
-      }
-
-      if (maxAttempts !== null && attemptCount >= maxAttempts) {
+        if (updated.nextAttemptAt === undefined) {
           const error = new MaxAttemptsExceededError();
           Error.captureStackTrace(error, WorkflowDefinition.prototype.run);
           throw error;
         }
 
-      throw new SuspendWorkflowError();
-    }
-
-    let result: string;
-    if (_result === undefined) {
-      result = "{}";
+        throw new SuspendWorkflowError(updated.nextAttemptAt.getTime());
       } else {
-
+        return await this.#processRunStepAttempt(runStepId, ctx, callback);
       }
-
-
-
-
-
-    });
-
-    this.#getRunStepFrame().numOfSuccessfulRunCallbacks += 1;
-
-    return _result as T;
-    } else if (step.state === "running") {
-      const maxAttempts = step.maxAttempts;
-      const attemptCount = step.attemptCount;
-
-      // If no direct child row explains the parent still being `running` (see `hasRunningOrWaitingChildSteps`), fail the attempt as interrupted.
-      if (!(await ctx.hasRunningOrWaitingChildSteps(runStepId))) {
-        await ctx.handleRunAttemptEvent(runStepId, {
-          type: "failed",
-          errorMessage: STEP_EXECUTION_INTERRUPTED_ERROR_MESSAGE,
-          errorName: undefined,
-          attemptCount: attemptCount
-        });
-
-        if (maxAttempts !== null && attemptCount >= maxAttempts) {
-          const error = new MaxAttemptsExceededError();
-          Error.captureStackTrace(error, WorkflowDefinition.prototype.run);
-          throw error;
+    } else if (lastAttempt.state === "failed") {
+      if (lastAttempt.nextAttemptAt) {
+        if (lastAttempt.nextAttemptAt.getTime() <= Date.now()) {
+          await ctx.handleRunAttemptStarted(runStepId);
+          return await this.#processRunStepAttempt(runStepId, ctx, callback);
         } else {
-          throw new SuspendWorkflowError();
-        }
-      }
-
-      // Direct children in non-failure states: continue the same attempt by re-entering the callback.
-      let _result: unknown;
-      try {
-        _result = await this.#runStepFrameContext.run(
-          { numOfSuccessfulRunCallbacks: 0, parentStepId: runStepId },
-          async () => await callback()
-        );
-      } catch (error) {
-        if (error instanceof ResumeImmediatelyError || error instanceof SuspendWorkflowError) {
-          throw error;
-        }
-
-        await ctx.handleRunAttemptEvent(runStepId, {
-          type: "failed",
-          errorMessage: String(error),
-          errorName: error instanceof Error ? error.name : undefined,
-          attemptCount: attemptCount,
-          isNonRetryableStepError: error instanceof NonRetryableStepError
-        });
-
-        if (error instanceof NonRetryableStepError) {
-          throw error;
-        }
-
-        if (maxAttempts !== null && attemptCount >= maxAttempts) {
-          const err = new MaxAttemptsExceededError();
-          Error.captureStackTrace(err, WorkflowDefinition.prototype.run);
-          throw err;
+          throw new SuspendWorkflowError(lastAttempt.nextAttemptAt.getTime());
         }
-
-        throw new
-      }
-
-      const result: string = _result === undefined ? "{}" : JSON.stringify({ value: _result });
-
-      await ctx.handleRunAttemptEvent(runStepId, {
-        type: "succeeded",
-        attemptCount: attemptCount,
-        result: result
-      });
-
-      this.#getRunStepFrame().numOfSuccessfulRunCallbacks += 1;
-
-      return _result as T;
-    } else if (step.state === "failed") {
-      throw new AbortWorkflowError();
-    } else if (step.state === "succeeded") {
-      const parsed: unknown = JSON.parse(step.result);
-      if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
-        throw new Error(
-          "Invalid stored workflow result; expected a non-null object payload; storage may be corrupted or written by an incompatible version."
-        );
-      }
-
-      const keys = Object.keys(parsed);
-      // "{}" means top-level undefined
-      if (keys.length === 0) {
-        return undefined as T;
+      } else {
+        throw new AbortWorkflowError();
       }
-
-
-
+    } else if (lastAttempt.state === "succeeded") {
+      // Replay: the callback is NOT re-executed. Reconstruct the return value from durable state.
+      if (lastAttempt.resultType === "json") {
+        return JSON.parse(lastAttempt.resultJson) as T;
       }
-
-
-
-      );
+      return undefined as T;
+    } else {
+      throw new Error("Unexpected run step attempt state; expected 'started', 'failed', or 'succeeded'.");
     }
-
-    throw new Error("Unexpected run step state; expected 'pending', 'running', 'failed', or 'succeeded'.");
   }
 
   protected async sleep(id: string, duration: number): Promise<void> {
@@ -333,8 +268,7 @@ export abstract class WorkflowDefinition<TInput extends Json | undefined = Json
       throw error;
     }
 
-    const step = await ctx.
-      type: "sleep",
+    const step = await ctx.getOrCreateSleepStep(sleepStepId, {
       wakeAt: new Date(Date.now() + duration),
       parentStepId: this.#getRunStepFrame().parentStepId
     });
@@ -345,11 +279,11 @@ export abstract class WorkflowDefinition<TInput extends Json | undefined = Json
     } else if (step.state === "waiting") {
       // If the sleep step is not yet due to wake up, we suspend the workflow.
       if (Date.now() < step.wakeAt.getTime()) {
-        throw new SuspendWorkflowError();
+        throw new SuspendWorkflowError(step.wakeAt.getTime());
      }
      // If the sleep step is due to wake up, we mark the step as elapsed and throw a 'ResumeImmediatelyError' to hint the driver to resume the workflow immediately.
      else {
-        await ctx.
+        await ctx.handleSleepStepElapsed(sleepStepId);
        throw new ResumeImmediatelyError();
      }
    }
@@ -357,7 +291,11 @@ export abstract class WorkflowDefinition<TInput extends Json | undefined = Json
     throw new Error("Unexpected sleep step state; expected 'waiting' or 'elapsed'.");
   }
 
-  protected async wait<T extends Json
+  protected async wait<T extends Json | undefined>(
+    id: string,
+    event: string,
+    config?: { timeoutAt?: number }
+  ): Promise<T> {
     const waitStepId = id as WaitStepId;
     this.#assertUniqueStepIdInCurrentExecution(waitStepId);
 
@@ -368,30 +306,34 @@ export abstract class WorkflowDefinition<TInput extends Json | undefined = Json
       throw error;
     }
 
-    const step = await ctx.
-      type: "wait",
+    const step = await ctx.getOrCreateWaitStep<T>(waitStepId, {
       eventName: event,
       timeoutAt: config?.timeoutAt ? new Date(config.timeoutAt) : undefined,
       parentStepId: this.#getRunStepFrame().parentStepId
     });
 
     if (step.state === "waiting") {
-
-
-
-
-
-
+      if (step.timeoutAt !== undefined) {
+        // If the timeout has been reached (or exceeded), we mark the step as timed out and throw an 'AbortWorkflowError' to abort the workflow.
+        if (Date.now() >= step.timeoutAt.getTime()) {
+          await ctx.handleWaitStepTimedOut(waitStepId);
+          const error = new WaitStepTimedOutError();
+          Error.captureStackTrace(error, WorkflowDefinition.prototype.wait);
+          throw error;
+        } else {
+          // If the timeout has not been reached, we suspend the workflow and wait for the next alarm to resume.
+          throw new SuspendWorkflowError(step.timeoutAt.getTime());
+        }
+      } else {
+        // If the wait step does not have a timeout, we suspend the workflow and wait for the next inbound event to resume.
+        throw new SuspendWorkflowError();
       }
-
-      // Otherwise, we hint the driver to suspend the workflow until the next alarm or inbound event to resume.
-      throw new SuspendWorkflowError();
     } else if (step.state === "timed_out") {
       // If the wait step has timed out, we throw an 'AbortWorkflowError' to abort the workflow.
       throw new AbortWorkflowError();
     } else if (step.state === "satisfied") {
       // If the wait step has been satisfied, we return the payload of the satisfied step.
-      return
+      return step.payload;
     }
 
     throw new Error("Unexpected wait step state; expected 'waiting', 'satisfied', or 'timed_out'.");
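The `waiting` branch in the hunk above is a three-way decision: a due timeout marks the step timed out, a pending timeout suspends with that timestamp as the wake time, and no timeout suspends until an inbound event arrives. That decision table can be isolated as a pure function (illustrative sketch; `decideWait` is a hypothetical name):

```typescript
type WaitDecision =
  | { kind: "timed-out" }                 // record timeout, then abort the workflow
  | { kind: "suspend"; wakeAt?: number }; // wakeAt set when a timeout alarm is needed

// Pure decision for a `wait` step still in the 'waiting' state.
function decideWait(timeoutAt: number | undefined, now: number): WaitDecision {
  if (timeoutAt !== undefined) {
    if (now >= timeoutAt) return { kind: "timed-out" };
    return { kind: "suspend", wakeAt: timeoutAt }; // precise alarm at the timeout
  }
  return { kind: "suspend" }; // event-only wait: no step-specific alarm
}
```

The event-only branch is the case the README calls out under the watchdog alarm: with no `timeoutAt` there is nothing precise to schedule, so the watchdog is the only alarm left armed.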
@@ -399,7 +341,17 @@
   }
 
 class ResumeImmediatelyError extends Error {}
-class SuspendWorkflowError extends Error {
+class SuspendWorkflowError extends Error {
+  readonly #wakeAt?: number;
+  constructor(wakeAt?: number) {
+    super();
+    this.#wakeAt = wakeAt;
+    this.name = "SuspendWorkflowError";
+  }
+  get wakeAt() {
+    return this.#wakeAt;
+  }
+}
 class AbortWorkflowError extends Error {}
 
 class MaxAttemptsExceededError extends Error {}