@redflow/client 0.0.2 → 0.0.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/INTERNALS.md +238 -0
- package/README.md +34 -3
- package/package.json +1 -1
- package/src/client.ts +23 -20
- package/src/types.ts +7 -1
- package/src/worker.ts +102 -21
- package/tests/bugfixes.test.ts +11 -11
- package/tests/fixtures/worker-crash.ts +1 -0
- package/tests/fixtures/worker-recover.ts +1 -0
- package/tests/redflow.e2e.test.ts +182 -72
package/INTERNALS.md
ADDED
@@ -0,0 +1,238 @@

# redflow internals

This document describes how `@redflow/client` works internally, in production terms.

## Design model

- Durable state lives in Redis.
- Handlers and workflow code live in process memory (per worker process).
- The runtime is queue-based and crash-recoverable.
- Delivery semantics are at-least-once at the run level.
- The step API provides deterministic replay/caching to avoid repeating completed work.

## Main components

- **Workflow registry (in-memory):** built via `defineWorkflow(...)`.
- **Client (`RedflowClient`):** enqueue runs, inspect state, cancel runs, sync metadata.
- **Worker runtime:** executes queued runs, retries failures, promotes scheduled runs.
- **Cron scheduler:** leader-elected loop that creates cron runs.

## Registry and metadata sync

`startWorker({ app, ... })` always calls `syncRegistry(registry, { app })` before its loops start.

What `syncRegistry` writes per workflow:

- `workflow:<name>` hash:
  - `name`
  - `queue`
  - `maxConcurrency` (default `1`)
  - `app` (required ownership scope for cleanup)
  - `updatedAt`
  - `cronJson`
  - `retriesJson`
  - `cronIdsJson`
- `workflows` set (all known workflow names)
- cron definitions in `cron:def` and the schedule in `cron:next`
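
For illustration, a manual sync plus a raw metadata read could look like the sketch below. It assumes `createClient` is exported from the package root and accepts a `prefix` (the worker constructs it with `{ redis, prefix }` internally), and the exact key string produced for `workflow:<name>` is a guess.

```ts
import { createClient, getDefaultRegistry } from "@redflow/client"; // createClient export assumed
import "./workflows"; // hypothetical module that calls defineWorkflow(...) for each workflow

const client = createClient({ prefix: "redflow:prod" }); // option shape assumed

// Writes workflow:<name> hashes, the workflows set, cron:def and cron:next,
// all scoped to this app id for later stale cleanup.
await client.syncRegistry(getDefaultRegistry(), { app: "billing-service" });

// Raw inspection via the generic command passthrough; the key format is illustrative.
const meta = await client.redis.send("HGETALL", ["redflow:prod:workflow:heavy-sync"]);
console.log(meta); // name, queue, maxConcurrency, app, updatedAt, cronJson, retriesJson, cronIdsJson
```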

### Stale cleanup

Before writing new metadata, sync removes stale workflow metadata when all of the following are true:

- the workflow exists in Redis,
- the workflow is missing from the current registry,
- the workflow's `app` equals the current `app`,
- the workflow is older than the grace period (`30s`).

Cleanup removes:

- the `workflow:<name>` metadata hash,
- `workflows` set membership,
- associated cron entries (`cron:def`, `cron:next`).

It does **not** delete historical runs.
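
A condensed sketch of that cleanup decision; the real logic lives in `RedflowClient.cleanupStaleWorkflows` (shown later in this diff), and the helper shape here is illustrative:

```ts
// Illustrative only: mirrors the documented conditions, not the exact source.
const STALE_GRACE_MS = 30_000; // assumption: the "older than 30s" grace period

function shouldPruneWorkflow(args: {
  existsInRedis: boolean;
  inCurrentRegistry: boolean;
  workflowApp: string; // value of the `app` field on workflow:<name>
  currentApp: string;  // app passed to syncRegistry
  updatedAt: number;   // ms timestamp from workflow:<name>
  now: number;
}): boolean {
  const { existsInRedis, inCurrentRegistry, workflowApp, currentApp, updatedAt, now } = args;
  if (!existsInRedis) return false;
  if (inCurrentRegistry) return false;
  if (!workflowApp || workflowApp !== currentApp) return false;
  return now - updatedAt > STALE_GRACE_MS;
}
```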

## Redis keyspace

Key builders are in `src/internal/keys.ts`.

- `workflows`
- `workflow:<name>`
- `workflow-runs:<name>`
- `runs:created`
- `runs:status:<status>`
- `run:<runId>`
- `run:<runId>:steps`
- `run:<runId>:lease`
- `q:<queue>:ready`
- `q:<queue>:processing`
- `q:<queue>:scheduled`
- `cron:def`
- `cron:next`
- `lock:cron`
- `idempo:<encodedWorkflow>:<encodedKey>`
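
An illustrative sketch of such prefix-scoped builders. The names `workflows`, `workflow`, `run`, `runsStatus`, and `queueReady` match calls visible elsewhere in this diff; the rest, and the exact join format, are assumptions:

```ts
// Sketch of prefix-scoped key builders; the real src/internal/keys.ts may differ
// in naming and in how the prefix is joined.
export const keys = {
  workflows: (prefix: string) => `${prefix}:workflows`,
  workflow: (prefix: string, name: string) => `${prefix}:workflow:${name}`,
  workflowRuns: (prefix: string, name: string) => `${prefix}:workflow-runs:${name}`,
  run: (prefix: string, runId: string) => `${prefix}:run:${runId}`,
  runSteps: (prefix: string, runId: string) => `${prefix}:run:${runId}:steps`,
  runLease: (prefix: string, runId: string) => `${prefix}:run:${runId}:lease`,
  runsStatus: (prefix: string, status: string) => `${prefix}:runs:status:${status}`,
  queueReady: (prefix: string, queue: string) => `${prefix}:q:${queue}:ready`,
  queueProcessing: (prefix: string, queue: string) => `${prefix}:q:${queue}:processing`,
  queueScheduled: (prefix: string, queue: string) => `${prefix}:q:${queue}:scheduled`,
};
```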

## Run lifecycle

Statuses:

- `scheduled`
- `queued`
- `running`
- terminal: `succeeded`, `failed`, `canceled`

### Enqueue

Enqueue uses `ENQUEUE_RUN_LUA` atomically. The script:

- creates the run hash,
- writes indexes (`runs:created`, `runs:status:*`, `workflow-runs:*`),
- pushes to the ready queue or the scheduled ZSET,
- applies the idempotency mapping if a key was provided.

The idempotency key TTL defaults to `7 days`.
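
From the caller's side, run-level idempotency looks like the sketch below. `orderWorkflow` is a hypothetical `defineWorkflow(...)` export; the `run(input, options)` option name `idempotencyKey` matches what `worker.ts` passes in this diff.

```ts
import { orderWorkflow } from "./workflows"; // hypothetical workflow object

// Two enqueues with the same key inside the TTL window map to the same run.
const first = await orderWorkflow.run(
  { orderId: "o_123" },
  { idempotencyKey: "order:o_123:charge" },
);
const second = await orderWorkflow.run(
  { orderId: "o_123" },
  { idempotencyKey: "order:o_123:charge" },
);

console.log(first.id === second.id); // expected: true (run creation was deduplicated)
```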

### Processing

The worker loop uses `LMOVE`/`BLMOVE` to move run ids from `ready` to `processing`.

For each claimed run, the worker:

1. Acquires the lease (`run:<id>:lease`) with periodic renewal.
2. Validates the current run status.
3. If `queued`, enforces `maxConcurrency` for that workflow.
4. Transitions `queued -> running` atomically.
5. Executes the handler with the step engine.
6. Finalizes to a terminal status atomically.
7. Removes the run from `processing`.

If the lease is lost, the current worker aborts and does not finalize.
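
The shape of one worker slot, sketched against the generic `redis.send(...)` passthrough used elsewhere in the package; `processRun` and the pop directions here are stand-ins, not the exact internals:

```ts
// Illustrative worker slot: block for a run id, then hand it to the run processor.
async function workerSlotLoop(args: {
  redis: { send(command: string, args: string[]): Promise<unknown> };
  readyKey: string;
  processingKey: string;
  blmoveTimeoutSec: number;
  signal: AbortSignal;
  processRun: (runId: string) => Promise<void>;
}): Promise<void> {
  const { redis, readyKey, processingKey, blmoveTimeoutSec, signal, processRun } = args;

  while (!signal.aborted) {
    // Atomically claim: move one id from ready to processing, or time out and loop.
    const claimed = (await redis.send("BLMOVE", [
      readyKey,
      processingKey,
      "RIGHT",
      "LEFT",
      String(blmoveTimeoutSec),
    ])) as string | null;

    if (!claimed) continue;
    await processRun(claimed);
  }
}
```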

### Reaper

The reaper scans `processing` lists. For each run without an active lease, it:

- removes the run from `processing`,
- pushes it back to `ready`.

This recovers from worker crashes.
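
One reaper pass over a single queue, as a sketch. Key formats follow the keyspace above, and checking lease liveness with `EXISTS` is an assumption about the implementation:

```ts
async function reapQueueOnce(args: {
  redis: { send(command: string, args: string[]): Promise<unknown> };
  prefix: string;
  queue: string;
}): Promise<void> {
  const { redis, prefix, queue } = args;
  const processingKey = `${prefix}:q:${queue}:processing`;
  const readyKey = `${prefix}:q:${queue}:ready`;

  const runIds = (await redis.send("LRANGE", [processingKey, "0", "-1"])) as string[];

  for (const runId of runIds) {
    const leaseExists = (await redis.send("EXISTS", [`${prefix}:run:${runId}:lease`])) as number;
    if (leaseExists) continue; // still owned by a live worker

    // Orphaned run: remove it from processing and requeue it for another worker.
    const removed = (await redis.send("LREM", [processingKey, "1", runId])) as number;
    if (removed > 0) {
      await redis.send("LPUSH", [readyKey, runId]);
    }
  }
}
```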

### Scheduled promoter

The promoter pops due items from `q:<queue>:scheduled` (`ZPOPMIN` batch), then:

- transitions `scheduled -> queued`,
- pushes the run to `ready`.

Items that are not yet due are put back.
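
A sketch of one promoter pass. `markQueued` stands in for the real `scheduled -> queued` transition, and the batch size is illustrative:

```ts
async function promoteDueOnce(args: {
  redis: { send(command: string, args: string[]): Promise<unknown> };
  scheduledKey: string; // q:<queue>:scheduled (ZSET scored by due time in ms)
  readyKey: string;
  markQueued: (runId: string) => Promise<void>; // stand-in for the status transition
  batchSize?: number;
}): Promise<void> {
  const { redis, scheduledKey, readyKey, markQueued, batchSize = 100 } = args;
  const now = Date.now();

  // ZPOPMIN with a count returns a flat [member, score, member, score, ...] array.
  const popped = (await redis.send("ZPOPMIN", [scheduledKey, String(batchSize)])) as string[];

  for (let i = 0; i < popped.length; i += 2) {
    const runId = popped[i];
    const dueAt = Number(popped[i + 1]);

    if (dueAt > now) {
      // Not due yet: put it back with its original score.
      await redis.send("ZADD", [scheduledKey, String(dueAt), runId]);
      continue;
    }

    await markQueued(runId);
    await redis.send("LPUSH", [readyKey, runId]);
  }
}
```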

## maxConcurrency

`maxConcurrency` is per workflow; the default is `1`.

### For regular queued runs

When a worker picks up a `queued` run:

- it counts the current `running` runs for the same workflow,
- if the count >= `maxConcurrency`, the run is atomically moved from `processing` back to the end of `ready`.

So non-cron runs are delayed, not dropped.
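
Compressed into one decision (the actual code appears later in this diff, in `worker.ts`); the helper names here are stand-ins:

```ts
// Sketch: decide whether a claimed queued run may start now.
async function mayStartNow(args: {
  countRunning: (workflowName: string, stopAt: number) => Promise<number>;
  requeue: (runId: string) => Promise<void>; // atomic processing -> ready move (Lua)
  workflowName: string;
  runId: string;
  maxConcurrency: number;
}): Promise<boolean> {
  const running = await args.countRunning(args.workflowName, args.maxConcurrency);
  if (running >= args.maxConcurrency) {
    await args.requeue(args.runId); // delayed, not dropped
    return false;
  }
  return true;
}
```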

### For cron runs

The cron loop also checks the running count before enqueueing:

- if the count >= `maxConcurrency`, that cron tick is skipped,
- the next cron tick is still scheduled normally.

## Cron scheduler

- Leader election via the Redis lock `lock:cron`.
- Only the lock holder schedules cron runs.
- The loop pops the earliest `cronId` from `cron:next`.
- If it is due, the loop:
  - parses the `cron:def` payload,
  - enforces `maxConcurrency`,
  - enqueues the run via `runByName` (or skips),
  - computes the next fire time and stores it in `cron:next`.

Cron uses "reschedule from now" behavior: there is no catch-up burst if the stored timestamp is in the past.
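
The "reschedule from now" rule, with a hypothetical `nextFireTime(expression, from)` helper standing in for whatever cron parser the package uses (not shown in this diff):

```ts
// Hypothetical helper; the actual cron parser used by the package is not shown here.
declare function nextFireTime(expression: string, from: Date): Date;

// "Reschedule from now": missed ticks are not replayed as a catch-up burst.
function computeNextCronAt(expression: string, now: number): number {
  return nextFireTime(expression, new Date(now)).getTime();
}
```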

## Step engine semantics

Inside a handler, the `step` API has three primitives.

### `step.run(...)`

- Step state is persisted in the `run:<id>:steps` hash under `step.name`.
- If the step already `succeeded`, the cached output is returned.
- Duplicate step names in one execution are rejected.
- Step timeout and cancellation are supported.
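
Typical handler usage. The handler context shape (`{ input, step }`) and the exact `step.run(options, fn)` call shape are inferred from the README and this diff, so treat them as assumptions:

```ts
import { defineWorkflow } from "@redflow/client"; // root export assumed, as in the README

defineWorkflow("send-invoice", { queue: "io" }, async ({ input, step }) => {
  // Cached after first success: a retried run will not re-create the invoice.
  const invoice = await step.run({ name: "create-invoice", timeoutMs: 30_000 }, async () => {
    return { invoiceId: `inv_${input.orderId}` };
  });

  // Step names must be unique within one execution.
  await step.run({ name: "email-invoice" }, async () => {
    // ... send the email using invoice.invoiceId ...
    return { sent: true };
  });

  return { invoiceId: invoice.invoiceId };
});
```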

### `step.runWorkflow(...)`

- Enqueues the child workflow with deterministic idempotency by default:
  - `parentRunId + stepName + childWorkflowName`.
- Waits for child completion.
- Waiting is bounded by the step `timeoutMs` (if set), otherwise unbounded until cancellation.
- Inline assist: if the child is queued on a queue this worker handles, the worker may execute the child inline to avoid self-deadlock under low concurrency.
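
Assuming the same `(options, workflow, input)` shape as `emitWorkflow` (the types in this diff only show the trailing parameters), a blocking child call looks like:

```ts
import { defineWorkflow } from "@redflow/client"; // root export assumed
import { chargeWorkflow } from "./charge"; // hypothetical child workflow object

defineWorkflow("checkout", { queue: "critical" }, async ({ input, step }) => {
  // The child run is deduplicated by parentRunId + step name + child workflow name,
  // so a replayed parent will not enqueue a second charge.
  const charge = await step.runWorkflow(
    { name: "charge-order", timeoutMs: 60_000 }, // waiting is bounded by timeoutMs
    chargeWorkflow,
    { orderId: input.orderId, amountCents: input.totalCents },
  );

  return { chargeId: charge.chargeId };
});
```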

### `step.emitWorkflow(...)`

- Enqueues the child workflow and returns the child `runId`.
- Accepts the child as a workflow object or a workflow name string.
- Uses a deterministic idempotency default based on the parent run and step name.
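
Fire-and-forget by name, matching the example added to the README in this diff (handler context shape assumed as above):

```ts
// Inside a handler:
const analyticsRunId = await step.emitWorkflow(
  { name: "emit-analytics" },
  "analytics-consumer", // child referenced by workflow name string
  { orderId: input.orderId, totalCents: input.totalCents },
);
// The parent does not wait; keep the returned runId if you want to observe the child later.
```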

## Retry model

- `maxAttempts` is workflow-level (`retries.maxAttempts`), default `1`.
- Retry delay uses exponential backoff plus jitter.
- Non-retryable classes:
  - input validation errors,
  - unknown workflow,
  - output serialization errors,
  - cancellation,
  - explicit `NonRetriableError`.
- Retry scheduling is atomic (`scheduleRetry` Lua): the status/index update and the scheduled-ZSET write happen in one script.
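
Configuring retries and opting a specific failure out of them. `NonRetriableError` is named above, but its export path and constructor signature are assumptions, as is the handler context shape:

```ts
import { defineWorkflow, NonRetriableError } from "@redflow/client"; // export path assumed

defineWorkflow(
  "sync-customer",
  { queue: "io", retries: { maxAttempts: 5 } }, // retried with exponential backoff + jitter
  async ({ input, step }) => {
    await step.run({ name: "push-to-crm" }, async () => {
      const res = await fetch(`https://crm.example.com/customers/${input.customerId}`, {
        method: "PUT",
      });

      if (res.status === 404) {
        // Permanent condition: fail the run without consuming the remaining attempts.
        throw new NonRetriableError(`customer ${input.customerId} does not exist`);
      }
      if (!res.ok) {
        // Transient condition: throw a normal error and let the retry model reschedule.
        throw new Error(`CRM responded ${res.status}`);
      }
      return { ok: true };
    });

    return { synced: true };
  },
);
```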

## Cancellation

`cancelRun(runId)`:

- sets `cancelRequestedAt` plus an optional reason,
- if the run is `queued`/`scheduled`, attempts an immediate transition to `canceled` and cleans up,
- if the run is `running`, cancellation is cooperative via `AbortSignal` polling in the worker.

The terminal finalize script ensures consistent indexes and a terminal status.
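
Caller-side cancellation, as a sketch. The `createClient` export, its option shape, and the workflow import are assumptions; only `cancelRun(runId)` itself is documented above:

```ts
import { createClient } from "@redflow/client"; // export path assumed
import { reportWorkflow } from "./workflows"; // hypothetical workflow object

const client = createClient({ prefix: "redflow:prod" }); // option shape assumed

const handle = await reportWorkflow.run({ month: "2024-06" });

// Queued/scheduled runs flip to `canceled` immediately; a running run stops
// cooperatively the next time the worker polls the abort signal.
await client.cancelRun(handle.id);
```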

## Idempotency vs step cache

- **Idempotency:** deduplicates run creation (`key -> runId`) with a TTL.
- **Step cache:** deduplicates completed step execution within one parent run.

They cover different failure windows and are intentionally both used.

## Multi-worker behavior

- Many workers can process the same prefix/queues.
- Cron scheduling is single-leader.
- Processing/recovery is shared via Redis lists plus leases.
- `maxConcurrency` is enforced globally against the Redis `running` index.

## Operational notes

Recommended for production:

- Use a stable `prefix` per environment.
- Use an explicit `app` per service role for safe metadata cleanup.
- Set `maxConcurrency` intentionally for long workflows.
- Keep queue ownership clear (avoid workers consuming queues for workflows they do not register).
- Use idempotency keys for external trigger endpoints.

## Current guarantees and limitations

- Run execution is at-least-once.
- The step cache reduces replay but cannot provide globally exactly-once side effects.
- `maxConcurrency` is enforced via runtime checks against Redis state; it is robust in practice but is not a strict distributed semaphore.
- The `handle.result({ timeoutMs })` timeout bounds only how long the caller waits, not run execution itself.
package/README.md
CHANGED
@@ -2,6 +2,8 @@

 Redis-backed workflow runtime for Bun.

+Deep internal details: `INTERNALS.md`
+
 ## Warning

 This project is still in early alpha stage.
@@ -105,6 +107,16 @@ const analyticsRunId = await step.emitWorkflow(
 );
 ```

+You can also pass a workflow name string:
+
+```ts
+const analyticsRunId = await step.emitWorkflow(
+  { name: "emit-analytics" },
+  "analytics-consumer",
+  { orderId: input.orderId, totalCents: input.totalCents },
+);
+```
+
 ## Run workflows

 The object returned by `defineWorkflow(...)` has `.run(...)`.
@@ -137,13 +149,14 @@ const output = await handle.result({ timeoutMs: 90_000 });

 ## Start a worker

-Import workflows, then run `startWorker()`.
+Import workflows, then run `startWorker({ app: ... })`.

 ```ts
 import { startWorker } from "@redflow/client";
 import "./workflows";

 const worker = await startWorker({
+  app: "billing-worker",
   url: process.env.REDIS_URL,
   prefix: "redflow:prod",
   concurrency: 4,
@@ -154,6 +167,7 @@ Explicit queues + runtime tuning:

 ```ts
 const worker = await startWorker({
+  app: "billing-worker",
   url: process.env.REDIS_URL,
   prefix: "redflow:prod",
   queues: ["critical", "io", "analytics"],
@@ -168,6 +182,21 @@ const worker = await startWorker({

 ## Workflow options examples

+### maxConcurrency
+
+`maxConcurrency` limits concurrent `running` runs per workflow. Default is `1`.
+
+```ts
+defineWorkflow(
+  "heavy-sync",
+  {
+    queue: "ops",
+    maxConcurrency: 1,
+  },
+  async () => ({ ok: true }),
+);
+```
+
 ### Cron

 ```ts
@@ -184,6 +213,8 @@ defineWorkflow(
 );
 ```

+Cron respects `maxConcurrency`: if the limit is reached, that cron tick is skipped.
+
 ### onFailure

 ```ts
@@ -271,10 +302,10 @@ const output = await handle.result({ timeoutMs: 30_000 });
 console.log(output);
 ```

-### Registry sync
+### Registry sync app id

 ```ts
 import { getDefaultRegistry } from "@redflow/client";

-await client.syncRegistry(getDefaultRegistry(), {
+await client.syncRegistry(getDefaultRegistry(), { app: "billing-service" });
 ```
package/package.json
CHANGED
package/src/client.ts
CHANGED
@@ -37,10 +37,10 @@ export type CreateClientOptions = {

 export type SyncRegistryOptions = {
   /**
-   *
-   *
+   * Stable application id used for stale workflow metadata cleanup.
+   * Workflows are pruned only when they were last synced by the same app.
    */
-
+  app: string;
 };

 export function defaultPrefix(): string {
@@ -249,16 +249,6 @@ function encodeCompositePart(value: string): string {
   return `${value.length}:${value}`;
 }

-function defaultRegistryOwner(): string {
-  const envOwner = process.env.REDFLOW_SYNC_OWNER?.trim();
-  if (envOwner) return envOwner;
-
-  const argvOwner = process.argv[1]?.trim();
-  if (argvOwner) return argvOwner;
-
-  return "redflow:unknown-owner";
-}
-
 function parseEnqueueScriptResult(value: unknown): { kind: "created" | "existing"; runId: string } | null {
   if (Array.isArray(value) && value.length === 1 && Array.isArray(value[0])) {
     return parseEnqueueScriptResult(value[0]);
@@ -309,6 +299,11 @@ function isValidDate(value: Date): boolean {
   return value instanceof Date && Number.isFinite(value.getTime());
 }

+function normalizeMaxConcurrency(value: unknown): number {
+  if (typeof value !== "number" || !Number.isFinite(value) || value <= 0) return 1;
+  return Math.floor(value);
+}
+
 export class RedflowClient {
   constructor(
     public readonly redis: RedisClient,
@@ -356,9 +351,11 @@ export class RedflowClient {
     const retries = safeJsonTryParse<any>(data.retriesJson ?? null) as any;
     const updatedAt = Number(data.updatedAt ?? "0");
     const queue = data.queue ?? "default";
+    const maxConcurrency = normalizeMaxConcurrency(Number(data.maxConcurrency ?? "1"));
     return {
       name,
       queue,
+      maxConcurrency,
       cron: Array.isArray(cron) && cron.length > 0 ? cron : undefined,
       retries,
       updatedAt,
@@ -606,17 +603,21 @@ export class RedflowClient {
     }
   }

-  async syncRegistry(registry: WorkflowRegistry, options
+  async syncRegistry(registry: WorkflowRegistry, options: SyncRegistryOptions): Promise<void> {
     const defs = registry.list();
     const syncStartedAt = nowMs();
-    const
+    const app = options.app.trim();
+    if (!app) {
+      throw new Error("syncRegistry requires a non-empty options.app");
+    }
     const registeredNames = new Set(defs.map((def) => def.options.name));

-    await this.cleanupStaleWorkflows(registeredNames, syncStartedAt,
+    await this.cleanupStaleWorkflows(registeredNames, syncStartedAt, app);

     for (const def of defs) {
       const name = def.options.name;
       const queue = def.options.queue ?? "default";
+      const maxConcurrency = normalizeMaxConcurrency(def.options.maxConcurrency);
       const cron = def.options.cron ?? [];
       const retries = def.options.retries ?? {};
       const updatedAt = nowMs();
@@ -653,6 +654,7 @@ export class RedflowClient {
         id: cronId,
         workflow: name,
         queue,
+        maxConcurrency,
         expression: c.expression,
         timezone: c.timezone,
         inputJson: safeJsonStringify(cronInput),
@@ -671,7 +673,8 @@ export class RedflowClient {
       const meta: Record<string, string> = {
         name,
         queue,
-
+        maxConcurrency: String(maxConcurrency),
+        app,
         updatedAt: String(updatedAt),
         cronJson: safeJsonStringify(cron),
         retriesJson: safeJsonStringify(retries),
@@ -722,7 +725,7 @@ export class RedflowClient {
   private async cleanupStaleWorkflows(
     registeredNames: Set<string>,
     syncStartedAt: number,
-
+    app: string,
   ): Promise<void> {
     const existingNames = await this.redis.smembers(keys.workflows(this.prefix));

@@ -730,8 +733,8 @@
       if (registeredNames.has(existingName)) continue;

       const workflowKey = keys.workflow(this.prefix, existingName);
-      const
-      if (!
+      const workflowApp = (await this.redis.hget(workflowKey, "app")) ?? "";
+      if (!workflowApp || workflowApp !== app) {
         continue;
       }

package/src/types.ts
CHANGED
@@ -32,6 +32,11 @@ export type OnFailureContext = {
 export type DefineWorkflowOptions<TSchema extends ZodTypeAny | undefined = ZodTypeAny | undefined> = {
   name: string;
   queue?: string;
+  /**
+   * Maximum concurrently running runs for this workflow.
+   * Default: 1.
+   */
+  maxConcurrency?: number;
   schema?: TSchema;
   cron?: CronTrigger[];
   retries?: WorkflowRetries;
@@ -113,7 +118,7 @@ export type StepApi = {
     workflow: WorkflowLike<TInput, TOutput>,
     input: TInput,
   ): Promise<TOutput>;
-  emitWorkflow(options: StepEmitWorkflowOptions, workflow: WorkflowLike, input: unknown): Promise<string>;
+  emitWorkflow(options: StepEmitWorkflowOptions, workflow: WorkflowLike | string, input: unknown): Promise<string>;
 };

 export type RunState = {
@@ -167,6 +172,7 @@ export type ListedRun = {
 export type WorkflowMeta = {
   name: string;
   queue: string;
+  maxConcurrency: number;
   cron?: CronTrigger[];
   retries?: WorkflowRetries;
   updatedAt: number;
package/src/worker.ts
CHANGED
@@ -19,6 +19,8 @@ import { getDefaultRegistry, type WorkflowRegistry } from "./registry";
 import type { OnFailureContext, RunStatus, StepApi, StepStatus } from "./types";

 export type StartWorkerOptions = {
+  /** Stable application id used for registry sync stale-cleanup scoping. */
+  app: string;
   redis?: RedisClient;
   url?: string;
   prefix?: string;
@@ -74,18 +76,32 @@ redis.call("lpush", KEYS[2], ARGV[1])
 return 1
 `;

-
-
-
-
-
+const REQUEUE_DUE_TO_CONCURRENCY_LUA = `
+if redis.call("lrem", KEYS[1], 1, ARGV[1]) <= 0 then
+  return 0
+end
+
+redis.call("rpush", KEYS[2], ARGV[1])
+return 1
+`;
+
+export async function startWorker(options: StartWorkerOptions): Promise<WorkerHandle> {
+  const app = options.app.trim();
+  if (!app) {
+    throw new Error("startWorker requires a non-empty options.app");
+  }
+
+  const registry = options.registry ?? getDefaultRegistry();
+  const prefix = options.prefix ?? defaultPrefix();
+  const ownsBaseRedis = !options.redis && !!options.url;
+  const baseRedis = options.redis ?? (options.url ? new BunRedisClient(options.url) : defaultRedis);
   const syncClient = createClient({ redis: baseRedis, prefix });

-  const queues = options
-  const concurrency = Math.max(1, options
-  const leaseMs = Math.max(100, options
-  const blmoveTimeoutSec = options
-  const reaperIntervalMs = options
+  const queues = options.queues ?? deriveQueuesFromRegistry(registry);
+  const concurrency = Math.max(1, options.concurrency ?? 1);
+  const leaseMs = Math.max(100, options.runtime?.leaseMs ?? 5000);
+  const blmoveTimeoutSec = options.runtime?.blmoveTimeoutSec ?? 1;
+  const reaperIntervalMs = options.runtime?.reaperIntervalMs ?? 500;

   const abort = new AbortController();
   const tasks: Promise<void>[] = [];
@@ -111,7 +127,7 @@ export async function startWorker(options?: StartWorkerOptions): Promise<WorkerH
   };

   try {
-    await syncClient.syncRegistry(registry);
+    await syncClient.syncRegistry(registry, { app });

     // Worker loops (blocking BLMOVE). Use dedicated connections per slot.
     for (let i = 0; i < concurrency; i++) {
@@ -222,6 +238,11 @@ function encodeIdempotencyPart(value: string): string {
   return `${value.length}:${value}`;
 }

+function normalizeMaxConcurrency(value: unknown): number {
+  if (typeof value !== "number" || !Number.isFinite(value) || value <= 0) return 1;
+  return Math.floor(value);
+}
+
 function defaultStepWorkflowIdempotencyKey(parentRunId: string, stepName: string, childWorkflowName: string): string {
   return `stepwf:${encodeIdempotencyPart(parentRunId)}:${encodeIdempotencyPart(stepName)}:${encodeIdempotencyPart(childWorkflowName)}`;
 }
@@ -396,6 +417,8 @@ async function processRun(args: {
   }

   const workflowName = run.workflow ?? "";
+  const def = workflowName ? registry.get(workflowName) : undefined;
+  const maxConcurrency = normalizeMaxConcurrency(def?.options.maxConcurrency);
   const maxAttempts = Number(run.maxAttempts ?? "1");
   const cancelRequestedAt = run.cancelRequestedAt ? Number(run.cancelRequestedAt) : 0;
   if (cancelRequestedAt > 0) {
@@ -406,7 +429,26 @@ async function processRun(args: {

   const startedAt = run.startedAt && run.startedAt !== "" ? Number(run.startedAt) : nowMs();

-  if (currentStatus === "queued") {
+  if (currentStatus === "queued" && def) {
+    const runningCount = await countRunningRunsForWorkflow({
+      redis,
+      prefix,
+      workflowName,
+      stopAt: maxConcurrency,
+    });
+
+    if (runningCount >= maxConcurrency) {
+      await redis.send("EVAL", [
+        REQUEUE_DUE_TO_CONCURRENCY_LUA,
+        "2",
+        processingKey,
+        keys.queueReady(prefix, queue),
+        runId,
+      ]);
+      await sleep(25);
+      return;
+    }
+
     const movedToRunning = await client.transitionRunStatusIfCurrent(runId, "queued", "running", startedAt);
     if (!movedToRunning) {
       // Most likely canceled between dequeue and start transition.
@@ -433,7 +475,6 @@ async function processRun(args: {
     return;
   }

-  const def = registry.get(workflowName);
   if (!def) {
     const errorJson = makeErrorJson(new UnknownWorkflowError(workflowName));
     await client.finalizeRun(runId, { status: "failed", errorJson, finishedAt: nowMs() });
@@ -694,18 +735,27 @@ async function processRun(args: {
   };

   const emitWorkflowStep: StepApi["emitWorkflow"] = async (options, workflow, workflowInput) => {
+    const workflowName = typeof workflow === "string" ? workflow : workflow.name;
     const idempotencyKey =
-      options.idempotencyKey ?? defaultStepWorkflowIdempotencyKey(runId, options.name,
+      options.idempotencyKey ?? defaultStepWorkflowIdempotencyKey(runId, options.name, workflowName);

     return await runStep(
       { name: options.name, timeoutMs: options.timeoutMs },
       async () => {
-        const handle =
-
-
-
-
-
+        const handle =
+          typeof workflow === "string"
+            ? await client.runByName(workflow, workflowInput, {
+                runAt: options.runAt,
+                queueOverride: options.queueOverride,
+                idempotencyTtl: options.idempotencyTtl,
+                idempotencyKey,
+              })
+            : await workflow.run(workflowInput, {
+                runAt: options.runAt,
+                queueOverride: options.queueOverride,
+                idempotencyTtl: options.idempotencyTtl,
+                idempotencyKey,
+              });
         return handle.id;
       },
     );
@@ -893,6 +943,27 @@ async function reaperLoop(args: {
   }
 }

+async function countRunningRunsForWorkflow(args: {
+  redis: RedisClient;
+  prefix: string;
+  workflowName: string;
+  stopAt?: number;
+}): Promise<number> {
+  const { redis, prefix, workflowName, stopAt } = args;
+  const runningRunIds = await redis.zrevrange(keys.runsStatus(prefix, "running"), 0, -1);
+  let count = 0;
+
+  for (const runId of runningRunIds) {
+    const runWorkflow = await redis.hget(keys.run(prefix, runId), "workflow");
+    if (runWorkflow !== workflowName) continue;
+
+    count += 1;
+    if (typeof stopAt === "number" && count >= stopAt) return count;
+  }
+
+  return count;
+}
+
 async function cronSchedulerLoop(args: {
   redis: RedisClient;
   client: RedflowClient;
@@ -967,7 +1038,17 @@ async function cronSchedulerLoop(args: {
         continue;
       }

-
+      const cronMaxConcurrency = normalizeMaxConcurrency(def.maxConcurrency);
+      const runningCount = await countRunningRunsForWorkflow({
+        redis,
+        prefix,
+        workflowName: workflow,
+        stopAt: cronMaxConcurrency,
+      });
+
+      if (runningCount < cronMaxConcurrency) {
+        await client.runByName(workflow, input, { queueOverride: queue });
+      }

       // Schedule next run.
       let nextAt: number | null = null;