npm - @cheapestinference/openclaw-ratelimit-retry - Versions diffs - 1.0.0 - Mend

@cheapestinference/openclaw-ratelimit-retry 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/LICENSE +21 -0
package/README.md +154 -0
package/docs/superpowers/plans/2026-03-12-retry-on-error-plugin.md +604 -0
package/docs/superpowers/specs/2026-03-12-retry-on-error-plugin-design.md +211 -0
package/index.ts +67 -0
package/openclaw.plugin.json +32 -0
package/package.json +15 -0
package/src/service.ts +377 -0

package/docs/superpowers/specs/2026-03-12-retry-on-error-plugin-design.md ADDED

@@ -0,0 +1,211 @@
+# OpenClaw Plugin: retry-on-error
+## Problem
+When the inference provider (CheapestInference via LiteLLM) returns 429 rate limit errors due to budget exhaustion (5-hour fixed window), all running agent tasks and conversations stop. When the budget resets, nothing resumes automatically. Users must manually re-trigger each conversation, and if the dashboard is closed, there is no way to resume at all.
+## Solution
+An OpenClaw plugin (`retry-on-error`) that:
+1. Detects retriable provider errors via the `agent_end` hook
+2. Parks failed sessions in a persistent queue on disk
+3. Runs a background service that retries parked sessions when the budget window resets
+4. Uses OpenClaw's internal `GatewayClient` to send `chat.send` to the local gateway, resuming conversations with their full transcript context
+## Architecture
+### Plugin Structure
+```
+~/.openclaw/extensions/retry-on-error/
+├── openclaw.plugin.json    # Plugin manifest
+├── package.json            # NPM metadata
+├── index.ts                # Entry point: registers hook + service
+└── src/
+    └── service.ts          # Background retry service
+```
+### Components
+#### 1. Error Detection (hook: `agent_end`)
+The `agent_end` hook receives `{ messages, success, error, durationMs }` with `PluginHookAgentContext` which includes `sessionKey`.
+The `error` field is a **plain string** (not an HTTP status code). Error strings from the CheapestInference/LiteLLM stack arrive in formats like `"Error code: 429 - ..."` or `"RateLimitError: ..."`.
+**Retriable error patterns** (string matching, case-insensitive):
+- `"429"` (catches `"Error code: 429"`)
+- `"rate limit"` / `"rate_limit"` / `"too many requests"`
+- `"budget"` / `"quota exceeded"`
+- `"resource_exhausted"` / `"resource has been exhausted"`
+**Non-retriable errors** (ignored):
+- Auth errors ("invalid api key", "unauthorized", "401", "403")
+- Format errors ("invalid request", "malformed")
+- Model not found ("404", "model not found")
+- Context overflow ("context length", "prompt too large")
+- Billing errors ("402", "insufficient credits") — require user action
+#### 2. Persistent Queue
+Path: `path.join(ctx.stateDir, 'retry-on-error', 'queue.json')` (resolves to `~/.openclaw/retry-on-error/queue.json`).
+```json
+[
+  {
+    "sessionKey": "agent:myagent:main",
+    "errorTime": 1710000000000,
+    "retryAfter": 1710018000000,
+    "errorMessage": "Error code: 429 - rate limit exceeded",
+    "attempts": 0
+  }
+]
+```
+**Deduplication**: Only one entry per `sessionKey`. If the same session errors again, the existing entry is updated with incremented `attempts` and recalculated `retryAfter`.
+**Retry time calculation**: Computes the next 5-hour budget window boundary aligned to midnight (00:00, 05:00, 10:00, 15:00, 20:00 UTC) plus 1 minute margin. LiteLLM uses UTC-aligned boundaries per its `get_next_standardized_reset_time()` implementation.
+```
+function nextResetTime(now, windowHours):
+  currentHour = now.getUTCHours()
+  nextBoundary = currentHour + windowHours - (currentHour % windowHours)
+  if currentHour % windowHours == 0 and minutes == 0:
+    nextBoundary = currentHour + windowHours
+  // Handle day overflow (nextBoundary >= 24)
+  return date at nextBoundary:01:00 UTC
+```
+**Atomic writes**: Write to a temp file, then rename (atomic rename) to prevent corruption on crashes.
+**Max queue size**: Capped at 100 entries. Oldest entries evicted when full.
+#### 3. Background Service (`registerService`)
+**`start(ctx)`**:
+1. Read gateway port: `ctx.config.gateway?.port ?? 18789`
+2. Capture actual port via `gateway_start` hook if available
+3. Load `queue.json` from `ctx.stateDir` (recovers state after restarts)
+4. Start interval timer (every `checkIntervalMinutes`, default 5 minutes)
+5. On each tick (guarded by `retryInProgress` flag to prevent overlapping batches):
+   - Filter queue items where `retryAfter < Date.now()`
+   - If items ready: connect `GatewayClient` to local gateway
+   - Send `chat.send` for each ready session (fire-and-forget, see Response Model)
+   - On ack success: remove from queue, log
+   - On connection/send error: leave in queue, retry on next tick
+   - Save updated `queue.json` to disk (atomic write)
+**`stop(ctx)`**:
+1. Clear interval timer
+2. Disconnect `GatewayClient` if connected
+3. Save queue to disk
+#### 4. Gateway Client (authentication)
+Uses OpenClaw's internal `GatewayClient` class instead of a custom WebSocket implementation. This handles:
+- `connect.challenge` handshake
+- Device identity authentication (`loadOrCreateDeviceIdentity`, `buildDeviceAuthPayloadV3`)
+- Reconnection with exponential backoff
+- Protocol negotiation
+The `GatewayClient` is imported from `openclaw` internals (resolved via Jiti at runtime). Connection is ephemeral: opens when retries are pending, closes after processing.
+Auth token can alternatively be read from `ctx.config.gateway?.auth?.token` if device identity is not available.
+#### 5. Response Model (fire-and-forget with re-detection)
+`chat.send` returns an immediate ack `{ ok: true, runId, status: "started" }`, NOT the final result. The plugin does NOT wait for the agent run to complete.
+**If the retry succeeds**: The agent processes the message normally. No further action needed.
+**If the retry fails again with 429**: The `agent_end` hook fires again with the same `sessionKey`. The deduplication logic updates the existing queue entry: increments `attempts`, recalculates `retryAfter` to the next 5h window. This natural loop continues until the retry succeeds or `attempts >= maxRetryAttempts`.
+**Idempotency key generation**: Each retry uses `retry:${sessionKey}:${Date.now()}` to prevent replay/deduplication conflicts with previous messages.
+### Configuration
+Plugin config in OpenClaw settings:
+```yaml
+plugins:
+  retry-on-error:
+    budgetWindowHours: 5          # Budget reset window (hours)
+    maxRetryAttempts: 3           # Max retries per session before abandoning
+    checkIntervalMinutes: 5       # How often to check for pending retries
+    retryMessage: "Continue where you left off. The previous attempt failed due to a rate limit that has now reset."
+```
+Config schema (in `openclaw.plugin.json`):
+```json
+{
+  "id": "retry-on-error",
+  "configSchema": {
+    "type": "object",
+    "additionalProperties": false,
+    "properties": {
+      "budgetWindowHours": {
+        "type": "number",
+        "default": 5,
+        "description": "Budget reset window in hours"
+      },
+      "maxRetryAttempts": {
+        "type": "number",
+        "default": 3,
+        "description": "Maximum retry attempts per session"
+      },
+      "checkIntervalMinutes": {
+        "type": "number",
+        "default": 5,
+        "description": "Interval between retry checks in minutes"
+      },
+      "retryMessage": {
+        "type": "string",
+        "default": "Continue where you left off. The previous attempt failed due to a rate limit that has now reset.",
+        "description": "Message sent to resume the conversation"
+      }
+    }
+  }
+}
+```
+### Edge Cases
+| Scenario | Behavior |
+|----------|----------|
+| Server restarts | `start()` reloads `queue.json` from disk |
+| Multiple errors same session | Deduplicate by `sessionKey` (update existing entry) |
+| Retry also fails with 429 | `agent_end` hook fires again → re-queues with incremented attempts |
+| Gateway unreachable during retry | Catch connection error, leave in queue for next tick |
+| `attempts >= maxRetryAttempts` | Remove from queue, log warning |
+| 24 not divisible by windowHours | Handle day overflow (hour >= 24 wraps to next day) |
+| Sub-agent session error | Same treatment — sessionKey format `agent:X:subagent:Y` handled identically |
+| Timer fires during active retry | `retryInProgress` guard prevents overlapping batches |
+| Queue file corrupted | Catch JSON parse error, start with empty queue, log warning |
+| Queue exceeds 100 entries | Evict oldest entries |
+### Why `chat.send` (not `/hooks` endpoint)
+The `/hooks` endpoint creates "isolated agent turns" (cron-like). Using `chat.send` with the original `sessionKey` is equivalent to a user sending a message manually — the gateway loads the complete JSONL transcript and the agent resumes with full context. This is the correct behavior for conversation resumption.
+### Dependencies
+Runtime: None (zero runtime dependencies).
+Dev/type-only:
+- `openclaw` — devDependency for types, resolved at runtime via Jiti alias
+- `GatewayClient` — imported from `openclaw` internals at runtime
+### Installation
+```bash
+# Copy to global extensions directory
+cp -r retry-on-error ~/.openclaw/extensions/
+# Enable in OpenClaw config
+openclaw config set plugins.retry-on-error.budgetWindowHours 5
+openclaw config set plugins.retry-on-error.maxRetryAttempts 3
+```
+No `npm install` needed (zero runtime dependencies).
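
The design document lists retriable and non-retriable patterns but does not say which list wins when a message matches both. The sketch below assumes the order used in `src/service.ts` (non-retriable patterns are checked first) and copies only a subset of its patterns. The `classify` helper and its three-way result are hypothetical; the shipped `isRetriableError` returns a boolean and treats unknown errors the same as non-retriable ones.

```typescript
// Subset of the pattern lists from src/service.ts, for illustration only.
const NON_RETRIABLE: RegExp[] = [
  /\b40[1-4]\b/,
  /invalid api key/i,
  /unauthorized/i,
  /insufficient[_ ]?credits/i,
];
const RETRIABLE: RegExp[] = [
  /\b429\b/,
  /rate[_ ]?limit/i,
  /too many requests/i,
  /budget/i,
  /quota[_ ]?exceeded/i,
];

type Verdict = "retry" | "give-up" | "unknown";

// Hypothetical helper: non-retriable patterns take precedence over retriable ones.
function classify(error: string): Verdict {
  if (NON_RETRIABLE.some((p) => p.test(error))) return "give-up";
  if (RETRIABLE.some((p) => p.test(error))) return "retry";
  return "unknown";
}

const samples = [
  "Error code: 429 - rate limit exceeded",  // plain rate limit
  "Error code: 402 - budget exceeded",      // billing code wins over "budget"
  "429 Too Many Requests (request 404)",    // stray "404" token wins over "429"
  "socket hang up",                         // matches neither list
];
for (const s of samples) console.log(classify(s), "<-", s);
```

The third sample shows a side effect of this ordering: a rate-limit message that happens to contain a bare `401` to `404` token is not retried. Whether real provider messages ever look like that is not something the source states.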

package/index.ts ADDED

@@ -0,0 +1,67 @@
+import type { OpenClawPluginApi } from "openclaw/plugin-sdk";
+import { createRetryService, isRetriableError } from "./src/service.js";
+interface PluginConfig {
+  budgetWindowHours?: number;
+  maxRetryAttempts?: number;
+  checkIntervalMinutes?: number;
+  retryMessage?: string;
+}
+const DEFAULT_CONFIG: Required<PluginConfig> = {
+  budgetWindowHours: 5,
+  maxRetryAttempts: 3,
+  checkIntervalMinutes: 5,
+  retryMessage: "Continue where you left off. The previous attempt failed due to a rate limit that has now reset.",
+};
+const { service, addEntry, removeEntry } = createRetryService();
+const plugin = {
+  id: "ratelimit-retry",
+  name: "Ratelimit Retry",
+  description: "Automatically retry agent conversations that fail due to provider rate limits",
+  register(api: OpenClawPluginApi) {
+    const cfg = {
+      ...DEFAULT_CONFIG,
+      ...(api.pluginConfig as PluginConfig),
+    };
+    api.on("agent_end", (event, ctx) => {
+      const error = (event as Record<string, unknown>).error as string | undefined;
+      const success = (event as Record<string, unknown>).success as boolean | undefined;
+      const sessionKey = (ctx as Record<string, unknown>).sessionKey as string | undefined;
+      if (!sessionKey) return;
+      // On success, remove from retry queue (if present)
+      if (success || !error) {
+        removeEntry(sessionKey);
+        return;
+      }
+      // Ignore non-retriable errors
+      if (!isRetriableError(error)) {
+        api.logger.debug?.(`ratelimit-retry: non-retriable error on ${sessionKey}: ${error.slice(0, 100)}`);
+        return;
+      }
+      api.logger.info(`ratelimit-retry: queuing retry for ${sessionKey} (error: ${error.slice(0, 100)})`);
+      const resolvedConfig = {
+        ...cfg,
+        gatewayPort: (api.config as Record<string, any>).gateway?.port ?? 18789,
+        gatewayToken: (api.config as Record<string, any>).gateway?.auth?.token,
+        gatewayPassword: (api.config as Record<string, any>).gateway?.auth?.password,
+      };
+      addEntry(sessionKey, error, resolvedConfig, api.logger as any);
+    });
+    api.registerService(service);
+    api.logger.info("ratelimit-retry: plugin registered");
+  },
+};
+export default plugin;
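
The hook above delegates all queue bookkeeping to `addEntry` and `removeEntry`. As a reading aid, the sketch below models the attempt counter as two pure functions. `recordFailure` and `recordSuccess` are hypothetical names, and the entry is reduced to the two fields that matter here. It assumes, as `src/service.ts` does, that an entry stays in the queue after a retry has been sent, so that a repeated failure finds it and increments `attempts`.

```typescript
interface Entry {
  sessionKey: string;
  attempts: number;
}

// Hypothetical pure version of the update performed when a run fails with a
// retriable error: one entry per sessionKey, dropped once the cap is reached.
function recordFailure(queue: Entry[], sessionKey: string, maxAttempts: number): Entry[] {
  const existing = queue.find((e) => e.sessionKey === sessionKey);
  const attempts = existing ? existing.attempts + 1 : 0;
  const others = queue.filter((e) => e.sessionKey !== sessionKey);
  return attempts >= maxAttempts ? others : [...others, { sessionKey, attempts }];
}

// Hypothetical pure version of the update performed when a run succeeds.
function recordSuccess(queue: Entry[], sessionKey: string): Entry[] {
  return queue.filter((e) => e.sessionKey !== sessionKey);
}

const key = "agent:myagent:main";
let queue: Entry[] = [];
queue = recordFailure(queue, key, 3); // original run fails, attempts = 0
queue = recordFailure(queue, key, 3); // retry 1 fails, attempts = 1
queue = recordFailure(queue, key, 3); // retry 2 fails, attempts = 2
const sizeBeforeGiveUp = queue.length;
queue = recordFailure(queue, key, 3); // retry 3 fails, attempts would be 3: dropped
const sizeAfterGiveUp = queue.length;

const cleared = recordSuccess(recordFailure([], key, 3), key);
console.log(sizeBeforeGiveUp, sizeAfterGiveUp, cleared.length);
```

Under these assumptions, `maxRetryAttempts: 3` means up to three retries are sent after the original failure before the session is abandoned.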

package/openclaw.plugin.json ADDED

@@ -0,0 +1,32 @@
+{
+  "id": "ratelimit-retry",
+  "configSchema": {
+    "type": "object",
+    "additionalProperties": false,
+    "properties": {
+      "budgetWindowHours": {
+        "type": "number",
+        "default": 5,
+        "minimum": 1,
+        "description": "Budget reset window in hours (aligned to UTC clock boundaries)"
+      },
+      "maxRetryAttempts": {
+        "type": "number",
+        "default": 3,
+        "minimum": 1,
+        "description": "Maximum retry attempts per session before abandoning"
+      },
+      "checkIntervalMinutes": {
+        "type": "number",
+        "default": 5,
+        "minimum": 1,
+        "description": "How often to check for pending retries (minutes)"
+      },
+      "retryMessage": {
+        "type": "string",
+        "default": "Continue where you left off. The previous attempt failed due to a rate limit that has now reset.",
+        "description": "Message sent to the session to resume the conversation"
+      }
+    }
+  }
+}
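
The schema declares `minimum: 1` for the numeric options, while `index.ts` merges user config over its defaults with an object spread. A spread copies keys whose value is `undefined`, so `{ maxRetryAttempts: undefined }` would replace the default. The sketch below shows one defensive way to resolve the config. `resolveConfig` is a hypothetical helper, and whether OpenClaw already validates plugin config against the schema before `register()` runs is not stated in the source.

```typescript
interface PluginConfig {
  budgetWindowHours?: number;
  maxRetryAttempts?: number;
  checkIntervalMinutes?: number;
  retryMessage?: string;
}

// Defaults copied from index.ts.
const DEFAULTS: Required<PluginConfig> = {
  budgetWindowHours: 5,
  maxRetryAttempts: 3,
  checkIntervalMinutes: 5,
  retryMessage:
    "Continue where you left off. The previous attempt failed due to a rate limit that has now reset.",
};

// Hypothetical helper: fall back to the default for missing, undefined,
// non-finite, or below-minimum values instead of letting them through.
function resolveConfig(raw: PluginConfig | undefined): Required<PluginConfig> {
  const atLeastOne = (value: number | undefined, fallback: number): number =>
    typeof value === "number" && Number.isFinite(value) && value >= 1 ? value : fallback;
  const message = raw?.retryMessage;
  return {
    budgetWindowHours: atLeastOne(raw?.budgetWindowHours, DEFAULTS.budgetWindowHours),
    maxRetryAttempts: atLeastOne(raw?.maxRetryAttempts, DEFAULTS.maxRetryAttempts),
    checkIntervalMinutes: atLeastOne(raw?.checkIntervalMinutes, DEFAULTS.checkIntervalMinutes),
    retryMessage: typeof message === "string" && message.length > 0 ? message : DEFAULTS.retryMessage,
  };
}

// Compare with the plain spread used in index.ts.
const spreadResult = { ...DEFAULTS, ...({ maxRetryAttempts: undefined } as PluginConfig) };
const resolved = resolveConfig({ maxRetryAttempts: undefined, checkIntervalMinutes: 0 });
console.log(spreadResult.maxRetryAttempts, resolved.maxRetryAttempts, resolved.checkIntervalMinutes);
```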

package/package.json ADDED

@@ -0,0 +1,15 @@
+{
+  "name": "@cheapestinference/openclaw-ratelimit-retry",
+  "version": "1.0.0",
+  "description": "Automatically retry agent conversations that fail due to provider rate limits",
+  "type": "module",
+  "license": "MIT",
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/cheapestinference/openclaw-plugin-ratelimit-retry"
+  },
+  "keywords": ["openclaw", "plugin", "retry", "rate-limit", "429", "budget", "ratelimit"],
+  "openclaw": {
+    "extensions": ["./index.ts"]
+  }
+}

package/src/service.ts ADDED

@@ -0,0 +1,377 @@
+import { writeFile, readFile, mkdir, rename } from "node:fs/promises";
+import { join, dirname } from "node:path";
+import type { OpenClawPluginService } from "openclaw/plugin-sdk";
+// --- Types ---
+interface QueueEntry {
+  sessionKey: string;
+  errorTime: number;
+  retryAfter: number;
+  errorMessage: string;
+  attempts: number;
+}
+interface RetryConfig {
+  budgetWindowHours: number;
+  maxRetryAttempts: number;
+  checkIntervalMinutes: number;
+  retryMessage: string;
+  gatewayPort: number;
+  gatewayToken: string | undefined;
+  gatewayPassword: string | undefined;
+}
+// --- Error Detection ---
+const RETRIABLE_PATTERNS = [
+  /\b429\b/i,
+  /rate[_ ]?limit/i,
+  /too many requests/i,
+  /budget/i,
+  /quota[_ ]?exceeded/i,
+  /resource[_ ]?(exhausted|has been exhausted)/i,
+  /tokens? per minute/i,
+  /\btpm\b/i,
+];
+const NON_RETRIABLE_PATTERNS = [
+  /\b40[1-4]\b/i,
+  /invalid api key/i,
+  /unauthorized/i,
+  /invalid request/i,
+  /context[_ ]?(length|overflow)/i,
+  /prompt too (large|long)/i,
+  /model not found/i,
+  /insufficient[_ ]?credits/i,
+  /malformed/i,
+];
+export function isRetriableError(error: string | undefined): boolean {
+  if (!error) return false;
+  for (const pattern of NON_RETRIABLE_PATTERNS) {
+    if (pattern.test(error)) return false;
+  }
+  for (const pattern of RETRIABLE_PATTERNS) {
+    if (pattern.test(error)) return true;
+  }
+  return false;
+}
+// --- Reset Time Calculation ---
+export function nextResetTime(now: Date, windowHours: number): number {
+  if (!windowHours || windowHours <= 0) windowHours = 5;
+  const currentHour = now.getUTCHours();
+  const nextBoundary = currentHour + windowHours - (currentHour % windowHours);
+  const result = new Date(now);
+  if (nextBoundary >= 24) {
+    // Overflows to next day
+    result.setUTCDate(result.getUTCDate() + 1);
+    result.setUTCHours(Math.floor(nextBoundary - 24), 1, 0, 0); // +1 minute margin
+  } else {
+    result.setUTCHours(Math.floor(nextBoundary), 1, 0, 0); // +1 minute margin
+  }
+  return result.getTime();
+}
+// --- Queue Management ---
+const MAX_QUEUE_SIZE = 100;
+async function loadQueue(queuePath: string): Promise<QueueEntry[]> {
+  try {
+    const data = await readFile(queuePath, "utf-8");
+    const parsed = JSON.parse(data);
+    if (!Array.isArray(parsed)) return [];
+    return parsed;
+  } catch {
+    return [];
+  }
+}
+async function saveQueue(queuePath: string, queue: QueueEntry[]): Promise<void> {
+  await mkdir(dirname(queuePath), { recursive: true });
+  const tmpPath = `${queuePath}.tmp.${Date.now()}.${Math.random().toString(36).slice(2, 8)}`;
+  await writeFile(tmpPath, JSON.stringify(queue, null, 2), "utf-8");
+  await rename(tmpPath, queuePath);
+}
+function addToQueue(queue: QueueEntry[], entry: QueueEntry): QueueEntry[] {
+  const filtered = queue.filter((e) => e.sessionKey !== entry.sessionKey);
+  filtered.push(entry);
+  if (filtered.length > MAX_QUEUE_SIZE) {
+    filtered.sort((a, b) => a.errorTime - b.errorTime);
+    return filtered.slice(-MAX_QUEUE_SIZE);
+  }
+  return filtered;
+}
+// --- WebSocket Chat Client ---
+interface ChatSendResult {
+  ok: boolean;
+  error?: string;
+}
+async function sendRetryMessage(
+  port: number,
+  token: string | undefined,
+  password: string | undefined,
+  sessionKey: string,
+  message: string,
+): Promise<ChatSendResult> {
+  return new Promise((outerResolve) => {
+    let settled = false;
+    const resolve = (val: ChatSendResult) => {
+      if (settled) return;
+      settled = true;
+      outerResolve(val);
+    };
+    const timeout = setTimeout(() => {
+      try { ws.close(); } catch {}
+      resolve({ ok: false, error: "Connection timeout" });
+    }, 30_000);
+    const ws = new WebSocket(`ws://127.0.0.1:${port}`);
+    let requestId = 0;
+    ws.addEventListener("error", () => {
+      clearTimeout(timeout);
+      resolve({ ok: false, error: "WebSocket connection error" });
+    });
+    ws.addEventListener("close", () => {
+      clearTimeout(timeout);
+      resolve({ ok: false, error: "Connection closed unexpectedly" });
+    });
+    ws.addEventListener("message", (event) => {
+      try {
+        const frame = JSON.parse(String(event.data));
+        if (frame.type === "event" && frame.event === "connect.challenge") {
+          const connectFrame: Record<string, unknown> = {
+            type: "req",
+            id: ++requestId,
+            method: "connect",
+            params: {
+              minProtocol: 1,
+              maxProtocol: 1,
+              client: {
+                name: "ratelimit-retry",
+                displayName: "Ratelimit Retry Plugin",
+                version: "1.0.0",
+                mode: "backend",
+              },
+              role: "operator",
+              scopes: ["operator.admin"],
+            },
+          };
+          if (token) {
+            (connectFrame.params as Record<string, unknown>).auth = { token };
+          } else if (password) {
+            (connectFrame.params as Record<string, unknown>).auth = { password };
+          }
+          ws.send(JSON.stringify(connectFrame));
+          return;
+        }
+        if (frame.type === "res" && frame.id === 1 && !frame.ok) {
+          clearTimeout(timeout);
+          resolve({ ok: false, error: frame.error?.message ?? "Gateway authentication failed" });
+          ws.close();
+          return;
+        }
+        if (frame.type === "res" && frame.id === 1 && frame.ok) {
+          const chatFrame = {
+            type: "req",
+            id: ++requestId,
+            method: "chat.send",
+            params: {
+              sessionKey,
+              message,
+              idempotencyKey: `retry:${sessionKey}:${Date.now()}`,
+            },
+          };
+          ws.send(JSON.stringify(chatFrame));
+          return;
+        }
+        if (frame.type === "res" && frame.id === 2) {
+          clearTimeout(timeout);
+          if (frame.ok) {
+            resolve({ ok: true });
+          } else {
+            resolve({ ok: false, error: frame.error?.message ?? "chat.send failed" });
+          }
+          ws.close();
+          return;
+        }
+      } catch {
+        // Ignore unparseable frames
+      }
+    });
+  });
+}
+// --- Service ---
+interface Logger {
+  info: (msg: string) => void;
+  warn: (msg: string) => void;
+  error: (msg: string) => void;
+}
+export function createRetryService(): {
+  service: OpenClawPluginService;
+  addEntry: (sessionKey: string, errorMessage: string, config: RetryConfig, logger?: Logger) => void;
+  removeEntry: (sessionKey: string) => void;
+} {
+  let queue: QueueEntry[] = [];
+  let queuePath = "";
+  let timer: ReturnType<typeof setInterval> | null = null;
+  let retryInProgress = false;
+  let config: RetryConfig = {
+    budgetWindowHours: 5,
+    maxRetryAttempts: 3,
+    checkIntervalMinutes: 5,
+    retryMessage: "Continue where you left off. The previous attempt failed due to a rate limit that has now reset.",
+    gatewayPort: 18789,
+    gatewayToken: undefined,
+    gatewayPassword: undefined,
+  };
+  const addEntry = (sessionKey: string, errorMessage: string, cfg: RetryConfig, logger?: Logger) => {
+    config = cfg;
+    const now = new Date();
+    const existing = queue.find((e) => e.sessionKey === sessionKey);
+    const attempts = existing ? existing.attempts + 1 : 0;
+    if (attempts >= config.maxRetryAttempts) {
+      logger?.warn(`ratelimit-retry: max attempts (${config.maxRetryAttempts}) reached for ${sessionKey}, abandoning`);
+      queue = queue.filter((e) => e.sessionKey !== sessionKey);
+      if (queuePath) saveQueue(queuePath, queue).catch(() => {});
+      return;
+    }
+    const entry: QueueEntry = {
+      sessionKey,
+      errorTime: now.getTime(),
+      retryAfter: nextResetTime(now, config.budgetWindowHours),
+      errorMessage,
+      attempts,
+    };
+    queue = addToQueue(queue, entry);
+    if (queuePath) saveQueue(queuePath, queue).catch(() => {});
+  };
+  const removeEntry = (sessionKey: string) => {
+    const existed = queue.some((e) => e.sessionKey === sessionKey);
+    if (existed) {
+      queue = queue.filter((e) => e.sessionKey !== sessionKey);
+      if (queuePath) saveQueue(queuePath, queue).catch(() => {});
+    }
+  };
+  const processTick = async (logger: Logger) => {
+    if (retryInProgress || queue.length === 0) return;
+    retryInProgress = true;
+    try {
+      const now = Date.now();
+      const ready = queue.filter((e) => e.retryAfter <= now);
+      if (ready.length === 0) return;
+      logger.info(`ratelimit-retry: ${ready.length} session(s) ready for retry`);
+      for (const entry of ready) {
+        logger.info(`ratelimit-retry: retrying session ${entry.sessionKey} (attempt ${entry.attempts + 1})`);
+        const result = await sendRetryMessage(
+          config.gatewayPort,
+          config.gatewayToken,
+          config.gatewayPassword,
+          entry.sessionKey,
+          config.retryMessage,
+        );
+        if (result.ok) {
+          // Don't remove — keep entry so attempts counter is preserved.
+          // Push retryAfter to next window to prevent re-sending on next tick.
+          // Entry is removed when agent_end fires with success=true.
+          // If the retry fails again, agent_end fires with error and increments attempts.
+          entry.retryAfter = nextResetTime(new Date(), config.budgetWindowHours);
+          logger.info(`ratelimit-retry: sent retry to ${entry.sessionKey}`);
+        } else {
+          // Push retryAfter forward to avoid hammering a down gateway every tick
+          entry.retryAfter = nextResetTime(new Date(), config.budgetWindowHours);
+          logger.warn(`ratelimit-retry: failed to send retry to ${entry.sessionKey}: ${result.error}`);
+        }
+      }
+      await saveQueue(queuePath, queue);
+    } finally {
+      retryInProgress = false;
+    }
+  };
+  const service: OpenClawPluginService = {
+    id: "ratelimit-retry",
+    async start(ctx) {
+      const stateDir = join(ctx.stateDir, "ratelimit-retry");
+      queuePath = join(stateDir, "queue.json");
+      config = {
+        ...config,
+        gatewayPort: (ctx.config as Record<string, any>).gateway?.port ?? 18789,
+        gatewayToken: (ctx.config as Record<string, any>).gateway?.auth?.token,
+        gatewayPassword: (ctx.config as Record<string, any>).gateway?.auth?.password,
+      };
+      const loaded = await loadQueue(queuePath);
+      // Merge: disk entries + any in-memory entries added between register() and start()
+      if (loaded.length > 0) {
+        const loadedKeys = new Set(loaded.map((e) => e.sessionKey));
+        const preStartEntries = queue.filter((e) => !loadedKeys.has(e.sessionKey));
+        queue = [...loaded, ...preStartEntries];
+        ctx.logger.info(`ratelimit-retry: loaded ${loaded.length} pending retry(s) from disk`);
+      }
+      const intervalMs = config.checkIntervalMinutes * 60 * 1000;
+      timer = setInterval(() => {
+        processTick(ctx.logger).catch((err) => {
+          ctx.logger.error(`ratelimit-retry: tick failed: ${err}`);
+        });
+      }, intervalMs);
+      ctx.logger.info(
+        `ratelimit-retry: service started (window=${config.budgetWindowHours}h, check=${config.checkIntervalMinutes}min, maxAttempts=${config.maxRetryAttempts})`,
+      );
+    },
+    async stop(ctx) {
+      if (timer) {
+        clearInterval(timer);
+        timer = null;
+      }
+      if (queuePath && queue.length > 0) {
+        await saveQueue(queuePath, queue);
+      }
+      ctx.logger.info("ratelimit-retry: service stopped");
+    },
+  };
+  return { service, addEntry, removeEntry };
+}
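
One detail of `nextResetTime` is worth checking against the design document, which describes boundaries at 00:00, 05:00, 10:00, 15:00 and 20:00 UTC. Reading the function as shipped, a failure at 21:30 UTC with a 5-hour window gives `nextBoundary = 25`, which wraps to 01:01 UTC on the next day rather than 00:01. The sketch below is one possible variant that clamps the last window of the day to midnight. `nextResetClamped` is a hypothetical name, and the assumption that the provider's windows restart at midnight comes from the design document, not from verified LiteLLM behavior.

```typescript
// Hypothetical variant of nextResetTime: the next multiple of windowHours,
// capped at 24 so the last window of the day ends at midnight UTC.
function nextResetClamped(now: Date, windowHours: number): number {
  const hours = windowHours > 0 ? windowHours : 5; // same fallback as the shipped code
  const currentHour = now.getUTCHours();
  const nextBoundary = Math.min((Math.floor(currentHour / hours) + 1) * hours, 24);
  // Date.UTC normalizes hour 24 to 00:00 of the following day.
  return Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate(),
    nextBoundary,
    1, // one-minute safety margin, as in the shipped code
    0,
    0,
  );
}

const midWindow = new Date(nextResetClamped(new Date("2026-03-12T13:37:00Z"), 5)).toISOString();
const lastWindow = new Date(nextResetClamped(new Date("2026-03-12T21:30:00Z"), 5)).toISOString();
console.log(midWindow);  // 2026-03-12T15:01:00.000Z
console.log(lastWindow); // 2026-03-13T00:01:00.000Z
```

Retrying an hour late, as the shipped function does in the last window, is harmless for correctness; the clamp only matters if resuming as early as possible is a goal.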