npm - @cheapestinference/openclaw-ratelimit-retry - Versions diffs - 1.0.0 - Mend

@cheapestinference/openclaw-ratelimit-retry 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8) hide show

package/LICENSE +21 -0
package/README.md +154 -0
package/docs/superpowers/plans/2026-03-12-retry-on-error-plugin.md +604 -0
package/docs/superpowers/specs/2026-03-12-retry-on-error-plugin-design.md +211 -0
package/index.ts +67 -0
package/openclaw.plugin.json +32 -0
package/package.json +15 -0
package/src/service.ts +377 -0

package/LICENSE ADDED Viewed

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 CheapestInference
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED Viewed

@@ -0,0 +1,154 @@
+# ratelimit-retry
+An OpenClaw plugin that automatically retries agent conversations killed by provider rate limits.
+## Problem
+When your LLM provider hits a rate limit or budget cap (HTTP 429), every running agent task dies mid-conversation. Nothing resumes them. If you close the dashboard, those conversations are gone. You have to manually find and re-trigger each one after the budget resets.
+## Solution
+This plugin hooks into OpenClaw's `agent_end` event, detects retriable errors (429s, rate limits, budget exhaustion), and parks the failed session in a persistent queue on disk. A background service waits for the provider's budget window to reset, then sends `chat.send` to the original session -- resuming the conversation with its full transcript context, as if the user had typed a message.
+## Installation
+```bash
+openclaw plugins install @cheapestinference/openclaw-ratelimit-retry
+```
+Or copy manually to your extensions directory:
+```bash
+cp -r openclaw-plugin-ratelimit-retry ~/.openclaw/extensions/ratelimit-retry
+```
+Enable it in OpenClaw config:
+```bash
+openclaw config set plugins.ratelimit-retry.budgetWindowHours 5
+openclaw config set plugins.ratelimit-retry.maxRetryAttempts 3
+```
+No `npm install` needed. The plugin has zero runtime dependencies.
+### Complete example
+```yaml
+# ~/.openclaw/config.yaml
+plugins:
+  ratelimit-retry:
+    budgetWindowHours: 5
+    maxRetryAttempts: 3
+    checkIntervalMinutes: 5
+    retryMessage: "Continue where you left off. The previous attempt failed due to a rate limit that has now reset."
+```
+## How It Works
+```
+Agent run fails (429)
+  |
+  v
+agent_end hook fires
+  |-- Non-retriable error? --> ignore
+  |-- Retriable error?     --> queue to disk
+                                 |
+                                 v
+                  Background timer (every 5 min)
+                    |
+                    |-- Budget window not reset? --> wait
+                    |-- Budget window reset?     --> chat.send to session
+                                                       |
+                                                       |--> Ack received: wait for result
+                                                       |     |--> agent_end success: remove from queue
+                                                       |     |--> agent_end 429: re-queued automatically
+                                                       |--> Send failed: wait for next window
+```
+The retry uses `chat.send` with the original `sessionKey`, which means the gateway loads the complete JSONL transcript and the agent resumes with full context. This is equivalent to the user typing a message in the chat.
+The model is **fire-and-forget with re-detection**: `chat.send` returns an immediate ack (`{ ok, runId, status: "started" }`), not the final result. If the retried run fails again with a 429, the `agent_end` hook fires again and the session is re-queued with an incremented attempt counter. This loop continues until the retry succeeds or `maxRetryAttempts` is reached.
+## Configuration
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `budgetWindowHours` | `number` | `5` | Budget reset window in hours, aligned to UTC clock boundaries |
+| `maxRetryAttempts` | `number` | `3` | Max retries per session before abandoning |
+| `checkIntervalMinutes` | `number` | `5` | How often the background service checks for pending retries |
+| `retryMessage` | `string` | `"Continue where you left off..."` | Message sent to the session to resume the conversation |
+## How the Retry Timing Works
+Many LLM providers (including LiteLLM) reset budget counters on fixed UTC-aligned windows. With a 5-hour window, the boundaries are:
+```
+00:00  05:00  10:00  15:00  20:00  (next day) 00:00
+  |------|------|------|------|------|
+```
+When an error is queued, the plugin calculates the next boundary after the current time and adds a **1-minute margin** (retries at `HH:01:00` instead of `HH:00:00`) to avoid racing the provider's reset.
+**When 24 is not evenly divisible by `windowHours`**: the math still works. If `windowHours` is 7, boundaries fall at 0, 7, 14, 21, and the next one would be 28 -- which overflows to 04:00 the next day. The plugin handles day overflow correctly.
+## Error Classification
+Non-retriable patterns are checked first. If an error matches a non-retriable pattern, it is never retried, even if it also matches a retriable pattern.
+### Retriable (queued for retry)
+| Pattern | Catches |
+|---------|---------|
+| `429` | `"Error code: 429 - ..."` |
+| `rate limit`, `rate_limit` | `"RateLimitError: ..."` |
+| `too many requests` | HTTP 429 reason phrases |
+| `budget` | `"Budget exceeded for ..."` |
+| `quota exceeded` | Provider quota messages |
+| `resource exhausted` | gRPC-style exhaustion errors |
+| `tokens per minute`, `tpm` | TPM limit messages |
+### Non-retriable (ignored)
+| Pattern | Reason |
+|---------|--------|
+| `401`, `402`, `403`, `404` | HTTP client errors -- won't succeed on retry |
+| `invalid api key`, `unauthorized` | Auth errors -- fix your credentials |
+| `invalid request`, `malformed` | Bad request format -- won't succeed on retry |
+| `model not found` | Model doesn't exist |
+| `context length`, `prompt too large` | Context overflow -- message is too long |
+| `insufficient credits` | Billing issue -- requires user action |
+## Edge Cases
+- **Server restarts**: the queue is persisted to `{stateDir}/ratelimit-retry/queue.json` and reloaded on startup.
+- **Same session errors multiple times**: deduplicated by `sessionKey`. The existing entry is updated with incremented attempts and a recalculated `retryAfter`.
+- **Retry fails with 429 again**: `agent_end` fires again, re-queuing with incremented attempts. Natural loop until success or `maxRetryAttempts`.
+- **Gateway unreachable during retry**: connection error is caught, entry's `retryAfter` is pushed to the next budget window to avoid hammering a down gateway every tick.
+- **Max attempts exceeded**: entry is removed from queue and a warning is logged.
+- **Sub-agent sessions**: handled identically -- `sessionKey` format `agent:X:subagent:Y` works the same way.
+- **Timer fires during active retry**: a `retryInProgress` guard prevents overlapping batches.
+- **Queue file corrupted**: JSON parse errors are caught; service starts with an empty queue and logs a warning.
+- **Queue overflow**: capped at 100 entries. Oldest entries are evicted when full.
+- **Atomic writes**: queue is written to a uniquely-named `.tmp` file first, then renamed, to prevent corruption on crashes or concurrent writes.
+## Limitations
+- **Fire-and-forget window**: after `chat.send` returns its ack, there is a brief period where the retried run is in progress. If it fails with 429 again immediately, there is a small window before the `agent_end` hook fires and re-queues it. This is by design -- the re-detection loop handles it.
+- **`chat.send` requires a non-empty message**: the retry always sends the configured `retryMessage`. It cannot send an empty message to silently resume.
+- **No partial-run recovery**: the plugin resumes the conversation from the last completed turn. It does not replay partial streaming output that was interrupted.
+- **Single-instance only**: the queue is a local JSON file with no locking. Running multiple OpenClaw instances sharing the same `~/.openclaw/` directory is not supported.
+- **No backpressure on the provider**: the plugin retries all ready sessions in sequence. If you have many queued sessions, they all fire at the start of the next window.
+## License
+[MIT](LICENSE)
+## Contributing
+Contributions are welcome. Please open an issue first to discuss what you would like to change.
+```bash
+git clone https://github.com/cheapestinference/openclaw-plugin-ratelimit-retry
+cd openclaw-plugin-ratelimit-retry
+# No build step. OpenClaw loads .ts files directly via Jiti.
+```