@cheapestinference/openclaw-ratelimit-retry 1.0.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 CheapestInference
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,154 @@
+ # ratelimit-retry
+
+ An OpenClaw plugin that automatically retries agent conversations killed by provider rate limits.
+
+ ## Problem
+
+ When a request hits your LLM provider's rate limit or budget cap (HTTP 429), every running agent task dies mid-conversation. Nothing resumes them. If you close the dashboard, those conversations are gone. You have to manually find and re-trigger each one after the budget resets.
+
+ ## Solution
+
+ This plugin hooks into OpenClaw's `agent_end` event, detects retriable errors (429s, rate limits, budget exhaustion), and parks the failed session in a persistent queue on disk. A background service waits for the provider's budget window to reset, then sends `chat.send` to the original session -- resuming the conversation with its full transcript context, as if the user had typed a message.
+
+ ## Installation
+
+ ```bash
+ openclaw plugins install @cheapestinference/openclaw-ratelimit-retry
+ ```
+
+ Or copy manually to your extensions directory:
+
+ ```bash
+ cp -r openclaw-plugin-ratelimit-retry ~/.openclaw/extensions/ratelimit-retry
+ ```
+
+ Enable it in OpenClaw config:
+
+ ```bash
+ openclaw config set plugins.ratelimit-retry.budgetWindowHours 5
+ openclaw config set plugins.ratelimit-retry.maxRetryAttempts 3
+ ```
+
+ No `npm install` needed. The plugin has zero runtime dependencies.
+
+ ### Complete example
+
+ ```yaml
+ # ~/.openclaw/config.yaml
+ plugins:
+   ratelimit-retry:
+     budgetWindowHours: 5
+     maxRetryAttempts: 3
+     checkIntervalMinutes: 5
+     retryMessage: "Continue where you left off. The previous attempt failed due to a rate limit that has now reset."
+ ```
+
+ ## How It Works
+
+ ```
+ Agent run fails (429)
+   |
+   v
+ agent_end hook fires
+   |-- Non-retriable error? --> ignore
+   |-- Retriable error? --> queue to disk
+         |
+         v
+       Background timer (every 5 min)
+         |
+         |-- Budget window not reset? --> wait
+         |-- Budget window reset? --> chat.send to session
+               |
+               |--> Ack received: wait for result
+               |      |--> agent_end success: remove from queue
+               |      |--> agent_end 429: re-queued automatically
+               |--> Send failed: wait for next window
+ ```
+
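The flow in the diagram above can be sketched as a single background tick. This is a minimal illustration, not the plugin's actual code: `QueueEntry`, `ChatSend`, `tick`, and the `awaitingResult` flag are hypothetical names, and the sender is injected so the sketch does not assume any particular OpenClaw gateway API.

```typescript
// Hypothetical shapes; the plugin's real internals are not shown in this README.
interface QueueEntry {
  sessionKey: string;
  retryAfter: number;      // epoch ms; the entry is due once now >= retryAfter
  awaitingResult: boolean; // assumed flag: retry sent, agent_end not seen yet
}

// Stand-in for chat.send: resolves with an ack, rejects if the gateway is unreachable.
type ChatSend = (sessionKey: string, message: string) => Promise<{ ok: boolean }>;

let retryInProgress = false; // guard against overlapping batches

async function tick(
  queue: QueueEntry[],
  now: number,
  nextWindowAt: number,
  chatSend: ChatSend,
  retryMessage: string,
): Promise<number> {
  if (retryInProgress) return 0; // a previous batch is still running
  retryInProgress = true;
  let sent = 0;
  try {
    for (const entry of queue) {
      // Skip entries whose budget window has not reset or that already have a retry in flight.
      if (entry.awaitingResult || entry.retryAfter > now) continue;
      try {
        const ack = await chatSend(entry.sessionKey, retryMessage);
        if (!ack.ok) throw new Error("chat.send was not acknowledged");
        entry.awaitingResult = true; // the outcome arrives later through agent_end
        sent += 1;
      } catch {
        entry.retryAfter = nextWindowAt; // send failed: wait for the next window
      }
    }
  } finally {
    retryInProgress = false;
  }
  return sent;
}
```

Entries are processed one after another, which matches the sequential behavior described under Limitations.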
+ The retry uses `chat.send` with the original `sessionKey`, which means the gateway loads the complete JSONL transcript and the agent resumes with full context. This is equivalent to the user typing a message in the chat.
+
+ The model is **fire-and-forget with re-detection**: `chat.send` returns an immediate ack (`{ ok, runId, status: "started" }`), not the final result. If the retried run fails again with a 429, the `agent_end` hook fires again and the session is re-queued with an incremented attempt counter. This loop continues until the retry succeeds or `maxRetryAttempts` is reached.
+
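The re-detection loop can be pictured as an enqueue step keyed by `sessionKey`. The sketch below is an assumption about how the attempt counter might be kept, not the plugin's source: `enqueue` and `QueueEntry` are hypothetical names, and it assumes that `attempts` counts scheduled retries, so a session is abandoned once a fourth retry would be needed when `maxRetryAttempts` is 3.

```typescript
// Hypothetical entry shape for illustration only.
interface QueueEntry {
  sessionKey: string;
  attempts: number;   // retries scheduled so far for this session
  retryAfter: number; // epoch ms of the next retry
}

// Called whenever agent_end reports a retriable error for a session.
function enqueue(
  queue: Map<string, QueueEntry>,
  sessionKey: string,
  retryAfter: number,
  maxRetryAttempts: number,
): "queued" | "abandoned" {
  // One entry per sessionKey: a repeated failure updates the existing entry.
  const attempts = (queue.get(sessionKey)?.attempts ?? 0) + 1;
  if (attempts > maxRetryAttempts) {
    queue.delete(sessionKey); // give up; the real plugin logs a warning here
    return "abandoned";
  }
  queue.set(sessionKey, { sessionKey, attempts, retryAfter });
  return "queued";
}
```

Using a `Map` keyed by `sessionKey` makes the deduplication implicit, since a second failure for the same session overwrites the first entry instead of adding another.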
+ ## Configuration
+
+ | Option | Type | Default | Description |
+ |--------|------|---------|-------------|
+ | `budgetWindowHours` | `number` | `5` | Budget reset window in hours, aligned to UTC clock boundaries |
+ | `maxRetryAttempts` | `number` | `3` | Max retries per session before abandoning |
+ | `checkIntervalMinutes` | `number` | `5` | How often the background service checks for pending retries |
+ | `retryMessage` | `string` | `"Continue where you left off..."` | Message sent to the session to resume the conversation |
+
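For readers who prefer types to tables, the options above could be modeled roughly as follows. This is a sketch only: `RetryConfig`, `DEFAULTS`, and `resolveConfig` are hypothetical names, the positive-number check is an assumption rather than documented behavior, and the default `retryMessage` is kept truncated exactly as the table shows it.

```typescript
// Hypothetical typing of the four documented options.
interface RetryConfig {
  budgetWindowHours: number;
  maxRetryAttempts: number;
  checkIntervalMinutes: number;
  retryMessage: string;
}

const DEFAULTS: RetryConfig = {
  budgetWindowHours: 5,
  maxRetryAttempts: 3,
  checkIntervalMinutes: 5,
  // Truncated as in the table above; the full default text is not reproduced here.
  retryMessage: "Continue where you left off...",
};

// Merge user-supplied values over the defaults and reject nonsensical numbers.
function resolveConfig(user: Partial<RetryConfig> = {}): RetryConfig {
  const cfg: RetryConfig = { ...DEFAULTS, ...user };
  const numeric = [cfg.budgetWindowHours, cfg.maxRetryAttempts, cfg.checkIntervalMinutes];
  if (numeric.some((n) => !Number.isFinite(n) || n <= 0)) {
    throw new Error("ratelimit-retry: numeric options must be positive numbers");
  }
  if (cfg.retryMessage.trim() === "") {
    // chat.send requires a non-empty message (see Limitations).
    throw new Error("ratelimit-retry: retryMessage must not be empty");
  }
  return cfg;
}
```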
+ ## How the Retry Timing Works
+
+ Many LLM providers and proxies (including LiteLLM) reset budget counters on fixed UTC-aligned windows. With a 5-hour window, the boundaries are:
+
+ ```
+ 00:00 05:00 10:00 15:00 20:00 (next day) 00:00
+ |------|------|------|------|------|
+ ```
+
+ When an error is queued, the plugin calculates the next boundary after the current time and adds a **1-minute margin** (retries at `HH:01:00` instead of `HH:00:00`) to avoid racing the provider's reset.
+
+ **When 24 is not evenly divisible by `windowHours`**: the math still works. If `windowHours` is 7, boundaries fall at 0, 7, 14, 21, and the next one would be 28 -- which overflows to 04:00 the next day. The plugin handles day overflow correctly.
+
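The boundary calculation described above can be sketched in a few lines. `nextRetryTime` is a hypothetical helper, not the plugin's function, and it assumes `windowHours` is a whole number. Day overflow follows the 7-hour example in the text: an hour value of 28 is passed straight to `Date.UTC`, which normalizes it to 04:00 on the following day.

```typescript
// Next UTC-aligned window boundary strictly after `now`, plus a 1-minute margin.
function nextRetryTime(now: Date, windowHours: number): Date {
  // Index of the window containing `now`, counted from 00:00 UTC of the same day.
  const currentWindow = Math.floor(now.getUTCHours() / windowHours);
  // May exceed 23; Date.UTC rolls such values over into the next day.
  const boundaryHour = (currentWindow + 1) * windowHours;
  return new Date(
    Date.UTC(
      now.getUTCFullYear(),
      now.getUTCMonth(),
      now.getUTCDate(),
      boundaryHour,
      1, // retry at HH:01:00 rather than HH:00:00
      0,
      0,
    ),
  );
}

// 12:30 UTC with a 5-hour window retries at 15:01 UTC the same day.
console.log(nextRetryTime(new Date("2026-01-01T12:30:00Z"), 5).toISOString());
// 22:00 UTC with a 7-hour window overflows to 04:01 UTC the next day.
console.log(nextRetryTime(new Date("2026-01-01T22:00:00Z"), 7).toISOString());
```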
+ ## Error Classification
+
+ Non-retriable patterns are checked first. If an error matches a non-retriable pattern, it is never retried, even if it also matches a retriable pattern.
+
+ ### Retriable (queued for retry)
+
+ | Pattern | Catches |
+ |---------|---------|
+ | `429` | `"Error code: 429 - ..."` |
+ | `rate limit`, `rate_limit` | `"RateLimitError: ..."` |
+ | `too many requests` | HTTP 429 reason phrases |
+ | `budget` | `"Budget exceeded for ..."` |
+ | `quota exceeded` | Provider quota messages |
+ | `resource exhausted` | gRPC-style exhaustion errors |
+ | `tokens per minute`, `tpm` | TPM limit messages |
+
+ ### Non-retriable (ignored)
+
+ | Pattern | Reason |
+ |---------|--------|
+ | `401`, `402`, `403`, `404` | HTTP client errors -- won't succeed on retry |
+ | `invalid api key`, `unauthorized` | Auth errors -- fix your credentials |
+ | `invalid request`, `malformed` | Bad request format -- won't succeed on retry |
+ | `model not found` | Model doesn't exist |
+ | `context length`, `prompt too large` | Context overflow -- message is too long |
+ | `insufficient credits` | Billing issue -- requires user action |
+
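The precedence rule above (non-retriable first, then retriable) can be sketched as a small classifier. The regular expressions are an assumption about how the listed patterns might be matched: the tables do not say whether the plugin uses substrings or regexes, and the word boundaries and the optional separator in the rate-limit pattern are illustrative choices. `isRetriable` is a hypothetical name.

```typescript
// Patterns transcribed from the tables above; the exact matching rules are assumed.
const NON_RETRIABLE: RegExp[] = [
  /\b40[1-4]\b/,
  /invalid api key/i,
  /unauthorized/i,
  /invalid request/i,
  /malformed/i,
  /model not found/i,
  /context length/i,
  /prompt too large/i,
  /insufficient credits/i,
];

const RETRIABLE: RegExp[] = [
  /\b429\b/,
  /rate[\s_]?limit/i, // also matches "RateLimitError"
  /too many requests/i,
  /budget/i,
  /quota exceeded/i,
  /resource exhausted/i,
  /tokens per minute/i,
  /\btpm\b/i,
];

function isRetriable(errorMessage: string): boolean {
  // Non-retriable patterns win, even when a retriable pattern also matches.
  if (NON_RETRIABLE.some((pattern) => pattern.test(errorMessage))) return false;
  return RETRIABLE.some((pattern) => pattern.test(errorMessage));
}
```

Under these assumptions, a message such as `"429: insufficient credits"` is classified as non-retriable, because the billing pattern is checked before the status code.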
+ ## Edge Cases
+
+ - **Server restarts**: the queue is persisted to `{stateDir}/ratelimit-retry/queue.json` and reloaded on startup.
+ - **Same session errors multiple times**: deduplicated by `sessionKey`. The existing entry is updated with incremented attempts and a recalculated `retryAfter`.
+ - **Retry fails with 429 again**: `agent_end` fires again, re-queuing with incremented attempts. Natural loop until success or `maxRetryAttempts`.
+ - **Gateway unreachable during retry**: connection error is caught, entry's `retryAfter` is pushed to the next budget window to avoid hammering a down gateway every tick.
+ - **Max attempts exceeded**: entry is removed from queue and a warning is logged.
+ - **Sub-agent sessions**: handled identically -- `sessionKey` format `agent:X:subagent:Y` works the same way.
+ - **Timer fires during active retry**: a `retryInProgress` guard prevents overlapping batches.
+ - **Queue file corrupted**: JSON parse errors are caught; service starts with an empty queue and logs a warning.
+ - **Queue overflow**: capped at 100 entries. Oldest entries are evicted when full.
+ - **Atomic writes**: queue is written to a uniquely-named `.tmp` file first, then renamed, to prevent corruption on crashes or concurrent writes.
+
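The persistence behavior in the list above (atomic writes, tolerant loading, the 100-entry cap) might look roughly like this with Node's built-in `fs` module. `saveQueue`, `loadQueue`, `MAX_ENTRIES`, and the entry shape are hypothetical; only the write-then-rename pattern, the empty-queue fallback, and the cap come from the README.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import { randomUUID } from "node:crypto";

// Hypothetical entry shape for illustration only.
interface QueueEntry {
  sessionKey: string;
  attempts: number;
  retryAfter: number;
}

const MAX_ENTRIES = 100; // cap stated under "Queue overflow"

function saveQueue(file: string, queue: QueueEntry[]): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  // Keep the newest entries, assuming the array is ordered oldest first.
  const capped = queue.slice(-MAX_ENTRIES);
  // Write to a uniquely named temp file, then rename over the target in one step.
  const tmp = `${file}.${randomUUID()}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(capped, null, 2), "utf8");
  fs.renameSync(tmp, file);
}

function loadQueue(file: string): QueueEntry[] {
  try {
    const parsed: unknown = JSON.parse(fs.readFileSync(file, "utf8"));
    return Array.isArray(parsed) ? (parsed as QueueEntry[]) : [];
  } catch (err) {
    // Missing or corrupted file: start with an empty queue and log a warning.
    console.warn(`ratelimit-retry: could not load ${file}, starting empty`, err);
    return [];
  }
}
```

The rename is what gives the atomicity: a reader sees either the old file or the complete new one, never a half-written file, provided the temp file and the target are on the same filesystem.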
+ ## Limitations
+
+ - **Fire-and-forget window**: after `chat.send` returns its ack, there is a brief period where the retried run is in progress. If it fails with 429 again immediately, there is a small window before the `agent_end` hook fires and re-queues it. This is by design -- the re-detection loop handles it.
+ - **`chat.send` requires a non-empty message**: the retry always sends the configured `retryMessage`. It cannot send an empty message to silently resume.
+ - **No partial-run recovery**: the plugin resumes the conversation from the last completed turn. It does not replay partial streaming output that was interrupted.
+ - **Single-instance only**: the queue is a local JSON file with no locking. Running multiple OpenClaw instances sharing the same `~/.openclaw/` directory is not supported.
+ - **No backpressure on the provider**: the plugin retries all ready sessions in sequence. If you have many queued sessions, they all fire at the start of the next window.
+
+ ## License
+
+ [MIT](LICENSE)
+
+ ## Contributing
+
+ Contributions are welcome. Please open an issue first to discuss what you would like to change.
+
+ ```bash
+ git clone https://github.com/cheapestinference/openclaw-plugin-ratelimit-retry
+ cd openclaw-plugin-ratelimit-retry
+ # No build step. OpenClaw loads .ts files directly via Jiti.
+ ```