@llmist/cli 12.4.0 → 14.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -80,6 +80,88 @@ Use with:
  llmist agent "Do something" --config ./llmist.toml
  ```
 
+ ## Rate Limiting
+
+ The llmist CLI enables **conservative rate limiting by default** to prevent hitting provider API limits and to avoid agent crashes.
+
+ ### Default Behavior
+
+ Rate limits are **automatically configured** based on your model's provider:
+
+ | Provider  | RPM | TPM       | Daily Tokens |
+ |-----------|-----|-----------|--------------|
+ | Anthropic | 50  | 40,000    | -            |
+ | OpenAI    | 3   | 40,000    | -            |
+ | Gemini    | 15  | 1,000,000 | 1,500,000    |
+
+ These defaults are **conservative**, protecting free-tier users; paid-tier users should configure higher limits (see Configuration below).
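+
+ For example, a profile pointed at `gemini:flash` is throttled to 15 requests per minute and 1,000,000 tokens per minute out of the box, with a daily cap of 1,500,000 tokens.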
+
+ ### Configuration
+
+ **TOML Config** (`~/.llmist/cli.toml` or project `llmist.toml`):
+
+ ```toml
+ # Global rate limits (apply to all commands)
+ [rate-limits]
+ enabled = true
+ requests-per-minute = 100
+ tokens-per-minute = 200_000
+ safety-margin = 0.8  # Throttle at 80% of each limit
+
+ # Profile-specific overrides
+ [profile-gemini]
+ model = "gemini:flash"
+
+ [profile-gemini.rate-limits]
+ requests-per-minute = 15
+ tokens-per-day = 1_500_000
+
+ # Disable rate limiting for a profile
+ [profile-fast]
+ model = "gpt4o"
+
+ [profile-fast.rate-limits]
+ enabled = false
+ ```
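+
+ With the global settings above, `safety-margin = 0.8` means throttling starts at 80% of each configured limit: 0.8 × 100 = 80 requests per minute and 0.8 × 200_000 = 160,000 tokens per minute.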
+
+ **CLI Flags** (override all config):
+
+ ```bash
+ # Override limits
+ llmist agent --rate-limit-rpm 100 --rate-limit-tpm 200000 "your prompt"
+
+ # Disable rate limiting
+ llmist agent --no-rate-limit "your prompt"
+
+ # Configure retry behavior
+ llmist agent --max-retries 5 --retry-min-timeout 2000 "your prompt"
+
+ # Disable retry
+ llmist agent --no-retry "your prompt"
+ ```
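+
+ Because flags win over config, a one-off run can use different limits without editing any file. A minimal sketch combining only the flags shown above (the config path reuses the earlier example):
+
+ ```bash
+ # Whatever rate limits ./llmist.toml sets, the flag takes precedence for this run
+ llmist agent --config ./llmist.toml --rate-limit-rpm 200 "your prompt"
+ ```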
+
+ ### TUI Feedback
+
+ The Terminal UI provides real-time feedback when rate limiting is active:
+
+ - **Status Bar**: Shows `⏸ Throttled Xs` when waiting for rate limits
+ - **Status Bar**: Shows `🔄 Retry 2/3` during retry attempts
+ - **Conversation Log**: Persistent entries like:
+ ```
+ ⏸ Rate limit approaching (45 RPM, 85K TPM), waiting 5s...
+ 🔄 Request failed (attempt 1/3), retrying...
+ ```
+
+ ### Finding Your Tier Limits
+
+ To configure optimal limits for your API tier:
+
+ - **Anthropic**: [Rate Limits Documentation](https://docs.anthropic.com/en/api/rate-limits)
+ - **OpenAI**: [Rate Limits Guide](https://platform.openai.com/docs/guides/rate-limits)
+ - **Gemini**: [Quota Documentation](https://ai.google.dev/gemini-api/docs/quota)
+
+ Check your provider dashboard for current tier limits, then update your `llmist.toml` accordingly.
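+
+ For example, if your dashboard reports a paid tier of 1,000 requests and 400,000 tokens per minute (illustrative numbers; substitute your own), the global section could become:
+
+ ```toml
+ # Hypothetical paid-tier limits; copy the real numbers from your provider dashboard
+ [rate-limits]
+ requests-per-minute = 1_000
+ tokens-per-minute = 400_000
+ ```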
+
  ## Terminal UI
 
  The TUI provides an interactive interface to browse execution history, inspect raw payloads, and debug agent runs: