@llmist/cli 12.4.0 → 14.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -80,6 +80,88 @@ Use with:
  llmist agent "Do something" --config ./llmist.toml
  ```
 
+ ## Rate Limiting
+
+ The llmist CLI enables **conservative rate limiting by default** to prevent hitting provider API limits and to avoid agent crashes.
+
+ ### Default Behavior
+
+ Rate limits are **automatically configured** based on your model's provider:
+
+ | Provider  | RPM | TPM       | Daily Tokens |
+ |-----------|-----|-----------|--------------|
+ | Anthropic | 50  | 40,000    | -            |
+ | OpenAI    | 3   | 40,000    | -            |
+ | Gemini    | 15  | 1,000,000 | 1,500,000    |
+
+ These defaults are **conservative**, protecting free-tier users; paid-tier users should configure higher limits (see Configuration below).
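+
+ For example, a profile pointed at `gemini:flash` is throttled to 15 requests per minute and 1,000,000 tokens per minute out of the box, with a daily cap of 1,500,000 tokens.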
+
+ ### Configuration
+
+ **TOML Config** (`~/.llmist/cli.toml` or project `llmist.toml`):
+
+ ```toml
+ # Global rate limits (apply to all commands)
+ [rate-limits]
+ enabled = true
+ requests-per-minute = 100
+ tokens-per-minute = 200_000
+ safety-margin = 0.8  # Throttle at 80% of each limit
+
+ # Profile-specific overrides
+ [profile-gemini]
+ model = "gemini:flash"
+
+ [profile-gemini.rate-limits]
+ requests-per-minute = 15
+ tokens-per-day = 1_500_000
+
+ # Disable rate limiting for a profile
+ [profile-fast]
+ model = "gpt4o"
+
+ [profile-fast.rate-limits]
+ enabled = false
+ ```
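+
+ With the global settings above, `safety-margin = 0.8` means throttling starts at 80% of each configured limit: 0.8 × 100 = 80 requests per minute and 0.8 × 200_000 = 160,000 tokens per minute.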
+
+ **CLI Flags** (override all config):
+
+ ```bash
+ # Override limits
+ llmist agent --rate-limit-rpm 100 --rate-limit-tpm 200000 "your prompt"
+
+ # Disable rate limiting
+ llmist agent --no-rate-limit "your prompt"
+
+ # Configure retry behavior
+ llmist agent --max-retries 5 --retry-min-timeout 2000 "your prompt"
+
+ # Disable retry
+ llmist agent --no-retry "your prompt"
+ ```
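+
+ Because flags win over config, a one-off run can use different limits without editing any file. A minimal sketch combining only the flags shown above (the config path reuses the earlier example):
+
+ ```bash
+ # Whatever rate limits ./llmist.toml sets, the flag takes precedence for this run
+ llmist agent --config ./llmist.toml --rate-limit-rpm 200 "your prompt"
+ ```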
+
+ ### TUI Feedback
+
+ The Terminal UI provides real-time feedback when rate limiting is active:
+
+ - **Status Bar**: Shows `⏸ Throttled Xs` when waiting for rate limits
+ - **Status Bar**: Shows `🔄 Retry 2/3` during retry attempts
+ - **Conversation Log**: Persistent entries like:
+ ```
+ ⏸ Rate limit approaching (45 RPM, 85K TPM), waiting 5s...
+ 🔄 Request failed (attempt 1/3), retrying...
+ ```
+
+ ### Finding Your Tier Limits
+
+ To configure optimal limits for your API tier:
+
+ - **Anthropic**: [Rate Limits Documentation](https://docs.anthropic.com/en/api/rate-limits)
+ - **OpenAI**: [Rate Limits Guide](https://platform.openai.com/docs/guides/rate-limits)
+ - **Gemini**: [Quota Documentation](https://ai.google.dev/gemini-api/docs/quota)
+
+ Check your provider dashboard for current tier limits, then update your `llmist.toml` accordingly.
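+
+ For example, if your dashboard reports a paid tier of 1,000 requests and 400,000 tokens per minute (illustrative numbers; substitute your own), the global section could become:
+
+ ```toml
+ # Hypothetical paid-tier limits; copy the real numbers from your provider dashboard
+ [rate-limits]
+ requests-per-minute = 1_000
+ tokens-per-minute = 400_000
+ ```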
+
  ## Terminal UI
 
  The TUI provides an interactive interface to browse execution history, inspect raw payloads, and debug agent runs: