clawmux 0.3.9 → 0.3.11

package/README.md CHANGED
@@ -5,7 +5,7 @@ Smart model routing + context compression proxy for OpenClaw.
  
  ## Features
  
- - 🧠 **Smart Routing**: Embedding-based semantic classification → LIGHT/MEDIUM/HEAVY tier → automatic model selection
+ - 🧠 **Smart Routing**: Signal-based escalation → LIGHT tries first, auto-escalates to MEDIUM/HEAVY when needed
  - 📦 **Context Compression**: Preemptive background summarization at configurable threshold (default 75%)
  - 🔌 **All Providers**: Supports all OpenClaw providers via 6 API format adapters
  - ⚡ **Zero Config Auth**: Uses OpenClaw's existing provider credentials — no separate API keys
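The preemptive compression trigger mentioned in the feature list amounts to a simple ratio check. A minimal sketch, assuming a token count and context window are known; `shouldCompress` is an illustrative name, not the package's real API:

```typescript
// Hypothetical sketch of a preemptive-compression trigger: fire background
// summarization once the conversation uses a configurable fraction of the
// model's context window (the README's stated default is 75%).
function shouldCompress(
  usedTokens: number,
  contextWindow: number,
  threshold: number = 0.75,
): boolean {
  return usedTokens / contextWindow >= threshold;
}
```

At exactly the default threshold, e.g. `shouldCompress(150_000, 200_000)`, the check fires, so summarization starts before the window is actually exhausted.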
@@ -88,17 +88,18 @@ openclaw provider clawmux
  ```
  OpenClaw → ClawMux Proxy (localhost:3456) → Upstream Provider(s)
        │
- ├── 1. Classify complexity (embedding model, ~4ms first run, <1ms cached)
- ├── 2. Select tier → LIGHT/MEDIUM/HEAVY
+ ├── 1. Start at LIGHT tier (or escalated tier from memory)
+ ├── 2. Inject escalation instruction if LIGHT/MEDIUM
  ├── 3. Compress context if threshold exceeded
- ├── 4. Translate request format if cross-provider
- ├── 5. Forward to upstream with correct model
- └── 6. Translate response back to original format
+ ├── 4. Forward to upstream with correct model
+ ├── 5. Detect escalation signal in response
+ ├── 6. If signal found → retry at next tier (max 3 attempts)
+ └── 7. Translate response back to original format
  ```
  
- **Routing tiers** map to model IDs you configure. A local embedding model (`Xenova/multilingual-e5-small`) classifies the semantic complexity of each request using nearest-centroid classification (~8ms p50), supporting both Korean and English. Short queries are detected by a lightweight heuristic and routed to LIGHT tier directly. No external API calls are needed for classification.
+ **Signal-based escalation** routes all requests to the LIGHT model first. If the LIGHT model cannot handle the request, it emits `===CLAWMUX_ESCALATE===` and ClawMux automatically retries at the next tier (LIGHT → MEDIUM → HEAVY). Sessions that previously escalated are remembered for up to 2 hours (5 min idle timeout), so follow-up requests go directly to the appropriate tier.
  
- **Low confidence fallback**: When the classifier's confidence is low, the request is routed to MEDIUM tier. This prevents unreliable classifications from sending requests to an inappropriate tier — MEDIUM provides a safe cost/quality balance.
+ **Kill switch**: Set `routing.escalation.enabled` to `false` in your config to disable escalation and always use the MEDIUM model. This is useful for debugging or when you want predictable routing.
  
  **Context compression** runs in the background after each response. When the conversation approaches the configured threshold, ClawMux summarizes older messages before the next request goes out. This keeps costs down on long conversations without interrupting the flow.
 
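The escalation flow described in the hunk above can be sketched as a short retry loop. This is a minimal illustration, not ClawMux's actual implementation; `routeWithEscalation` and `callUpstream` are hypothetical names:

```typescript
// Sketch of signal-based escalation: start at a tier, retry one tier higher
// whenever the response contains the escalation signal, max 3 attempts total.
type Tier = "LIGHT" | "MEDIUM" | "HEAVY";
const TIERS: Tier[] = ["LIGHT", "MEDIUM", "HEAVY"];
const ESCALATE_SIGNAL = "===CLAWMUX_ESCALATE===";

function routeWithEscalation(
  start: Tier,
  callUpstream: (tier: Tier) => string, // stand-in for the real provider call
): { tier: Tier; response: string } {
  let idx = TIERS.indexOf(start);
  let response = callUpstream(TIERS[idx]);
  let attempts = 1;
  // Escalate while the signal is present, a higher tier exists,
  // and attempts remain.
  while (
    response.includes(ESCALATE_SIGNAL) &&
    idx < TIERS.length - 1 &&
    attempts < 3
  ) {
    idx += 1;
    response = callUpstream(TIERS[idx]);
    attempts += 1;
  }
  return { tier: TIERS[idx], response };
}
```

A request that keeps emitting the signal walks LIGHT → MEDIUM → HEAVY and stops there; a request the LIGHT model handles cleanly never escalates at all.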
@@ -15,9 +15,15 @@
      "HEAVY": "anthropic/claude-opus-4-20250514",
      "_comment_models": "Model IDs in 'provider/model' format for each routing tier. Do NOT use provider names starting with 'clawmux-' — this causes infinite loops."
    },
-   "scoring": {
-     "confidenceThreshold": 0.7,
-     "_comment_confidenceThreshold": "Classification confidence below this threshold triggers a fallback to MEDIUM tier (range: 0.0–1.0)."
+   "escalation": {
+     "enabled": true,
+     "_comment_enabled": "Set to false to disable signal-based escalation and always route to MEDIUM tier.",
+     "activeThresholdMs": 300000,
+     "_comment_activeThresholdMs": "Idle timeout (ms) before cooling down escalated tier. 300000 = 5 minutes.",
+     "maxLifetimeMs": 7200000,
+     "_comment_maxLifetimeMs": "Maximum lifetime (ms) for escalated tier. 7200000 = 2 hours.",
+     "fingerprintRootCount": 5,
+     "_comment_fingerprintRootCount": "Number of initial messages used to identify a session for escalation memory."
    },
    "contextWindows": {
      "_comment_contextWindows": "Optional per-model context window overrides (in tokens). Keys use 'provider/model' format.",