clawperf-0.2.2.tar.gz → clawperf-0.2.4.tar.gz

Files changed (31)

{clawperf-0.2.2 → clawperf-0.2.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: clawperf
-Version: 0.2.2
+Version: 0.2.4
 Summary: Performance benchmarking tool for LLM Serving backends with multi-turn long-context workloads
 Author: ClawPerf Contributors
 License-Expression: Apache-2.0
@@ -281,15 +281,14 @@ Each simulated user maintains an independent conversation state with its own gro
 Each user's context follows this structure:
-```
-[System Prefix] [User Prefix] [History] [Current Input]
-```
+![Context model and compaction](docs/context_model.svg)
 When context reaches `--max-context-tokens`, append-mode compaction fires:
 1. The base context (system + user prefix + input, without history) is checked first. If it already exceeds the limit, compaction is skipped and the turn is marked as `context_overflow` — this prevents infinite compaction loops.
 2. Otherwise, history is cleared and the user prefix grows by `--compaction-prefix-increment` tokens.
 3. New random content fills the enlarged user prefix.
+4. If the grown base still exceeds the limit, the prefix growth is **reverted** (history cleared only) so the user isn't permanently trapped in overflow.
 This simulates how real LLM serving systems handle context overflow with prefix caching.
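
Reviewer note: the four compaction steps in the hunk above, including the new revert rule from step 4, can be sketched roughly as follows. This is a minimal illustration working on token counts only, not clawperf's actual code. `UserContext`, `compact`, and the returned status strings other than `context_overflow` are hypothetical names, and the regeneration of random prefix content (step 3) is left out.

```python
from dataclasses import dataclass


@dataclass
class UserContext:
    """Token counts for one simulated user (hypothetical structure)."""
    system_tokens: int
    user_prefix_tokens: int
    history_tokens: int
    input_tokens: int

    def base(self) -> int:
        # Base context: system + user prefix + input, without history.
        return self.system_tokens + self.user_prefix_tokens + self.input_tokens

    def total(self) -> int:
        return self.base() + self.history_tokens


def compact(ctx: UserContext, max_context_tokens: int, prefix_increment: int) -> str:
    """Apply append-mode compaction in place and return a status label."""
    if ctx.total() < max_context_tokens:
        return "no_compaction"
    # Step 1: if the base alone exceeds the limit, skip compaction entirely.
    if ctx.base() > max_context_tokens:
        return "context_overflow"
    # Step 2: clear history and grow the user prefix.
    ctx.history_tokens = 0
    ctx.user_prefix_tokens += prefix_increment
    # Step 3 (filling the enlarged prefix with new random content) is omitted.
    # Step 4: if the grown base exceeds the limit, revert the growth only.
    if ctx.base() > max_context_tokens:
        ctx.user_prefix_tokens -= prefix_increment
        return "compacted_reverted"
    return "compacted"
```

In this sketch, a user whose base is close to the limit still has history cleared but keeps the old prefix size, which matches the stated goal of not trapping the user in permanent overflow.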
@@ -304,6 +303,8 @@ The mock server simulates vLLM's KV-block prefix cache using a trie:
 ## User Arrival Scheduling
+![User arrival patterns](docs/arrival_patterns.svg)
 - **burst**: All users start immediately
 - **steady:2**: Users arrive every 2 seconds
 - **poisson:0.5**: Users arrive following a Poisson process with rate 0.5
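
Reviewer note: the three arrival patterns listed above could be turned into per-user start offsets along these lines. This is a sketch under assumptions rather than the package's implementation: `arrival_offsets` is a hypothetical helper, the Poisson rate is assumed to be in users per second (so inter-arrival gaps are exponential with mean `1 / rate`), and the first `steady` user is assumed to start at time 0.

```python
import random


def arrival_offsets(pattern: str, num_users: int, seed: int = 0) -> list[float]:
    """Return start offsets in seconds for each user (hypothetical helper)."""
    kind, _, arg = pattern.partition(":")
    if kind == "burst":
        # All users start immediately.
        return [0.0] * num_users
    if kind == "steady":
        # One user every `interval` seconds; first user at t=0 (assumption).
        interval = float(arg)
        return [i * interval for i in range(num_users)]
    if kind == "poisson":
        # Poisson process: exponential gaps between consecutive arrivals.
        rate = float(arg)
        rng = random.Random(seed)
        offsets = []
        t = 0.0
        for _ in range(num_users):
            t += rng.expovariate(rate)
            offsets.append(t)
        return offsets
    raise ValueError(f"unknown arrival pattern: {pattern!r}")
```

For example, `arrival_offsets("steady:2", 3)` gives offsets of 0, 2, and 4 seconds, while `poisson:0.5` produces irregular gaps that average about 2 seconds under the rate assumption above.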

{clawperf-0.2.2 → clawperf-0.2.4}/README.md RENAMED

@@ -246,15 +246,14 @@ Each simulated user maintains an independent conversation state with its own gro
 Each user's context follows this structure:
-```
-[System Prefix] [User Prefix] [History] [Current Input]
-```
+![Context model and compaction](docs/context_model.svg)
 When context reaches `--max-context-tokens`, append-mode compaction fires:
 1. The base context (system + user prefix + input, without history) is checked first. If it already exceeds the limit, compaction is skipped and the turn is marked as `context_overflow` — this prevents infinite compaction loops.
 2. Otherwise, history is cleared and the user prefix grows by `--compaction-prefix-increment` tokens.
 3. New random content fills the enlarged user prefix.
+4. If the grown base still exceeds the limit, the prefix growth is **reverted** (history cleared only) so the user isn't permanently trapped in overflow.
 This simulates how real LLM serving systems handle context overflow with prefix caching.
@@ -269,6 +268,8 @@ The mock server simulates vLLM's KV-block prefix cache using a trie:
 ## User Arrival Scheduling
+![User arrival patterns](docs/arrival_patterns.svg)
 - **burst**: All users start immediately
 - **steady:2**: Users arrive every 2 seconds
 - **poisson:0.5**: Users arrive following a Poisson process with rate 0.5

{clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf/__init__.py RENAMED

@@ -5,4 +5,4 @@ and adds multi-turn long-context workloads with append-mode compaction,
 user arrival scheduling, and system metrics polling.
 """
-__version__ = "0.2.2"
+__version__ = "0.2.4"

{clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: clawperf
-Version: 0.2.2
+Version: 0.2.4
 Summary: Performance benchmarking tool for LLM Serving backends with multi-turn long-context workloads
 Author: ClawPerf Contributors
 License-Expression: Apache-2.0
@@ -281,15 +281,14 @@ Each simulated user maintains an independent conversation state with its own gro
 Each user's context follows this structure:
-```
-[System Prefix] [User Prefix] [History] [Current Input]
-```
+![Context model and compaction](docs/context_model.svg)
 When context reaches `--max-context-tokens`, append-mode compaction fires:
 1. The base context (system + user prefix + input, without history) is checked first. If it already exceeds the limit, compaction is skipped and the turn is marked as `context_overflow` — this prevents infinite compaction loops.
 2. Otherwise, history is cleared and the user prefix grows by `--compaction-prefix-increment` tokens.
 3. New random content fills the enlarged user prefix.
+4. If the grown base still exceeds the limit, the prefix growth is **reverted** (history cleared only) so the user isn't permanently trapped in overflow.
 This simulates how real LLM serving systems handle context overflow with prefix caching.
@@ -304,6 +303,8 @@ The mock server simulates vLLM's KV-block prefix cache using a trie:
 ## User Arrival Scheduling
+![User arrival patterns](docs/arrival_patterns.svg)
 - **burst**: All users start immediately
 - **steady:2**: Users arrive every 2 seconds
 - **poisson:0.5**: Users arrive following a Poisson process with rate 0.5