clawperf 0.2.2__tar.gz → 0.2.4__tar.gz

Files changed (31)
  1. {clawperf-0.2.2 → clawperf-0.2.4}/PKG-INFO +5 -4
  2. {clawperf-0.2.2 → clawperf-0.2.4}/README.md +4 -3
  3. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf/__init__.py +1 -1
  4. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf.egg-info/PKG-INFO +5 -4
  5. {clawperf-0.2.2 → clawperf-0.2.4}/LICENSE +0 -0
  6. {clawperf-0.2.2 → clawperf-0.2.4}/pyproject.toml +0 -0
  7. {clawperf-0.2.2 → clawperf-0.2.4}/setup.cfg +0 -0
  8. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf/__main__.py +0 -0
  9. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf/cli.py +0 -0
  10. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf/config.py +0 -0
  11. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf/context.py +0 -0
  12. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf/logging_setup.py +0 -0
  13. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf/mock_server.py +0 -0
  14. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf/runner.py +0 -0
  15. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf/scheduler.py +0 -0
  16. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf/system_metrics.py +0 -0
  17. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf/tokenizer.py +0 -0
  18. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf.egg-info/SOURCES.txt +0 -0
  19. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf.egg-info/dependency_links.txt +0 -0
  20. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf.egg-info/entry_points.txt +0 -0
  21. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf.egg-info/requires.txt +0 -0
  22. {clawperf-0.2.2 → clawperf-0.2.4}/src/clawperf.egg-info/top_level.txt +0 -0
  23. {clawperf-0.2.2 → clawperf-0.2.4}/tests/test_aggregation.py +0 -0
  24. {clawperf-0.2.2 → clawperf-0.2.4}/tests/test_config.py +0 -0
  25. {clawperf-0.2.2 → clawperf-0.2.4}/tests/test_context.py +0 -0
  26. {clawperf-0.2.2 → clawperf-0.2.4}/tests/test_history.py +0 -0
  27. {clawperf-0.2.2 → clawperf-0.2.4}/tests/test_mock_server.py +0 -0
  28. {clawperf-0.2.2 → clawperf-0.2.4}/tests/test_runner_math.py +0 -0
  29. {clawperf-0.2.2 → clawperf-0.2.4}/tests/test_runner_utils.py +0 -0
  30. {clawperf-0.2.2 → clawperf-0.2.4}/tests/test_scheduler.py +0 -0
  31. {clawperf-0.2.2 → clawperf-0.2.4}/tests/test_system_metrics.py +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: clawperf
- Version: 0.2.2
+ Version: 0.2.4
  Summary: Performance benchmarking tool for LLM Serving backends with multi-turn long-context workloads
  Author: ClawPerf Contributors
  License-Expression: Apache-2.0
@@ -281,15 +281,14 @@ Each simulated user maintains an independent conversation state with its own gro

  Each user's context follows this structure:

- ```
- [System Prefix] [User Prefix] [History] [Current Input]
- ```
+ ![Context model and compaction](docs/context_model.svg)

  When context reaches `--max-context-tokens`, append-mode compaction fires:

  1. The base context (system + user prefix + input, without history) is checked first. If it already exceeds the limit, compaction is skipped and the turn is marked as `context_overflow` — this prevents infinite compaction loops.
  2. Otherwise, history is cleared and the user prefix grows by `--compaction-prefix-increment` tokens.
  3. New random content fills the enlarged user prefix.
+ 4. If the grown base still exceeds the limit, the prefix growth is **reverted** (history cleared only) so the user isn't permanently trapped in overflow.

  This simulates how real LLM serving systems handle context overflow with prefix caching.

@@ -304,6 +303,8 @@ The mock server simulates vLLM's KV-block prefix cache using a trie:

  ## User Arrival Scheduling

+ ![User arrival patterns](docs/arrival_patterns.svg)
+
  - **burst**: All users start immediately
  - **steady:2**: Users arrive every 2 seconds
  - **poisson:0.5**: Users arrive following a Poisson process with rate 0.5
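
Reviewer note on the compaction hunk above: the four steps, including the newly documented revert in step 4, can be sketched as follows. This is a minimal illustration based only on the README text, not clawperf's actual code. `UserContext`, `compact`, and the returned status strings other than `context_overflow` are hypothetical names, and contexts are modeled as token counts rather than real token sequences.

```python
from dataclasses import dataclass


@dataclass
class UserContext:
    # Hypothetical model: token counts only, not clawperf's real class.
    system_prefix: int
    user_prefix: int
    history: int = 0


def base_tokens(ctx: UserContext, input_tokens: int) -> int:
    """Base context = system prefix + user prefix + input, without history."""
    return ctx.system_prefix + ctx.user_prefix + input_tokens


def compact(ctx: UserContext, input_tokens: int,
            max_context_tokens: int, prefix_increment: int) -> str:
    """Apply append-mode compaction as described in the README steps."""
    # Step 1: if the base alone exceeds the limit, skip compaction.
    if base_tokens(ctx, input_tokens) > max_context_tokens:
        return "context_overflow"

    # Step 2: clear history and grow the user prefix.
    old_prefix = ctx.user_prefix
    ctx.history = 0
    ctx.user_prefix += prefix_increment
    # Step 3 (filling the prefix with new random content) is omitted here
    # because this sketch tracks counts only.

    # Step 4: if the grown base exceeds the limit, revert the growth
    # and keep only the history clearing.
    if base_tokens(ctx, input_tokens) > max_context_tokens:
        ctx.user_prefix = old_prefix
        return "reverted"
    return "compacted"
```

With a limit of 400 tokens, a 100-token system prefix, a 200-token user prefix, and a 50-token input, growing the prefix by 300 would push the base to 650, so the sketch returns `"reverted"` and leaves the prefix at 200 with history cleared.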
@@ -246,15 +246,14 @@ Each simulated user maintains an independent conversation state with its own gro

  Each user's context follows this structure:

- ```
- [System Prefix] [User Prefix] [History] [Current Input]
- ```
+ ![Context model and compaction](docs/context_model.svg)

  When context reaches `--max-context-tokens`, append-mode compaction fires:

  1. The base context (system + user prefix + input, without history) is checked first. If it already exceeds the limit, compaction is skipped and the turn is marked as `context_overflow` — this prevents infinite compaction loops.
  2. Otherwise, history is cleared and the user prefix grows by `--compaction-prefix-increment` tokens.
  3. New random content fills the enlarged user prefix.
+ 4. If the grown base still exceeds the limit, the prefix growth is **reverted** (history cleared only) so the user isn't permanently trapped in overflow.

  This simulates how real LLM serving systems handle context overflow with prefix caching.

@@ -269,6 +268,8 @@ The mock server simulates vLLM's KV-block prefix cache using a trie:

  ## User Arrival Scheduling

+ ![User arrival patterns](docs/arrival_patterns.svg)
+
  - **burst**: All users start immediately
  - **steady:2**: Users arrive every 2 seconds
  - **poisson:0.5**: Users arrive following a Poisson process with rate 0.5
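
Reviewer note on the arrival-pattern list above: the three patterns could be turned into start-time offsets roughly as shown below. This is a sketch under stated assumptions, not the package's scheduler. `arrival_times` is a hypothetical helper, the Poisson rate is assumed to be in users per second (the README does not give units), and the first `steady` user is assumed to start at time 0.

```python
import random
from typing import List, Optional


def arrival_times(pattern: str, num_users: int,
                  seed: Optional[int] = None) -> List[float]:
    """Return start offsets in seconds for each simulated user.

    Accepts 'burst', 'steady:<interval>', or 'poisson:<rate>'.
    """
    rng = random.Random(seed)
    name, _, arg = pattern.partition(":")

    if name == "burst":
        # All users start immediately.
        return [0.0] * num_users

    if name == "steady":
        # One user every <interval> seconds; first user at t=0 (assumption).
        interval = float(arg)
        return [i * interval for i in range(num_users)]

    if name == "poisson":
        # A Poisson process has exponentially distributed gaps with
        # mean 1/rate; rate is assumed to be users per second.
        rate = float(arg)
        t = 0.0
        times = []
        for _ in range(num_users):
            t += rng.expovariate(rate)
            times.append(t)
        return times

    raise ValueError(f"unknown arrival pattern: {pattern!r}")
```

For example, `arrival_times("steady:2", 3)` yields offsets of 0, 2, and 4 seconds, while `arrival_times("poisson:0.5", 4, seed=1)` yields four increasing offsets whose gaps average 2 seconds over many draws.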
@@ -5,4 +5,4 @@ and adds multi-turn long-context workloads with append-mode compaction,
  user arrival scheduling, and system metrics polling.
  """

- __version__ = "0.2.2"
+ __version__ = "0.2.4"
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: clawperf
- Version: 0.2.2
+ Version: 0.2.4
  Summary: Performance benchmarking tool for LLM Serving backends with multi-turn long-context workloads
  Author: ClawPerf Contributors
  License-Expression: Apache-2.0
@@ -281,15 +281,14 @@ Each simulated user maintains an independent conversation state with its own gro

  Each user's context follows this structure:

- ```
- [System Prefix] [User Prefix] [History] [Current Input]
- ```
+ ![Context model and compaction](docs/context_model.svg)

  When context reaches `--max-context-tokens`, append-mode compaction fires:

  1. The base context (system + user prefix + input, without history) is checked first. If it already exceeds the limit, compaction is skipped and the turn is marked as `context_overflow` — this prevents infinite compaction loops.
  2. Otherwise, history is cleared and the user prefix grows by `--compaction-prefix-increment` tokens.
  3. New random content fills the enlarged user prefix.
+ 4. If the grown base still exceeds the limit, the prefix growth is **reverted** (history cleared only) so the user isn't permanently trapped in overflow.

  This simulates how real LLM serving systems handle context overflow with prefix caching.

@@ -304,6 +303,8 @@ The mock server simulates vLLM's KV-block prefix cache using a trie:

  ## User Arrival Scheduling

+ ![User arrival patterns](docs/arrival_patterns.svg)
+
  - **burst**: All users start immediately
  - **steady:2**: Users arrive every 2 seconds
  - **poisson:0.5**: Users arrive following a Poisson process with rate 0.5
File without changes