benchforge 0.1.2 → 0.1.3
- package/README.md +15 -129
- package/dist/{TimingUtils-D4z1jpp2.mjs → TimingUtils-ClclVQ7E.mjs} +276 -278
- package/dist/TimingUtils-ClclVQ7E.mjs.map +1 -0
- package/dist/bin/benchforge.mjs +1 -1
- package/dist/index.d.mts +9 -5
- package/dist/index.mjs +2 -2
- package/dist/runners/WorkerScript.mjs +1 -1
- package/dist/{src-D7zxOFGA.mjs → src-JGOI6_Sc.mjs} +19 -17
- package/dist/src-JGOI6_Sc.mjs.map +1 -0
- package/package.json +1 -1
- package/src/StandardSections.ts +1 -8
- package/src/browser/BrowserHeapSampler.ts +3 -2
- package/src/cli/CliArgs.ts +4 -3
- package/src/cli/RunBenchCLI.ts +1 -4
- package/src/runners/BasicRunner.ts +0 -4
- package/dist/TimingUtils-D4z1jpp2.mjs.map +0 -1
- package/dist/src-D7zxOFGA.mjs.map +0 -1
package/README.md
CHANGED
@@ -151,18 +151,17 @@ The `--profile` flag executes exactly one iteration with no warmup, making it id
 Results are displayed in a formatted table:
 
 ```
-
-║ │ time │
-║ name │ mean Δ% CI p50 p99 │
-
-║ quicksort │ 0.17 +5.5% [+4.7%, +6.2%] 0.15 0.63 │
-║ insertion sort │ 0.24 +25.9% [+25.3%, +27.4%] 0.18 0.36 │
-║ --> native sort │ 0.16 0.15 0.41 │
-
+╔═════════════════╤═══════════════════════════════════════════╤═════════╗
+║ │ time │ ║
+║ name │ mean Δ% CI p50 p99 │ runs ║
+╟─────────────────┼───────────────────────────────────────────┼─────────╢
+║ quicksort │ 0.17 +5.5% [+4.7%, +6.2%] 0.15 0.63 │ 1,134 ║
+║ insertion sort │ 0.24 +25.9% [+25.3%, +27.4%] 0.18 0.36 │ 807 ║
+║ --> native sort │ 0.16 0.15 0.41 │ 1,210 ║
+╚═════════════════╧═══════════════════════════════════════════╧═════════╝
 ```
 
 - **Δ% CI**: Percentage difference from baseline with bootstrap confidence interval
-- **conv%**: Convergence percentage (100% = stable measurements)
 
 ### HTML
 
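Note on the `Δ% CI` column kept by this hunk: the README describes it as the percentage difference from baseline with a bootstrap confidence interval. The sketch below shows one common way to compute such a value, a percentile bootstrap of the difference in means. It is an illustration only and is not taken from benchforge's source; the names `bootstrapDeltaCI`, `resample`, and `mulberry32`, the resample count, and the seeded random generator are all assumptions made for the example.

```typescript
// Illustrative sketch; helper names are hypothetical, not benchforge APIs.

/** Small seeded PRNG (mulberry32) so the sketch gives repeatable results. */
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

/** Draw xs.length values from xs with replacement. */
function resample(xs: number[], rand: () => number): number[] {
  return xs.map(() => xs[Math.floor(rand() * xs.length)]);
}

/** Percent difference of the mean vs. baseline, with a percentile bootstrap CI. */
function bootstrapDeltaCI(
  baseline: number[],
  current: number[],
  resamples = 2000,
  confidence = 0.95,
  seed = 1,
): { delta: number; low: number; high: number } {
  const rand = mulberry32(seed);
  const pct = (b: number[], c: number[]) =>
    ((mean(c) - mean(b)) / mean(b)) * 100;

  // Recompute the percent difference on many resampled pairs of sample sets.
  const deltas: number[] = [];
  for (let i = 0; i < resamples; i++) {
    deltas.push(pct(resample(baseline, rand), resample(current, rand)));
  }
  deltas.sort((x, y) => x - y);

  // Read the interval bounds off the sorted bootstrap distribution.
  const alpha = (1 - confidence) / 2;
  const at = (q: number) =>
    deltas[Math.min(resamples - 1, Math.floor(q * resamples))];
  return { delta: pct(baseline, current), low: at(alpha), high: at(1 - alpha) };
}
```

Read this way, a row such as `+5.5% [+4.7%, +6.2%]` has an interval that excludes zero, which suggests the slowdown is unlikely to be sampling noise alone.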
@@ -284,136 +283,23 @@ V8's sampling profiler uses Poisson-distributed sampling. When an allocation occ
 - Node.js 22.6+ (for native TypeScript support)
 - Use `--expose-gc --allow-natives-syntax` flags for garbage collection monitoring and V8 native functions
 
-## Adaptive Mode
+## Adaptive Mode (Experimental)
 
-Adaptive mode automatically adjusts
+Adaptive mode (`--adaptive`) automatically adjusts iteration count until measurements stabilize. The algorithm is still being tuned — use `--help` for available options.
 
-
+## Interpreting Results
 
-
-# Enable adaptive benchmarking with default settings
-simple-cli.ts --adaptive
-
-# Customize time limits
-simple-cli.ts --adaptive --time 60 --min-time 5
-
-# Combine with other options
-simple-cli.ts --adaptive --filter "quicksort"
-```
-
-### CLI Options for Adaptive Mode
-
-- `--adaptive` - Enable adaptive sampling mode
-- `--min-time <seconds>` - Minimum time before convergence can stop (default: 1s)
-- `--convergence <percent>` - Confidence threshold 0-100 (default: 95)
-- `--time <seconds>` - Maximum time limit (default: 20s in adaptive mode)
-
-### How It Works
-
-1. **Initial Sampling**: Collects initial batch of ~100 samples (includes warmup)
-2. **Window Comparison**: Compares recent samples against previous window
-3. **Stability Detection**: Checks median drift and outlier impact between windows
-4. **Convergence**: Stops when both metrics are stable (<5% drift) or reaches threshold
-
-Progress is shown during execution:
-```
-◊ quicksort: 75% confident (2.1s)
-```
-
-### Output with Adaptive Mode
-
-```
-╔═════════════════╤═════════════════════════════════════════════╤═══════╤═════════╤══════╗
-║ │ time │ │ │ ║
-║ name │ median Δ% CI mean p99 │ conv% │ runs │ time ║
-╟─────────────────┼─────────────────────────────────────────────┼───────┼─────────┼──────╢
-║ quicksort │ 0.17 +17.3% [+15.4%, +20.0%] 0.20 0.65 │ 100% │ 526 │ 0.0s ║
-║ insertion sort │ 0.18 +24.2% [+23.9%, +24.6%] 0.19 0.36 │ 100% │ 529 │ 0.0s ║
-║ --> native sort │ 0.15 0.15 0.25 │ 100% │ 647 │ 0.0s ║
-╚═════════════════╧═════════════════════════════════════════════╧═══════╧═════════╧══════╝
-```
-
-- **conv%**: Convergence percentage (100% = stable measurements)
-- **time**: Total sampling duration for that benchmark
-
-## Statistical Considerations: Mean vs Median
-
-### When to Use Mean with Confidence Intervals
-
-**Best for:**
-- **Normally distributed data** - When benchmark times follow a bell curve
-- **Statistical comparison** - Comparing performance between implementations
-- **Throughput analysis** - Understanding average system performance
-- **Resource planning** - Estimating typical resource usage
-
-**Advantages:**
-- Provides confidence intervals for statistical significance
-- Captures the full distribution including outliers
-- Better for detecting small but consistent performance differences
-- Standard in academic performance research
-
-**Example use cases:**
-- Comparing algorithm implementations
-- Measuring API response times under normal load
-- Evaluating compiler optimizations
-- Benchmarking pure computational functions
-
-### When to Use Median (p50)
-
-**Best for:**
-- **Skewed distributions** - When outliers are common
-- **Latency-sensitive applications** - Where typical user experience matters
-- **Noisy environments** - Systems with unpredictable interference
-- **Service Level Agreements** - "50% of requests complete within X ms"
-
-**Advantages:**
-- Robust to outliers and system noise
-- Better represents "typical" performance
-- More stable in virtualized/cloud environments
-- Less affected by GC pauses and OS scheduling
-
-**Example use cases:**
-- Web server response times
-- Database query performance
-- UI responsiveness metrics
-- Real-time system benchmarks
-
-### Interpreting Results
-
-#### Baseline Comparison (Δ% CI)
+### Baseline Comparison (Δ% CI)
 ```
 0.17 +5.5% [+4.7%, +6.2%]
 ```
-
+The benchmark is 5.5% slower than baseline, with a bootstrap confidence interval of [+4.7%, +6.2%].
 
-
+### Percentiles
 ```
 p50: 0.15ms, p99: 0.27ms
 ```
-
-
-### Practical Guidelines
-
-1. **Use adaptive mode when:**
-- You want automatic convergence detection
-- Benchmarks have varying execution times
-- You need stable measurements without guessing iteration counts
-
-2. **Use fixed iterations when:**
-- Comparing across runs/machines (reproducibility)
-- You know roughly how many samples you need
-- Running in CI pipelines with time constraints
-
-3. **Interpreting conv%:**
-- 100% = measurements are stable
-- <100% = still converging or high variance
-- Red color indicates low confidence
-
-### Statistical Notes
-
-- **Bootstrap CI**: Baseline comparison uses permutation testing with bootstrap confidence intervals
-- **Window Stability**: Adaptive mode compares sliding windows for median drift and outlier impact
-- **Independence**: Assumes benchmark iterations are independent (use `--worker` flag for better isolation)
+50% of runs completed in ≤0.15ms and 99% in ≤0.27ms. Use percentiles when you care about consistency and tail latencies.
 
 ## Understanding GC Time Measurements
 
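Note on the new "Percentiles" text added by this hunk: the statement that 50% of runs completed in ≤0.15ms and 99% in ≤0.27ms can be reproduced with a nearest-rank percentile over the raw timings. The sketch below is a minimal illustration under that assumption; benchforge may compute percentiles differently (for example with interpolation), and both the `percentile` function and the sample timings are hypothetical.

```typescript
/**
 * Nearest-rank percentile: the smallest sample such that at least
 * p percent of all samples are less than or equal to it.
 */
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical timings in milliseconds: mostly fast, with one slow run.
const timesMs = [0.15, 0.14, 0.16, 0.15, 0.15, 0.14, 0.27, 0.15, 0.16, 0.15];

const p50 = percentile(timesMs, 50);
const p99 = percentile(timesMs, 99);
console.log(`p50: ${p50}ms, p99: ${p99}ms`);
```

With these made-up samples the single slow run leaves p50 unchanged and shows up only in p99, which is why the README points to percentiles for tail latencies.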