loadtest 8.0.0 → 8.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/doc/tcp-sockets.md +121 -57
- package/package.json +1 -1
package/doc/tcp-sockets.md
CHANGED
@@ -1,6 +1,6 @@
 # TCP Sockets Performance
 
-To improve performance the author tried out using raw TCP sockets
+To improve performance the author has tried out using raw TCP sockets
 using the [net module](https://nodejs.org/api/net.html),
 instead of the [HTTP module](https://nodejs.org/api/http.html).
 This is the story of how it went.
@@ -71,12 +71,15 @@ Finally with keep-alive, 3-core load tester against Nginx:
 
 ## Implementations
 
-All measurements against the test server using 3 cores:
+All measurements against the test server using 3 cores
+(the default configuration for our six-core machine),
+unless specified otherwise:
 
-```
-node bin/testserver.js
+```console
+$ node bin/testserver.js
 ```
 
+Note that the first `$` is the console prompt.
 Tests run on an Intel Core i5-12400T processor with 6 cores,
 with Ubuntu 22.04.3 LTS (Xubuntu actually).
 Performance numbers are shown in bold and as thousands of requests per second (krps):
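For orientation, a test server of this kind can be as small as the following sketch, using Node's built-in `http` module; this is an illustration only, not the package's actual `bin/testserver.js`, which supports many more options.

```js
// Minimal sketch of a test server, for illustration: answer every
// request with a tiny fixed payload as fast as possible.
import http from 'http'

const port = 7357
const server = http.createServer((request, response) => {
	response.writeHead(200, {'content-type': 'text/plain'})
	response.end('OK')
})
server.listen(port, () => console.log(`Test server listening on http://localhost:${port}/`))
```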
@@ -92,15 +95,15 @@ so they are not to be compared between them.
 
 First target performance is against [Apache `ab`](https://httpd.apache.org/docs/2.4/programs/ab.html).
 
-```
-ab -V
+```console
+$ ab -V
 Version 2.3 <$Revision: 1879490 $>
 ```
 
 With 10 concurrent connections without keep-alive.
 
-```
-ab -t 10 -c 10 http://localhost:7357/
+```console
+$ ab -t 10 -c 10 http://localhost:7357/
 [...]
 Requests per second: 20395.83 [#/sec] (mean)
 ```
@@ -113,14 +116,14 @@ Keep-alive cannot be used with `ab` as far as the author knows.
 The [autocannon](https://www.npmjs.com/package/autocannon) package uses by default
 10 concurrent connections with keep-alive enabled:
 
-```
-autocannon --version
+```console
+$ autocannon --version
 autocannon v7.12.0
 node v18.17.1
 ```
 
-```
-autocannon http://localhost:7357/
+```console
+$ autocannon http://localhost:7357/
 [...]
 ┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬─────────┬─────────┐
 │ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg      │ Stdev   │ Min     │
@@ -137,8 +140,8 @@ Keep-alive cannot be disabled with an option,
 but it can be changed directly in the code by setting the header `Connection: close`.
 Performance is near **8 krps**:
 
-```
-npx autocannon http://localhost:7357/
+```console
+$ npx autocannon http://localhost:7357/
 [...]
 ┌───────────┬────────┬────────┬────────┬────────┬────────┬─────────┬────────┐
 │ Stat      │ 1%     │ 2.5%   │ 50%    │ 97.5%  │ Avg    │ Stdev   │ Min    │
@@ -153,15 +156,15 @@ npx autocannon http://localhost:7357/
 
 To complete the set we try `wrk`:
 
-```
-wrk -v
+```console
+$ wrk -v
 wrk debian/4.1.0-3build1 [epoll]
 ```
 
 With a single thread (core) for fair comparison we get almost **73 krps**:
 
-```
-wrk http://localhost:7357/ -t 1
+```console
+$ wrk http://localhost:7357/ -t 1
 [...]
 Requests/sec: 72639.52
 ```
@@ -173,8 +176,8 @@ running on one core.
 
 Without keep-alive close to **6 krps**:
 
-```
-node bin/loadtest.js http://localhost:7357 --cores 1
+```console
+$ node bin/loadtest.js http://localhost:7357 --cores 1
 [...]
 Effective rps: 6342
 ```
@@ -182,8 +185,8 @@ Effective rps: 6342
 Very far away from the 20 krps given by `ab`.
 With keep-alive:
 
-```
-node bin/loadtest.js http://localhost:7357 --cores 1 -k
+```console
+$ node bin/loadtest.js http://localhost:7357 --cores 1 -k
 [...]
 Effective rps: 20490
 ```
@@ -198,7 +201,7 @@ For the first implementation we want to learn if the bare sockets implementation
 In this naïve implementation we open the socket,
 send a short canned request without taking into account any parameters or headers:
 
-```
+```js
 this.params.request = `${this.params.method} ${this.params.path} HTTP/1.1\r\n\r\n`
 ```
 
@@ -207,8 +210,8 @@ just assume that it is received as one packet
 and disregard it.
 The results are almost **80 krps**:
 
-```
-node bin/loadtest.js http://localhost:7357 --cores 1 --tcp
+```console
+$ node bin/loadtest.js http://localhost:7357 --cores 1 --tcp
 [...]
 Effective rps: 79997
 ```
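To make the naïve approach in the hunks above concrete, here is a minimal sketch of such a client using the `net` module directly; names and structure are illustrative, not the package's actual `lib/tcpClient.js`.

```js
// Naive raw-socket client sketch: write a canned request, assume each
// 'data' event is one complete response, and disregard its contents.
import net from 'net'

const request = 'GET / HTTP/1.1\r\n\r\n'
const socket = net.createConnection(7357, 'localhost')
let responses = 0
socket.on('connect', () => socket.write(request))
socket.on('data', () => {
	// Assume the whole response arrived in this packet and ignore it.
	responses += 1
	socket.write(request)
})
socket.on('error', error => console.error(`Socket error: ${error.message}`))
setInterval(() => console.log(`Responses so far: ${responses}`), 1000)
```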
@@ -339,8 +342,8 @@ which can cause memory issues when size varies constantly.
 
 Now we can go back to using multiple cores:
 
-```
-node bin/loadtest.js http://localhost:7357 --cores 3 --tcp
+```console
+$ node bin/loadtest.js http://localhost:7357 --cores 3 --tcp
 [...]
 Effective rps: 115379
 ```
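As a reminder of how a Node.js process uses several cores, the standard pattern is the [cluster module](https://nodejs.org/api/cluster.html); a minimal sketch follows, while loadtest's actual worker handling is more involved.

```js
// Sketch of the standard cluster pattern for using several cores.
import cluster from 'cluster'

const cores = 3
if (cluster.isPrimary) {
	// The primary process forks one worker per requested core.
	for (let index = 0; index < cores; index++) {
		cluster.fork()
	}
} else {
	// Each worker sends its own share of the requests.
	console.log(`Worker ${process.pid} running its part of the load test`)
}
```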
@@ -352,16 +355,16 @@ Now we go up to **115 krps**!
 What about regular `http` connections without the `--tcp` option?
 It stays at **54 krps**:
 
-```
-node bin/loadtest.js http://localhost:7357/ -k --cores 3
+```console
+$ node bin/loadtest.js http://localhost:7357/ -k --cores 3
 [...]
 Effective rps: 54432
 ```
 
 For comparison we try using `autocannon` also with three workers:
 
-```
-autocannon http://localhost:7357/ -w 3 -c 30
+```console
+$ autocannon http://localhost:7357/ -w 3 -c 30
 [...]
 ┌───────────┬───────┬───────┬─────────┬─────────┬──────────┬─────────┬───────┐
 │ Stat      │ 1%    │ 2.5%  │ 50%     │ 97.5%   │ Avg      │ Stdev   │ Min   │
@@ -375,8 +378,8 @@ autocannon http://localhost:7357/ -w 3 -c 30
 Median rate (50% percentile) is **107 krps**.
 Now `wrk` which yields **118 krps**:
 
-```
-wrk http://localhost:7357/ -t 3
+```console
+$ wrk http://localhost:7357/ -t 3
 [...]
 Requests/sec: 118164.03
 ```
@@ -396,26 +399,87 @@ take them to fulfill a request and then free them back to the pool.
 After the refactoring we get some bad news:
 performance has dropped down back to **60 krps**!
 
-```
-node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
+```console
+$ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
 [...]
 Effective rps: 60331
 ```
 
 We need to do the painstaking exercise of getting back to our target performance.
 
+### Profiling and Micro-profiling
+
+We need to see where our microseconds (µs) are being spent.
+Every microsecond counts: between 67 krps (15 µs per request) and 60 krps (16.7 µs per request)
+the difference is... less than two microseconds.
+
+We use the [`microprofiler`](https://github.com/alexfernandez/microprofiler) package,
+which allows us to instrument the code that is sending and receiving requests.
+For instance the function `makeRequest()` in `lib/tcpClient.js` which is sending out the request:
+
+```js
+import microprofiler from 'microprofiler'
+
+[...]
+makeRequest() {
+    if (!this.running) {
+        return
+    }
+    // first block: connect
+    const start1 = microprofiler.start()
+    this.connect()
+    microprofiler.measureFrom(start1, 'connect', 100000)
+    // second block: create parser
+    const start2 = microprofiler.start()
+    this.parser = new Parser(this.params.method)
+    microprofiler.measureFrom(start2, 'create parser', 100000)
+    // third block: start measuring latency
+    const start3 = microprofiler.start()
+    const id = this.latency.begin()
+    this.currentId = id
+    microprofiler.measureFrom(start3, 'latency begin', 100000)
+    // fourth block: write to socket
+    const start4 = microprofiler.start()
+    this.connection.write(this.params.request)
+    microprofiler.measureFrom(start4, 'write', 100000)
+}
+```
+
+Each of the four calls is instrumented.
+When this code runs the output has a lot of lines like this:
+
+```console
+$ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
+[...]
+Profiling connect: 100000 requests, mean time: 1.144 µs, rps: 6948026
+Profiling create parser: 100000 requests, mean time: 0.152 µs, rps: 6582446
+Profiling latency begin: 100000 requests, mean time: 1.138 µs, rps: 878664
+Profiling write: 100000 requests, mean time: 5.669 µs, rps: 176409
+```
+
+Note that the results oscillate something like 0.3 µs from time to time,
+so don't pay attention to very small differences.
+Mean time is the interesting part: from 0.152 µs to create the parser to 5.669 µs for the write.
+There is not a lot that we can do with the `connection.write()` call,
+since it's directly speaking with the Node.js core;
+we can try reducing the message size (not sending all headers)
+but it doesn't seem to do much.
+So we center on the `this.connect()` call,
+which we can reduce to less than a µs.
+Then we repeat the exercise on the `finishRequest()` call to see if we can squeeze another microsecond there.
+
 After some optimizing and a lot of bug fixing we are back to **68 krps**:
 
-```
-node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
+```console
+$ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
 [...]
 Effective rps: 68466
 ```
 
 With classic `loadtest` without the `--tcp` option, we still get **21 krps**:
 
-```
-node bin/loadtest.js http://localhost:7357/ -k --cores 1
+```console
+$ node bin/loadtest.js http://localhost:7357/ -k --cores 1
 [...]
 Effective rps: 21446
 ```
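The pool mentioned in the hunk header above can be pictured with a minimal sketch like the following; class and method names are hypothetical, not the package's implementation.

```js
// Minimal connection pool sketch: reuse idle sockets instead of
// opening a new connection for every request.
import net from 'net'

class ConnectionPool {
	constructor(port, host) {
		this.port = port
		this.host = host
		this.idle = []
	}
	// Take an idle connection, or open a new one if none is free.
	acquire() {
		return this.idle.pop() || net.createConnection(this.port, this.host)
	}
	// Free the connection back to the pool for the next request.
	release(connection) {
		this.idle.push(connection)
	}
}

const pool = new ConnectionPool(7357, 'localhost')
const connection = pool.acquire()
connection.write('GET / HTTP/1.1\r\n\r\n')
connection.once('data', () => pool.release(connection))
```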
@@ -428,8 +492,8 @@ but it can be done by hacking the header as
 We get a bit less performance than the barebones implementation,
 almost **9 krps**:
 
-```
-node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
+```console
+$ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
 [...]
 Effective rps: 8682
 ```
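The header hack referenced in the hunk context could look something like this in the raw-socket client; a sketch, assuming the canned request is built as shown earlier in the document.

```js
// Sketch: add Connection: close to the canned request so the server
// closes the socket after each response, effectively disabling keep-alive.
this.params.request = `${this.params.method} ${this.params.path} HTTP/1.1\r\n` +
	'Connection: close\r\n\r\n'
```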
@@ -444,8 +508,8 @@ that starts a test server and then runs a load test with the parameters we have
 Unfortunately the test server only uses one core (being run in API mode),
 and maxes out quickly at **27 krps**.
 
-```
-node bin/tcp-performance.js
+```console
+$ node bin/tcp-performance.js
 [...]
 Effective rps: 27350
 ```
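API mode here means driving the load test from code instead of the command line; a rough sketch assuming the package's documented `loadTest()` entry point, with option and result field names that may differ from this version (check the README).

```js
// Rough sketch of API mode; options and result fields are assumptions
// based on the package README, not verified against this version.
import {loadTest} from 'loadtest'

const options = {
	url: 'http://localhost:7357/',
	maxSeconds: 10,
	concurrency: 10,
}
loadTest(options, (error, result) => {
	if (error) {
		return console.error(`Load test failed: ${error}`)
	}
	console.log('Requests per second:', result.rps)
})
```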
@@ -466,8 +530,8 @@ One part of the puzzle can be that it sends fewer headers,
 without `user-agent` or `accepts`.
 So we can do a quick trial of removing these headers in `loadtest`:
 
-```
-node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
+```console
+$ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
 [...]
 Effective rps: 29694
 ```
@@ -481,8 +545,8 @@ Our last test is to run `loadtest` against a local Nginx server,
 which is sure not to max out with only one core:
 it goes to **61 krps**.
 
-```
-node bin/loadtest.js http://localhost:80/ --tcp --cores 1
+```console
+$ node bin/loadtest.js http://localhost:80/ --tcp --cores 1
 [...]
 Effective rps: 61059
 ```
@@ -490,8 +554,8 @@ Effective rps: 61059
 While without `--tcp` we only get **19 krps**.
 A similar test with `autocannon` yields only **40 krps**:
 
-```
-autocannon http://localhost:80/
+```console
+$ autocannon http://localhost:80/
 [...]
 ┌───────────┬─────────┬─────────┬───────┬─────────┬─────────┬─────────┬─────────┐
 │ Stat      │ 1%      │ 2.5%    │ 50%   │ 97.5%   │ Avg     │ Stdev   │ Min     │
@@ -507,16 +571,16 @@ than against our Node.js test server,
 but the numbers are quite consistent.
 While `wrk` takes the crown again with **111 krps**:
 
-```
-wrk http://localhost:80/ -t 1
+```console
+$ wrk http://localhost:80/ -t 1
 [...]
 Requests/sec: 111176.14
 ```
 
 Running again `loadtest` with three cores we get **111 krps**:
 
-```
-node bin/loadtest.js http://localhost:80/ --tcp --cores 3
+```console
+$ node bin/loadtest.js http://localhost:80/ --tcp --cores 3
 [...]
 Effective rps: 110858
 ```
@@ -524,8 +588,8 @@ Effective rps: 110858
 Without `--tcp` we get **49 krps**.
 While `autocannon` with three workers reaches **80 krps**:
 
-```
-autocannon http://localhost:80/ -w 3
+```console
+$ autocannon http://localhost:80/ -w 3
 [...]
 ┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
 │ Stat      │ 1%      │ 2.5%    │ 50%     │ 97.5%   │ Avg     │ Stdev   │ Min     │
@@ -540,8 +604,8 @@ Consistent with the numbers reached above against a test server with 3 cores.
 
 `wrk` does not go much further with three threads than with one, at **122 krps**:
 
-```
-wrk http://localhost:80/ -t 3
+```console
+$ wrk http://localhost:80/ -t 3
 [...]
 Requests/sec: 121991.96
 ```
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "loadtest",
-  "version": "8.0.0",
+  "version": "8.0.1",
   "type": "module",
   "description": "Run load tests for your web application. Mostly ab-compatible interface, with an option to force requests per second. Includes an API for automated load testing.",
   "homepage": "https://github.com/alexfernandez/loadtest",