loadtest 7.1.1 → 8.0.0

@@ -0,0 +1,569 @@
1
+ # TCP Sockets Performance
2
+
3
+ To improve performance, the author tried out raw TCP sockets
4
+ using the [net module](https://nodejs.org/api/net.html),
5
+ instead of the [HTTP module](https://nodejs.org/api/http.html).
6
+ This is the story of how it went.
7
+
8
+ ## Rationale
9
+
10
+ Keep-alive (option `-k`) makes a huge difference in performance:
11
+ instead of opening a new socket for every request,
12
+ the same connection is reused,
13
+ so it is usually much faster.
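+
+ In Node's own [HTTP module](https://nodejs.org/api/http.html) this behavior is controlled by the agent.
+ As a minimal sketch (not the `loadtest` implementation, just an illustration of the mechanism):
+
+ ```
+ import http from 'http'
+
+ // With keepAlive: true the agent reuses sockets between requests
+ // instead of opening a new connection for each one.
+ const agent = new http.Agent({keepAlive: true, maxSockets: 10})
+ http.get({host: 'localhost', port: 7357, path: '/', agent}, response => {
+     response.resume() // drain the response so the socket can be reused
+ })
+ ```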
14
+
15
+ We need to run the measurements both with and without it
16
+ to see how it affects each implementation.
17
+
18
+ ### Summary
19
+
20
+ The following tables summarize all comparisons.
21
+ The fastest option is shown **in bold**.
22
+ Results are shown with one core (or worker, or thread) and three cores for the load tester.
23
+ Detailed explanations follow.
24
+
25
+ First without keep-alive, 1-core load tester against 3-core test server:
26
+
27
+ |package|krps|
28
+ |-------|----|
29
+ |loadtest|6|
30
+ |tcp barebones|10|
31
+ |loadtest tcp|9|
32
+ |ab|**20**|
33
+ |autocannon|8|
34
+
35
+ Now with keep-alive, also 1-core load tester against 3-core test server:
36
+
37
+ |package|krps|
38
+ |-------|----|
39
+ |loadtest|21|
40
+ |tcp barebones|**80**|
41
+ |loadtest tcp|68|
42
+ |autocannon|57|
43
+ |wrk|73|
44
+
45
+ With keep-alive, 3-core load tester against 3-core test server:
46
+
47
+ |package|krps|
48
+ |-------|----|
49
+ |loadtest|54|
50
+ |loadtest tcp|115|
51
+ |autocannon|107|
52
+ |wrk|**118**|
53
+
54
+ With keep-alive, 1-core load tester against Nginx:
55
+
56
+ |package|krps|
57
+ |-------|----|
58
+ |loadtest|19|
59
+ |loadtest tcp|61|
60
+ |autocannon|40|
61
+ |wrk|**111**|
62
+
63
+ Finally with keep-alive, 3-core load tester against Nginx:
64
+
65
+ |package|krps|
66
+ |-------|----|
67
+ |loadtest|49|
68
+ |loadtest tcp|111|
69
+ |autocannon|80|
70
+ |wrk|**122**|
71
+
72
+ ## Implementations
73
+
74
+ All measurements are taken against the test server using 3 cores (the default):
75
+
76
+ ```
77
+ node bin/testserver.js
78
+ ```
79
+
80
+ Tests run on an Intel Core i5-12400T processor with 6 cores,
81
+ running Ubuntu 22.04.3 LTS (Xubuntu, actually).
82
+ Performance numbers are shown in bold and as thousands of requests per second (krps):
83
+ **80 krps**.
84
+
85
+ ### Targets
86
+
87
+ We compare a few packages on the test machine.
88
+ Keep in mind that `ab` does not use keep-alive while `autocannon` does,
89
+ so they should not be compared directly with each other.
90
+
91
+ #### Apache ab
92
+
93
+ The first performance target is [Apache `ab`](https://httpd.apache.org/docs/2.4/programs/ab.html).
94
+
95
+ ```
96
+ ab -V
97
+ Version 2.3 <$Revision: 1879490 $>
98
+ ```
99
+
100
+ We run it with 10 concurrent connections, without keep-alive:
101
+
102
+ ```
103
+ ab -t 10 -c 10 http://localhost:7357/
104
+ [...]
105
+ Requests per second: 20395.83 [#/sec] (mean)
106
+ ```
107
+
108
+ Results are around **20 krps**.
109
+ Keep-alive cannot be used with `ab` as far as the author knows.
110
+
111
+ #### Autocannon
112
+
113
+ By default, the [autocannon](https://www.npmjs.com/package/autocannon) package uses
114
+ 10 concurrent connections with keep-alive enabled:
115
+
116
+ ```
117
+ autocannon --version
118
+ autocannon v7.12.0
119
+ node v18.17.1
120
+ ```
121
+
122
+ ```
123
+ autocannon http://localhost:7357/
124
+ [...]
125
+ ┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬─────────┬─────────┐
126
+ │ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
127
+ ├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┼─────────┤
128
+ │ Req/Sec │ 51295 │ 51295 │ 57343 │ 59103 │ 56798.55 │ 2226.35 │ 51285 │
129
+ ├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┼─────────┤
130
+ │ Bytes/Sec │ 6.36 MB │ 6.36 MB │ 7.11 MB │ 7.33 MB │ 7.04 MB │ 276 kB │ 6.36 MB │
131
+ └───────────┴─────────┴─────────┴─────────┴─────────┴──────────┴─────────┴─────────┘
132
+ ```
133
+
134
+ We will look at the median rate (reported as 50%),
135
+ so results are around **57 krps**.
136
+ Keep-alive cannot be disabled with a command-line option,
137
+ but it can be changed directly in the code by setting the header `Connection: close`.
138
+ Performance is near **8 krps**:
139
+
140
+ ```
141
+ npx autocannon http://localhost:7357/
142
+ [...]
143
+ ┌───────────┬────────┬────────┬────────┬────────┬────────┬─────────┬────────┐
144
+ │ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
145
+ ├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
146
+ │ Req/Sec │ 5831 │ 5831 │ 7703 │ 8735 │ 7674.4 │ 753.53 │ 5828 │
147
+ ├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
148
+ │ Bytes/Sec │ 560 kB │ 560 kB │ 739 kB │ 839 kB │ 737 kB │ 72.4 kB │ 559 kB │
149
+ └───────────┴────────┴────────┴────────┴────────┴────────┴─────────┴────────┘
150
+ ```
151
+
152
+ #### `wrk`
153
+
154
+ To complete the set we try `wrk`:
155
+
156
+ ```
157
+ wrk -v
158
+ wrk debian/4.1.0-3build1 [epoll]
159
+ ```
160
+
161
+ With a single thread (core) for fair comparison we get almost **73 krps**:
162
+
163
+ ```
164
+ wrk http://localhost:7357/ -t 1
165
+ [...]
166
+ Requests/sec: 72639.52
167
+ ```
168
+
169
+ ### Baseline
170
+
171
+ The baseline is the existing `http` implementation in `loadtest` 7.1.1,
172
+ running on one core.
173
+
174
+ Without keep-alive we get close to **6 krps**:
175
+
176
+ ```
177
+ node bin/loadtest.js http://localhost:7357 --cores 1
178
+ [...]
179
+ Effective rps: 6342
180
+ ```
181
+
182
+ Very far away from the 20 krps given by `ab`.
183
+ With keep-alive:
184
+
185
+ ```
186
+ node bin/loadtest.js http://localhost:7357 --cores 1 -k
187
+ [...]
188
+ Effective rps: 20490
189
+ ```
190
+
191
+ We are around **20 krps**.
192
+ Again quite far from the 57 krps achieved by `autocannon`;
193
+ close to `ab`, but `ab` doesn't use keep-alive, so the comparison is meaningless.
194
+
195
+ ### Proof of Concept
196
+
197
+ For the first implementation we want to learn if bare sockets are worth the time.
198
+ In this naïve implementation we open the socket
199
+ and send a short canned request, without taking into account any parameters or headers:
200
+
201
+ ```
202
+ this.params.request = `${this.params.method} ${this.params.path} HTTP/1.1\r\n\r\n`
203
+ ```
204
+
205
+ We don't parse the result either,
206
+ just assume that it is received as one packet
207
+ and disregard it.
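+
+ A minimal sketch of this barebones approach (illustrative only, not the actual `loadtest` code) could look like this:
+
+ ```
+ import net from 'net'
+
+ // Open one socket, fire a canned GET request, and send the next one
+ // as soon as any data comes back. Responses are not parsed at all.
+ const socket = net.connect(7357, 'localhost')
+ const request = 'GET / HTTP/1.1\r\n\r\n'
+ let received = 0
+ socket.on('connect', () => socket.write(request))
+ socket.on('data', () => {
+     received += 1
+     socket.write(request)
+ })
+ // after 10 seconds, report the rate and close the socket
+ setTimeout(() => {
+     console.log(`~${received / 10} requests per second`)
+     socket.end()
+ }, 10000)
+ ```
+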
208
+ The results are almost **80 krps**:
209
+
210
+ ```
211
+ node bin/loadtest.js http://localhost:7357 --cores 1 --tcp
212
+ [...]
213
+ Effective rps: 79997
214
+ ```
215
+
216
+ A very promising start!
217
+ Obviously this only works properly with GET requests without any body,
218
+ so it is only useful as a benchmark:
219
+ we want to make sure we don't lose too much performance when adding all the functionality.
220
+
221
+ We can also do a barebones implementation without keep-alive,
222
+ creating a new socket for every request.
223
+ The result is around **10 krps**,
224
+ still far from Apache `ab`.
225
+ But here there is not much we can do:
226
+ apparently writing sockets in C is more efficient than in Node.js,
227
+ or perhaps `ab` has some tricks up its sleeve,
228
+ probably some low-level optimizations.
229
+ In the Node.js code there is not much fat we can trim.
230
+
231
+ So from now on we will focus on the keep-alive tests.
232
+
233
+ ### Adding Headers
234
+
235
+ First we add the proper headers to the request.
236
+ This means we are sending out more data for each round trip,
237
+ but performance doesn't seem to be altered much,
238
+ still around **80 krps**.
239
+
240
+ The request we are now sending is:
241
+
242
+ ```
243
+ GET / HTTP/1.1
244
+ host: localhost:7357
245
+ accept: */*
246
+ user-agent: loadtest/7.1.0
247
+ Connection: keep-alive
248
+
249
+ ```
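+
+ A hypothetical sketch of how such a request string can be assembled
+ (the function name and signature are made up for illustration):
+
+ ```
+ // Build a full HTTP/1.1 request string, headers included.
+ function buildRequest(method, path, host, headers) {
+     const lines = [`${method} ${path} HTTP/1.1`, `host: ${host}`]
+     for (const [name, value] of Object.entries(headers)) {
+         lines.push(`${name}: ${value}`)
+     }
+     // an empty line separates headers from the (empty) body
+     return lines.join('\r\n') + '\r\n\r\n'
+ }
+ ```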
250
+
251
+ One interesting bit is that sending the header `connection: keep-alive`
252
+ does not affect performance;
253
+ however, sending `connection: close` cuts performance down to 8 requests per second.
254
+ Probably there are huge inefficiencies in the way sockets are created.
255
+ This should be investigated in depth at some point,
256
+ if we want to support tests without keep-alive.
257
+
258
+ ### Parsing Responses
259
+
260
+ Now we come to the really critical part:
261
+ parsing the response including the content.
262
+
263
+ A very simple implementation just parses the response as a string,
264
+ reads the first line and extracts the status code.
265
+ Performance is now down to around **68 krps**.
266
+ Note that we are still assuming that each response is a single packet.
267
+ A sample response from the test server included with `loadtest`
268
+ can look like this:
269
+
270
+ ```
271
+ HTTP/1.1 200 OK
272
+ Date: Fri, 08 Sep 2023 11:04:21 GMT
273
+ Connection: keep-alive
274
+ Keep-Alive: timeout=5
275
+ Content-Length: 2
276
+
277
+ OK
278
+ ```
279
+
280
+ We can see a very simple HTTP response that fits in one packet.
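+
+ A minimal sketch of this first parsing step (illustrative, assuming the whole
+ response arrives in a single packet):
+
+ ```
+ // Extract the status code from the first line of the response,
+ // e.g. 'HTTP/1.1 200 OK' gives '200'.
+ function parseStatusCode(packet) {
+     const text = packet.toString()
+     const firstLine = text.slice(0, text.indexOf('\r\n'))
+     const [, statusCode] = firstLine.split(' ')
+     return statusCode
+ }
+ ```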
281
+
282
+ ### Parsing All Headers
283
+
284
+ It is possible that a response comes in multiple packets,
285
+ so we need to keep some state between packets.
286
+ This is the next step:
287
+ we should make sure that we have received the whole body and not just part of it.
288
+ The way to do this is to read the `content-length` header,
289
+ and then check that the body we have received has this length;
290
+ only then can we be 100% sure that we have the whole body.
291
+
292
+ Therefore we need to parse all incoming headers,
293
+ find the content length (in the header `content-length`),
294
+ and then parse the rest of the packet to check that we have the whole body.
295
+ Again, a very simple implementation that parses content length and checks against body length
296
+ goes down to **63 krps**.
297
+
298
+ If the body is not complete we need to keep the partial body,
299
+ and add the rest as it comes until the required `content-length`.
300
+ Keep in mind that even headers can be so long that they come in several packets!
301
+ In this case even more state needs to be stored between packets.
302
+
303
+ With decent packet parsing,
304
+ including multi-packet headers and bodies,
305
+ performance goes down to **60 krps**.
306
+ Most of the time is spent parsing headers,
307
+ since the body only needs to be checked for length,
308
+ not parsed.
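+
+ A sketch of the state kept between packets might look like this
+ (a simplification, not the actual `loadtest` parser):
+
+ ```
+ // Accumulate packets until the body reaches the announced content-length.
+ class ResponseParser {
+     constructor() {
+         this.buffer = ''
+         this.bodyStart = null
+         this.contentLength = 0
+     }
+
+     // Returns true when the whole response has been received.
+     addPacket(packet) {
+         this.buffer += packet.toString()
+         if (this.bodyStart === null) {
+             const headerEnd = this.buffer.indexOf('\r\n\r\n')
+             if (headerEnd === -1) return false // headers not complete yet
+             this.bodyStart = headerEnd + 4
+             const match = /content-length:\s*(\d+)/i.exec(this.buffer.slice(0, headerEnd))
+             this.contentLength = match ? parseInt(match[1], 10) : 0
+         }
+         return this.buffer.length - this.bodyStart >= this.contentLength
+     }
+ }
+ ```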
309
+
310
+ ### Considering Duplicates
311
+
312
+ Given that answers tend to be identical in a load test,
313
+ perhaps changing a date or a serial number,
314
+ we can apply a trick:
315
+ when receiving a packet, check if it is similar enough to one received before
316
+ so we can skip parsing the headers altogether.
317
+
318
+ The algorithm checks the following conditions:
319
+
320
+ - Length of the received packet is less than 1000 bytes.
321
+ - Length of the packet is identical to one received before.
322
+ - Length of headers and body are also identical.
323
+ - Same status as before.
324
+
325
+ If all of them apply then the headers in the message are not parsed:
326
+ we assume that the packet is complete and we don't need to check the content length.
327
+ Keep in mind that we _might_ be wrong:
328
+ we might have received a packet with just part of a response
329
+ that happens to have the same length, status and header length as a previous complete response,
330
+ and which is also below 1000 bytes.
331
+ This is however extremely unlikely.
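+
+ A sketch of the check described above (hypothetical code, simplified from the
+ conditions listed):
+
+ ```
+ // Map from packet length to {headerLength, status} of a previously
+ // parsed complete response.
+ const seen = new Map()
+
+ function looksLikeKnownResponse(packet) {
+     if (packet.length >= 1000) return false
+     const previous = seen.get(packet.length)
+     if (!previous) return false
+     const text = packet.toString()
+     const headerLength = text.indexOf('\r\n\r\n') + 4
+     const status = text.split(' ')[1] // e.g. '200'
+     // same total length, header length (hence body length) and status
+     return previous.headerLength === headerLength && previous.status === status
+ }
+ ```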
332
+
333
+ Using this trick we go back to **67 krps**.
334
+
335
+ Packets of different lengths are stored for comparison,
336
+ which can cause memory issues if response sizes vary constantly.
337
+
338
+ ### Multiprocess, Multi-core
339
+
340
+ Now we can go back to using multiple cores:
341
+
342
+ ```
343
+ node bin/loadtest.js http://localhost:7357 --cores 3 --tcp
344
+ [...]
345
+ Effective rps: 115379
346
+ ```
347
+
348
+ In this case we use half the available cores,
349
+ leaving the rest for the test server.
350
+ Now we go up to **115 krps**!
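+
+ A simplified sketch of how the work can be spread over several cores with the
+ [cluster module](https://nodejs.org/api/cluster.html)
+ (`runLoadTest()` is an assumed helper, shown here only for illustration):
+
+ ```
+ import cluster from 'cluster'
+
+ const cores = 3
+ if (cluster.isPrimary) {
+     // the primary just forks one worker per requested core
+     for (let index = 0; index < cores; index++) cluster.fork()
+ } else {
+     // each worker runs its own independent load test (assumed helper)
+     runLoadTest()
+ }
+ ```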
351
+
352
+ What about regular `http` connections without the `--tcp` option?
353
+ Performance stays at **54 krps**:
354
+
355
+ ```
356
+ node bin/loadtest.js http://localhost:7357/ -k --cores 3
357
+ [...]
358
+ Effective rps: 54432
359
+ ```
360
+
361
+ For comparison we also try `autocannon` with three workers:
362
+
363
+ ```
364
+ autocannon http://localhost:7357/ -w 3 -c 30
365
+ [...]
366
+ ┌───────────┬───────┬───────┬─────────┬─────────┬──────────┬─────────┬───────┐
367
+ │ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
368
+ ├───────────┼───────┼───────┼─────────┼─────────┼──────────┼─────────┼───────┤
369
+ │ Req/Sec │ 88511 │ 88511 │ 107071 │ 110079 │ 105132.8 │ 6148.39 │ 88460 │
370
+ ├───────────┼───────┼───────┼─────────┼─────────┼──────────┼─────────┼───────┤
371
+ │ Bytes/Sec │ 11 MB │ 11 MB │ 13.3 MB │ 13.6 MB │ 13 MB │ 764 kB │ 11 MB │
372
+ └───────────┴───────┴───────┴─────────┴─────────┴──────────┴─────────┴───────┘
373
+ ```
374
+
375
+ The median rate (50th percentile) is **107 krps**.
376
+ Now we try `wrk`, which yields **118 krps**:
377
+
378
+ ```
379
+ wrk http://localhost:7357/ -t 3
380
+ [...]
381
+ Requests/sec: 118164.03
382
+ ```
383
+
384
+ So `loadtest` has managed to come in slightly above `autocannon` by using multiple tricks,
385
+ but below `wrk`.
386
+
387
+ ### Pool of Clients
388
+
389
+ We are not done yet.
390
+ As it happens the new code is not very precise with connections and clients:
391
+ in particular it doesn't play nice with our `--rps` feature,
392
+ which is used to send an exact number of requests per second.
393
+ We need to do a complete refactoring to have a pool of clients:
395
+ take them out to fulfill a request and then free them back to the pool.
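+
+ A minimal sketch of such a pool (hypothetical, just to illustrate the idea):
+
+ ```
+ // Clients are taken out to fulfill a request
+ // and released back into the pool when the request finishes.
+ class ClientPool {
+     constructor(createClient, size) {
+         this.free = Array.from({length: size}, () => createClient())
+         this.busy = new Set()
+     }
+
+     take() {
+         const client = this.free.pop()
+         if (client) this.busy.add(client)
+         return client // undefined if the pool is exhausted
+     }
+
+     release(client) {
+         this.busy.delete(client)
+         this.free.push(client)
+     }
+ }
+ ```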
395
+
396
+ After the refactoring we get some bad news:
397
+ performance has dropped back down to **60 krps**!
398
+
399
+ ```
400
+ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
401
+ [...]
402
+ Effective rps: 60331
403
+ ```
404
+
405
+ We need to do the painstaking exercise of getting back to our target performance.
406
+
407
+ After some optimizing and a lot of bug fixing we are back to **68 krps**:
408
+
409
+ ```
410
+ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
411
+ [...]
412
+ Effective rps: 68466
413
+ ```
414
+
415
+ With classic `loadtest` without the `--tcp` option, we still get **21 krps**:
416
+
417
+ ```
418
+ node bin/loadtest.js http://localhost:7357/ -k --cores 1
419
+ [...]
420
+ Effective rps: 21446
421
+ ```
422
+
423
+ Marginally better than before.
424
+ By the way, it would be a good idea to try again without keep-alive.
425
+ There is currently no option to disable keep-alive,
426
+ but it can be done by hacking the default header into
427
+ `Connection: close`.
428
+ We get a bit less performance than the barebones implementation,
429
+ almost **9 krps**:
430
+
431
+ ```
432
+ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
433
+ [...]
434
+ Effective rps: 8682
435
+ ```
436
+
437
+ ### Reproducible Script
438
+
439
+ The current setup is a bit cumbersome: start the server,
440
+ then start the load test with the right parameters.
441
+ We need to have a reproducible way of getting performance measurements.
442
+ So we introduce the script `bin/tcp-performance.js`,
443
+ which starts a test server and then runs a load test with the parameters we have been using.
444
+ Unfortunately the test server only uses one core (since it is run in API mode),
445
+ and maxes out quickly at **27 krps**.
446
+
447
+ ```
448
+ node bin/tcp-performance.js
449
+ [...]
450
+ Effective rps: 27350
451
+ ```
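+
+ In essence the script does something like the following sketch
+ (the exact exports and result fields are assumptions here,
+ simplified from the actual `bin/tcp-performance.js`):
+
+ ```
+ import {loadTest, startServer} from '../index.js'
+
+ // start the test server in API mode (single core),
+ // then run a TCP load test against it for a fixed time
+ const server = await startServer({port: 7357})
+ const result = await loadTest({
+     url: 'http://localhost:7357/',
+     tcp: true,
+     maxSeconds: 10,
+ })
+ console.log(`Effective rps: ${result.effectiveRps}`)
+ server.close()
+ ```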
452
+
453
+ The author has made multiple attempts at getting a multi-core test server running:
454
+ using the cluster module,
455
+ running it as a multi-core process,
456
+ running it as a script using
457
+ [child_process.exec()](https://nodejs.org/api/child_process.html#child_processexeccommand-options-callback)...
458
+ They all add too much complexity.
459
+ So we can use the single-core measurements as a benchmark,
460
+ even if they are not representative of full operation.
461
+
462
+ By the way, `autocannon` does a bit better in this scenario (single-core test server),
463
+ as it reaches **43 krps**.
464
+ How does it do this magic?
465
+ One part of the puzzle may be that it sends fewer headers,
466
+ without `user-agent` or `accept`.
467
+ So we can do a quick trial of removing these headers in `loadtest`:
468
+
469
+ ```
470
+ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
471
+ [...]
472
+ Effective rps: 29694
473
+ ```
474
+
475
+ Performance improves a bit but not much, to almost **30 krps**.
476
+ How `autocannon` does this wizardry is not evident.
477
+
478
+ ### Face-off with Nginx
479
+
480
+ Our last test is to run `loadtest` against a local Nginx server,
481
+ which is sure not to max out with only one core:
482
+ it goes to **61 krps**.
483
+
484
+ ```
485
+ node bin/loadtest.js http://localhost:80/ --tcp --cores 1
486
+ [...]
487
+ Effective rps: 61059
488
+ ```
489
+
490
+ Without `--tcp` we only get **19 krps**.
491
+ A similar test with `autocannon` yields only **40 krps**:
492
+
493
+ ```
494
+ autocannon http://localhost:80/
495
+ [...]
496
+ ┌───────────┬─────────┬─────────┬───────┬─────────┬─────────┬─────────┬─────────┐
497
+ │ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
498
+ ├───────────┼─────────┼─────────┼───────┼─────────┼─────────┼─────────┼─────────┤
499
+ │ Req/Sec │ 34591 │ 34591 │ 40735 │ 43679 │ 40400 │ 2664.56 │ 34590 │
500
+ ├───────────┼─────────┼─────────┼───────┼─────────┼─────────┼─────────┼─────────┤
501
+ │ Bytes/Sec │ 29.7 MB │ 29.7 MB │ 35 MB │ 37.5 MB │ 34.7 MB │ 2.29 MB │ 29.7 MB │
502
+ └───────────┴─────────┴─────────┴───────┴─────────┴─────────┴─────────┴─────────┘
503
+ ```
504
+
505
+ Again it's not evident why it achieves lower performance against Nginx
506
+ than against our Node.js test server,
507
+ but the numbers are quite consistent.
508
+ Meanwhile `wrk` takes the crown again with **111 krps**:
509
+
510
+ ```
511
+ wrk http://localhost:80/ -t 1
512
+ [...]
513
+ Requests/sec: 111176.14
514
+ ```
515
+
516
+ Running `loadtest` again with three cores we get **111 krps**:
517
+
518
+ ```
519
+ node bin/loadtest.js http://localhost:80/ --tcp --cores 3
520
+ [...]
521
+ Effective rps: 110858
522
+ ```
523
+
524
+ Without `--tcp` we get **49 krps**.
525
+ Meanwhile `autocannon` with three workers reaches **80 krps**:
526
+
527
+ ```
528
+ autocannon http://localhost:80/ -w 3
529
+ [...]
530
+ ┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
531
+ │ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
532
+ ├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
533
+ │ Req/Sec │ 65727 │ 65727 │ 80191 │ 84223 │ 78668.8 │ 5071.38 │ 65676 │
534
+ ├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
535
+ │ Bytes/Sec │ 56.4 MB │ 56.4 MB │ 68.9 MB │ 72.4 MB │ 67.6 MB │ 4.36 MB │ 56.4 MB │
536
+ └───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
537
+ ```
538
+
539
+ This is consistent with the numbers reached above against the test server with 3 cores.
540
+
541
+ `wrk` does not go much further with three threads than with one, at **122 krps**:
542
+
543
+ ```
544
+ wrk http://localhost:80/ -t 3
545
+ [...]
546
+ Requests/sec: 121991.96
547
+ ```
548
+
549
+ ## Conclusions
550
+
551
+ It is good to know that `loadtest` can hold its own against such beasts as `ab`,
552
+ `autocannon` or `wrk`.
553
+ `ab` and `wrk` are written in C,
554
+ while `autocannon` is maintained by Matteo Collina, one of the leading Node.js performance gurus.
555
+
556
+ There are some unexplained effects,
557
+ like why `autocannon` performs so poorly against Nginx.
558
+ It would be really interesting to understand them.
559
+
560
+ Now with TCP sockets and keep-alive you can use `loadtest`
561
+ to go beyond the paltry 6 to 20 krps that we used to get:
562
+ especially with multiple cores you can reach 100 krps locally.
563
+ If you need performance that goes beyond that,
564
+ you can try some of the other options used here.
565
+
566
+ Note that there are many options not yet implemented for TCP sockets,
567
+ like secure connections with HTTPS.
568
+ They will come in the next releases.
569
+
package/lib/baseClient.js CHANGED
@@ -3,9 +3,9 @@ import {addUserAgent} from './headers.js'
3
3
 
4
4
 
5
5
  export class BaseClient {
6
- constructor(loadTest, options) {
6
+ constructor(loadTest) {
7
7
  this.loadTest = loadTest;
8
- this.options = options;
8
+ this.options = loadTest.options;
9
9
  this.generateMessage = undefined;
10
10
  }
11
11
 
@@ -23,11 +23,7 @@ export class BaseClient {
23
23
  }
24
24
  }
25
25
  this.loadTest.latency.end(id, errorCode);
26
- let callback;
27
- if (!this.options.requestsPerSecond) {
28
- callback = () => this.makeRequest();
29
- }
30
- this.loadTest.finishRequest(error, result, callback);
26
+ this.loadTest.pool.finishRequest(this, result, error);
31
27
  };
32
28
  }
33
29
 
package/lib/cluster.js CHANGED
@@ -17,7 +17,7 @@ export async function runTask(cores, task) {
17
17
  if (cluster.isPrimary) {
18
18
  return await runWorkers(cores)
19
19
  } else {
20
- const result = await task(cluster.worker.id)
20
+ const result = await task(cluster.worker.id) || '0'
21
21
  process.send(result)
22
22
  }
23
23
  }