loadtest 8.0.0 → 8.0.2

@@ -1,7 +1,10 @@
- # TCP Sockets Performance
+ # Load Testing with TCP Sockets
 
- To improve performance the author tried out using raw TCP sockets
- using the [net module](https://nodejs.org/api/net.html),
+ The `loadtest` module has impressive performance,
+ and it has gotten better over the years as the Node.js core has improved.
+ Heavily inspired by `autocannon`,
+ the author has tried out using raw TCP sockets to improve performance.
+ They use the [net module](https://nodejs.org/api/net.html)
  instead of the [HTTP module](https://nodejs.org/api/http.html).
  This is the story of how it went.
 
@@ -71,12 +74,15 @@ Finally with keep-alive, 3-core load tester against Nginx:
 
  ## Implementations
 
- All measurements against the test server using 3 cores (default):
+ All measurements are against the test server using 3 cores
+ (the default configuration for our six-core machine),
+ unless specified otherwise:
 
- ```
- node bin/testserver.js
+ ```console
+ $ node bin/testserver.js
  ```
 
+ Note that the first `$` is the console prompt.
  Tests run on an Intel Core i5-12400T processor with 6 cores,
  with Ubuntu 22.04.3 LTS (Xubuntu actually).
  Performance numbers are shown in bold and as thousands of requests per second (krps):
@@ -92,15 +98,15 @@ so they are not to be compared between them.
 
  First target performance is against [Apache `ab`](https://httpd.apache.org/docs/2.4/programs/ab.html).
 
- ```
- ab -V
+ ```console
+ $ ab -V
  Version 2.3 <$Revision: 1879490 $>
  ```
 
  With 10 concurrent connections without keep-alive.
 
- ```
- ab -t 10 -c 10 http://localhost:7357/
+ ```console
+ $ ab -t 10 -c 10 http://localhost:7357/
  [...]
  Requests per second: 20395.83 [#/sec] (mean)
  ```
@@ -110,17 +116,18 @@ Keep-alive cannot be used with `ab` as far as the author knows.
 
  #### Autocannon
 
- The [autocannon](https://www.npmjs.com/package/autocannon) package uses by default
- 10 concurrent connections with keep-alive enabled:
+ Next we will try out [`autocannon`](https://www.npmjs.com/package/autocannon),
+ the package that actually inspired this approach.
+ By default `autocannon` uses 10 concurrent connections with keep-alive enabled:
 
- ```
- autocannon --version
+ ```console
+ $ autocannon --version
  autocannon v7.12.0
  node v18.17.1
  ```
 
- ```
- autocannon http://localhost:7357/
+ ```console
+ $ autocannon http://localhost:7357/
  [...]
  ┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬─────────┬─────────┐
  │ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
@@ -137,8 +144,8 @@ Keep-alive cannot be disabled with an option,
  but it can be changed directly in the code by setting the header `Connection: close`.
  Performance is near **8 krps**:
 
- ```
- npx autocannon http://localhost:7357/
+ ```console
+ $ npx autocannon http://localhost:7357/
  [...]
  ┌───────────┬────────┬────────┬────────┬────────┬────────┬─────────┬────────┐
  │ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
@@ -153,15 +160,15 @@ npx autocannon http://localhost:7357/
 
  To complete the set we try `wrk`:
 
- ```
- wrk -v
+ ```console
+ $ wrk -v
  wrk debian/4.1.0-3build1 [epoll]
  ```
 
  With a single thread (core) for fair comparison we get almost **73 krps**:
 
- ```
- wrk http://localhost:7357/ -t 1
+ ```console
+ $ wrk http://localhost:7357/ -t 1
  [...]
  Requests/sec: 72639.52
  ```
@@ -173,8 +180,8 @@ running on one core.
 
  Without keep-alive close to **6 krps**:
 
- ```
- node bin/loadtest.js http://localhost:7357 --cores 1
+ ```console
+ $ node bin/loadtest.js http://localhost:7357 --cores 1
  [...]
  Effective rps: 6342
  ```
@@ -182,8 +189,8 @@ Effective rps: 6342
  Very far away from the 20 krps given by `ab`.
  With keep-alive:
 
- ```
- node bin/loadtest.js http://localhost:7357 --cores 1 -k
+ ```console
+ $ node bin/loadtest.js http://localhost:7357 --cores 1 -k
  [...]
  Effective rps: 20490
  ```
@@ -192,13 +199,13 @@ We are around **20 krps**.
  Again quite far from the 57 krps by `autocannon`;
  close to `ab` but it doesn't use keep-alive so the comparison is meaningless.
 
- ### Proof of Concept
+ ### Proof of Concept: Barebones
 
  For the first implementation we want to learn if the bare sockets implementation is worth the time.
- In this naïve implementation we open the socket,
+ In this naïve barebones implementation we open the socket,
  send a short canned request without taking into account any parameters or headers:
 
- ```
+ ```js
  this.params.request = `${this.params.method} ${this.params.path} HTTP/1.1\r\n\r\n`
  ```
 
@@ -207,8 +214,8 @@ just assume that it is received as one packet
  and disregard it.
  The results are almost **80 krps**:
 
- ```
- node bin/loadtest.js http://localhost:7357 --cores 1 --tcp
+ ```console
+ $ node bin/loadtest.js http://localhost:7357 --cores 1 --tcp
  [...]
  Effective rps: 79997
  ```
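
For illustration, a barebones client along these lines can be written with nothing but the `net` module. This is a minimal sketch under the same assumptions (canned request, every `data` event counted as one full response); the names are invented for the example and it is not the actual `loadtest` code:

```js
// Minimal sketch of the barebones approach: one raw TCP socket,
// a canned HTTP/1.1 request, and no parsing of the response at all.
import net from 'node:net'

const request = 'GET / HTTP/1.1\r\n\r\n'
let responses = 0

const socket = net.connect(7357, 'localhost', () => socket.write(request))
socket.on('data', () => {
	// assume the whole response arrived in this packet and disregard it
	responses += 1
	socket.write(request) // fire the next request immediately
})
socket.on('error', error => console.error('socket error:', error))

// report the measured rate after ten seconds, then stop
setTimeout(() => {
	console.log(`Effective rps: ${Math.round(responses / 10)}`)
	socket.destroy()
}, 10000)
```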
@@ -339,8 +346,8 @@ which can cause memory issues when size varies constantly.
 
  Now we can go back to using multiple cores:
 
- ```
- node bin/loadtest.js http://localhost:7357 --cores 3 --tcp
+ ```console
+ $ node bin/loadtest.js http://localhost:7357 --cores 3 --tcp
  [...]
  Effective rps: 115379
  ```
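
The `--cores` option can be pictured as a standard `node:cluster` fan-out, roughly as in the sketch below; this is not the package's actual implementation, and `runClient()` is a placeholder for the per-core request loop:

```js
// Rough sketch of running one load-testing client per core.
import cluster from 'node:cluster'

const cores = 3

// placeholder for the per-core request loop (raw sockets, as above)
function runClient(url) {
	console.log(`worker ${process.pid} load testing ${url}`)
}

if (cluster.isPrimary) {
	// the primary process only forks one worker per requested core;
	// results can be aggregated via worker messages
	for (let index = 0; index < cores; index++) {
		cluster.fork()
	}
} else {
	runClient('http://localhost:7357/')
}
```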
@@ -352,16 +359,16 @@ Now we go up to **115 krps**!
  What about regular `http` connections without the `--tcp` option?
  It stays at **54 krps**:
 
- ```
- node bin/loadtest.js http://localhost:7357/ -k --cores 3
+ ```console
+ $ node bin/loadtest.js http://localhost:7357/ -k --cores 3
  [...]
  Effective rps: 54432
  ```
 
  For comparison we try using `autocannon` also with three workers:
 
- ```
- autocannon http://localhost:7357/ -w 3 -c 30
+ ```console
+ $ autocannon http://localhost:7357/ -w 3 -c 30
  [...]
  ┌───────────┬───────┬───────┬─────────┬─────────┬──────────┬─────────┬───────┐
  │ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
@@ -375,8 +382,8 @@ autocannon http://localhost:7357/ -w 3 -c 30
  Median rate (50% percentile) is **107 krps**.
  Now `wrk` which yields **118 krps**:
 
- ```
- wrk http://localhost:7357/ -t 3
+ ```console
+ $ wrk http://localhost:7357/ -t 3
  [...]
  Requests/sec: 118164.03
  ```
@@ -396,26 +403,87 @@ take them to fulfill a request and then free them back to the pool.
  After the refactoring we get some bad news:
  performance has dropped down back to **60 krps**!
 
- ```
- node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
+ ```console
+ $ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
  [...]
  Effective rps: 60331
  ```
 
  We need to do the painstaking exercise of getting back to our target performance.
 
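The pool described above (create a number of sockets, take one to fulfill each request, then free it back) can be sketched roughly like this; the names are invented for the example, not taken from the actual refactoring:

```js
// Simplified sketch of a socket pool: sockets are opened up front,
// acquired for a request and released when the response has arrived.
import net from 'node:net'

class SocketPool {
	constructor(port, host, size) {
		this.free = []
		for (let index = 0; index < size; index++) {
			this.free.push(net.connect(port, host))
		}
	}

	// take a free socket from the pool, or undefined if all are busy
	acquire() {
		return this.free.pop()
	}

	// return a socket to the pool so the next request can reuse it
	release(socket) {
		this.free.push(socket)
	}
}
```

Each request then becomes acquire, write, wait for the response, release; that extra bookkeeping is exactly the kind of overhead the following profiling session tries to pin down.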
+ ### Profiling and Micro-profiling
+
+ We need to see where our microseconds (µs) are being spent.
+ Every microsecond counts: between 67 krps (15 µs per request) and 60 krps (16.7 µs per request)
+ the difference is... less than two microseconds.
+
+ We use the [`microprofiler`](https://github.com/alexfernandez/microprofiler) package,
+ which allows us to instrument the code that is sending and receiving requests.
+ For instance the function `makeRequest()` in `lib/tcpClient.js`, which sends out the request:
+
+ ```js
+ import microprofiler from 'microprofiler'
+
+ [...]
+ makeRequest() {
+   if (!this.running) {
+     return
+   }
+   // first block: connect
+   const start1 = microprofiler.start()
+   this.connect()
+   microprofiler.measureFrom(start1, 'connect', 100000)
+   // second block: create parser
+   const start2 = microprofiler.start()
+   this.parser = new Parser(this.params.method)
+   microprofiler.measureFrom(start2, 'create parser', 100000)
+   // third block: start measuring latency
+   const start3 = microprofiler.start()
+   const id = this.latency.begin()
+   this.currentId = id
+   microprofiler.measureFrom(start3, 'latency begin', 100000)
+   // fourth block: write to socket
+   const start4 = microprofiler.start()
+   this.connection.write(this.params.request)
+   microprofiler.measureFrom(start4, 'write', 100000)
+ }
+ ```
+
+ Each of the four calls is instrumented.
+ When this code runs the output has a lot of lines like this:
+
+ ```console
+ $ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
+ [...]
+ Profiling connect: 100000 requests, mean time: 1.144 µs, rps: 6948026
+ Profiling create parser: 100000 requests, mean time: 0.152 µs, rps: 6582446
+ Profiling latency begin: 100000 requests, mean time: 1.138 µs, rps: 878664
+ Profiling write: 100000 requests, mean time: 5.669 µs, rps: 176409
+ ```
+
+ Note that the results oscillate by something like 0.3 µs from run to run,
+ so don't pay attention to very small differences.
+ Mean time is the interesting part: from 0.152 µs to create the parser to 5.669 µs for the write.
+ There is not a lot that we can do about the `connection.write()` call,
+ since it speaks directly to the Node.js core;
+ we can try reducing the message size (not sending all headers)
+ but it doesn't seem to do much.
+ So we focus on the `this.connect()` call,
+ which we can reduce to less than a µs.
+ Then we repeat the exercise on the `finishRequest()` call to see if we can squeeze another microsecond there.
+
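One plausible shape for that `connect()` optimization is to reuse the socket from the previous request and only dial when there is no usable connection. A hypothetical sketch, not the actual code (`this.params.port` and `this.params.hostname` are assumed fields):

```js
// Hypothetical sketch: make connect() nearly free on the hot path by
// keeping the previous socket alive and dialing only when it is gone.
connect() {
	if (this.connection && !this.connection.destroyed) {
		// keep-alive path: the socket is already open, nothing to do
		return
	}
	this.connection = net.connect(this.params.port, this.params.hostname)
	this.connection.on('data', data => this.parser.parse(data))
	this.connection.on('error', error => this.finishRequest(error))
}
```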
  After some optimizing and a lot of bug fixing we are back to **68 krps**:
 
- ```
- node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
+ ```console
+ $ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
  [...]
  Effective rps: 68466
  ```
 
  With classic `loadtest` without the `--tcp` option, we still get **21 krps**:
 
- ```
- node bin/loadtest.js http://localhost:7357/ -k --cores 1
+ ```console
+ $ node bin/loadtest.js http://localhost:7357/ -k --cores 1
  [...]
  Effective rps: 21446
  ```
@@ -428,8 +496,8 @@ but it can be done by hacking the header as
  We get a bit less performance than the barebones implementation,
  almost **9 krps**:
 
- ```
- node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
+ ```console
+ $ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
  [...]
  Effective rps: 8682
  ```
@@ -444,8 +512,8 @@ that starts a test server and then runs a load test with the parameters we have
  Unfortunately the test server only uses one core (being run in API mode),
  and maxes out quickly at **27 krps**.
 
- ```
- node bin/tcp-performance.js
+ ```console
+ $ node bin/tcp-performance.js
  [...]
  Effective rps: 27350
  ```
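
Such a combined script can be built on the package's own API. The sketch below assumes the promise-based `loadTest` and `startServer` exports and a `tcp` option mirroring the `--tcp` flag; the exact names and signatures are assumptions, so check the package README before relying on them:

```js
// Rough sketch of a combined performance script: start the test server
// in API mode, run a load test against it, then shut everything down.
// Exports and option names are assumptions; see the loadtest README.
import {loadTest, startServer} from 'loadtest'

const server = await startServer({port: 7357})
const result = await loadTest({
	url: 'http://localhost:7357/',
	maxSeconds: 10,
	concurrency: 10,
	tcp: true, // assumed API equivalent of the --tcp flag
})
console.log(result)
server.close()
```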
@@ -466,8 +534,8 @@ One part of the puzzle can be that it sends fewer headers,
  without `user-agent` or `accepts`.
  So we can do a quick trial of removing these headers in `loadtest`:
 
- ```
- node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
+ ```console
+ $ node bin/loadtest.js http://localhost:7357/ --tcp --cores 1
  [...]
  Effective rps: 29694
  ```
@@ -481,8 +549,8 @@ Our last test is to run `loadtest` against a local Nginx server,
  which is sure not to max out with only one core:
  it goes to **61 krps**.
 
- ```
- node bin/loadtest.js http://localhost:80/ --tcp --cores 1
+ ```console
+ $ node bin/loadtest.js http://localhost:80/ --tcp --cores 1
  [...]
  Effective rps: 61059
  ```
@@ -490,8 +558,8 @@ Effective rps: 61059
  While without `--tcp` we only get **19 krps**.
  A similar test with `autocannon` yields only **40 krps**:
 
- ```
- autocannon http://localhost:80/
+ ```console
+ $ autocannon http://localhost:80/
  [...]
  ┌───────────┬─────────┬─────────┬───────┬─────────┬─────────┬─────────┬─────────┐
  │ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
@@ -507,16 +575,16 @@ than against our Node.js test server,
  but the numbers are quite consistent.
  While `wrk` takes the crown again with **111 krps**:
 
- ```
- wrk http://localhost:80/ -t 1
+ ```console
+ $ wrk http://localhost:80/ -t 1
  [...]
  Requests/sec: 111176.14
  ```
 
  Running again `loadtest` with three cores we get **111 krps**:
 
- ```
- node bin/loadtest.js http://localhost:80/ --tcp --cores 3
+ ```console
+ $ node bin/loadtest.js http://localhost:80/ --tcp --cores 3
  [...]
  Effective rps: 110858
  ```
@@ -524,8 +592,8 @@ Effective rps: 110858
  Without `--tcp` we get **49 krps**.
  While `autocannon` with three workers reaches **80 krps**:
 
- ```
- autocannon http://localhost:80/ -w 3
+ ```console
+ $ autocannon http://localhost:80/ -w 3
  [...]
  ┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
  │ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
@@ -540,8 +608,8 @@ Consistent with the numbers reached above against a test server with 3 cores.
 
  `wrk` does not go much further with three threads than with one, at **122 krps**:
 
- ```
- wrk http://localhost:80/ -t 3
+ ```console
+ $ wrk http://localhost:80/ -t 3
  [...]
  Requests/sec: 121991.96
  ```
package/lib/parser.js CHANGED
@@ -1,4 +1,3 @@
- import microprofiler from 'microprofiler'
 
  const bodySeparator = '\r\n\r\n'
  const lineSeparator = '\r\n'
@@ -36,13 +35,10 @@ export class Parser {
      // cannot parse yet
      return
    }
-   //const start1 = microprofiler.start()
    this.packetInfo = new PacketInfo(this.pending.length, division)
    const messageHeader = this.pending.subarray(0, division)
    const messageBody = this.pending.subarray(division + 4)
    this.parseFirstLine(messageHeader)
-   //microprofiler.measureFrom(start1, 'parse first line', 100000)
-   //const start2 = microprofiler.start()
    const key = this.packetInfo.getKey()
    const existing = packetInfos.get(key)
    if (existing && this.packetInfo.isDuplicated(existing)) {
@@ -52,10 +48,7 @@ export class Parser {
    }
    packetInfos.set(key, this.packetInfo)
    this.parseHeaders(messageHeader)
-   //microprofiler.measureFrom(start2, 'parse headers', 100000)
-   const start3 = microprofiler.start()
    this.parseBody(messageBody)
-   microprofiler.measureFrom(start3, 'parse body', 100000)
  }
 
  parseFirstLine(messageHeader) {
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "loadtest",
-   "version": "8.0.0",
+   "version": "8.0.2",
    "type": "module",
    "description": "Run load tests for your web application. Mostly ab-compatible interface, with an option to force requests per second. Includes an API for automated load testing.",
    "homepage": "https://github.com/alexfernandez/loadtest",