polyphony 1.1 → 1.1.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 61fa840f595a0e75da1d6e1b761506b18fa90d5a94b25ce5832d22aae2e08fbd
- data.tar.gz: 7d3a79225f68bb889ff0491ced1249c2679a222a3f79a5c4cf48f9571865ebc4
+ metadata.gz: a30f7362ca02a1e3b3fe8a76394d5bca243f8dc774b3a6f3f7e9ffa81aac04f7
+ data.tar.gz: 6fa0684c3e4ddf3fe62ea6d40d5e49578e36042ff57d20848cff6da165ab6027
  SHA512:
- metadata.gz: 32d61cf7e0858e704fc39fe317048e3e531afcf97da70b9b3d1e4b59a078e2f5c8c65a3f72c656b61285df9e5c99d7430938403560ca6e8e7aa5d920dada274e
- data.tar.gz: c5c1488b1cec55810ffccf8116d70e8312ccb7d93875c1047ab7608595eb81fda9601f44ef308b3e1f35319d683b6ceda423284c9cda76c6f44882a7341e681a
+ metadata.gz: 1eb08ca45b2129c25c5a1b023aea14fdbade30323a8f5db824f3c33c9b386f24cd541cfa7d536696c2ecd253dd7b4eeb976e87f53a93a59aa1ff96166e54fe05
+ data.tar.gz: 2f9145ea40f5d8aeb280cbc70249793805e770735c4758ff9f26f80138a752aacdf347e8e6705604a602ddec3216f33bc1797a00d9521585646dc2d8ccd432c4
data/CHANGELOG.md CHANGED
@@ -1,3 +1,7 @@
+ ## 1.1.1 2023-06-08
+
+ - Minor improvements to documentation
+
  ## 1.1 2023-06-08

  - Add advanced I/O doc page
data/docs/advanced-io.md CHANGED
@@ -1,5 +1,7 @@
  # @title Advanced I/O with Polyphony

+ # Advanced I/O with Polyphony
+
  ## Using splice for moving data between files and sockets

  Splice is linux-specific API that lets you move data between two file
@@ -10,12 +12,15 @@ size. Using splice, you can avoid the cost of having to load a file's content
  into memory, in order to send it to a TCP connection.

  In order to use `splice`, at least one of the file descriptors involved needs to
- be a pipe. This is because in Linux, pipes are actually kernel buffers. The
- normal way of using splice is that first you splice data from the source fd to
- the pipe (to its *write* fd), and then you splice data from the pipe (from its
- *read* fd) to the destination fd.
+ be a pipe. This is because in Linux, pipes are actually kernel buffers. The idea
+ is that you first move data from a source fd into a kernel buffer, then you move
+ data from the kernel buffer to the destination fd. In some cases, this lets the
+ Linux kernel completely avoid having to copy data in order to move it from the
+ source to the destination. So the normal way of using splice is that first you
+ splice data from the source fd to the pipe (to its *write* fd), and then you
+ splice data from the pipe (from its *read* fd) to the destination fd.

- Here's how we do splicing using Polyphony:
+ Here's how you can use splice with Polyphony:

  ```ruby
  def send_file_using_splice(src, dest)
@@ -25,24 +30,29 @@ def send_file_using_splice(src, dest)
  pipe = Polyphony::Pipe.new
  loop do
  # splices data from src to the pipe
- bytes_spliced = IO.splice(src, pipe, 2**14)
- break if bytes_spliced == 0 # EOF
+ bytes_available = IO.splice(src, pipe, 2**14)
+ break if bytes_available == 0 # EOF

  # splices data from the pipe to the dest
- IO.splice(pipe, dest, bytes_spliced)
+ while (bytes_available > 0)
+ written = IO.splice(pipe, dest, bytes_available)
+ bytes_available -= written
+ end
  end
  end
  ```

  Let's examine the code above. First of all, we have a loop that repeatedly
- splices data in chunks of 16KB. We break from the loop once EOF is encountered.
- Secondly, on each iteration of the loop we perform two splice operations
- sequentially. So, we need to repeatedly perform two splice operations, one after
- the other. Would there be a better way to do this?
+ splices data in chunks of 16KB, using the `IO.splice` API provided by Polyphony.
+ We break from the loop once EOF is encountered. Secondly, for moving data from
+ the pipe to the destination, we need to make sure *all* data made available on
+ the pipe has been spliced to the destination, since the call to `IO.splice` can
+ actually write fewer bytes than specified. So, we need to repeatedly perform two
+ splice operations, one after the other, and we need to make sure all data is
+ spliced to the destination. Would there be a better way to do this?

- Fortunately, Polyphony provides just the tools needed to do that. Firstly, we
- can tell Polyphony to splice data repeatedly until EOF is encountered by passing
- a negative max size:
+ Fortunately, with Polyphony there is! Firstly, we can tell Polyphony to splice
+ data repeatedly until EOF is encountered by passing a negative max size:

  ```ruby
  IO.splice(src, pipe, -2**14)
@@ -65,22 +75,22 @@ end
  ```

  There are a few things to notice here: While we have two concurrent operations
- running in two separate fibers, their are still inter-dependent in their
- individual progress, as one is filling a kernel buffer, and the other is
- flushing it, and thus the progress of whole will be bound by the slowest
- operation.
-
- Imagine an HTTP server that serves a large file to a slow client, or a client
- with a bad network connection. The web server is perfectly capable of reading
- the file from its disk very fast, but sending data to the HTTP client can be
- much much slower. The second splice operation, splicing from the pipe to the
- destination, will flush the kernel much more slowly that it is being filled. At
- a certain point, the buffer is full, and the first splice operation from the
- source to the pipe cannot continue. It will need to wait for the other splice
- operation to progress, in order to continue filling the buffer. This is called
- back-pressure propagation, and we get it automatically.
-
- So let's look at all the things we didn't need to do: we didn't need to read
+ running in two separate fibers, they are still inter-dependent in their
+ progress, as one is filling a kernel buffer, and the other is flushing it, and
+ thus the progress of the whole will be bound by the slowest operation.
+
+ Take an HTTP server that serves a large file to a slow client, or a client with
+ a bad network connection. The web server is perfectly capable of reading the
+ file from its disk very fast, but sending data to the HTTP client can be much
+ much slower. The second splice operation, splicing from the pipe to the
+ destination, will flush the kernel buffer much more slowly than it is being
+ filled. At a certain point, the buffer is full, and the first splice operation
+ from the source to the pipe cannot continue. It will need to wait for the other
+ splice operation to progress, in order to continue filling the buffer. This is
+ called back-pressure propagation, it's a good thing, and we get it
+ automatically.
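As an aside, the same back-pressure dynamic can be sketched in plain Ruby with a bounded queue standing in for the kernel pipe buffer (an analogy only, not how Polyphony implements it): a fast producer blocks as soon as the buffer is full, and resumes only once the slow consumer drains it.

```ruby
# SizedQueue plays the role of the kernel pipe buffer: pushes block
# once the queue is full, so the producer's progress is bound by the
# consumer's, just like the two splice fibers above.
queue = SizedQueue.new(4)
consumed = []

producer = Thread.new do
  16.times { |i| queue.push(i) } # blocks whenever the buffer is full
  queue.push(nil)                # signal EOF
end

consumer = Thread.new do
  while (item = queue.pop)
    sleep 0.001                  # simulate a slow destination
    consumed << item
  end
end

[producer, consumer].each(&:join)
consumed # all 16 items arrive, in order, despite the slow consumer
```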
+
+ Let's now look at all the things we didn't need to do: we didn't need to read
  data into a Ruby string (which is costly in CPU time, in memory, and eventually
  in GC pressure), we didn't need to manage a buffer and take care of
  synchronizing access to the buffer. We got to move data from the source to the
@@ -97,17 +107,17 @@ end
  ```

  The `IO.double_splice` creates a pipe and repeatedly splices data concurrently
- from the source to pipe and from the pipe to the destination until the source is
- exhausted. All this, without needing to instantiate a `Polyphony::Pipe` object,
- and without needing to spin up a second fiber, further minimizing memory use and
- GC pressure.
+ from the source to the pipe and from the pipe to the destination until the
+ source is exhausted. All this, without needing to instantiate a
+ `Polyphony::Pipe` object, and without needing to spin up a second fiber, further
+ minimizing memory use and GC pressure.

  ## Compressing and decompressing in-flight data

  You might be familiar with Ruby's [zlib](https://github.com/ruby/zlib) gem (docs
  [here](https://rubyapi.org/3.2/o/zlib)), which can be used to compress and
  uncompress data using the popular gzip format. Imagine we want to implement an
- HTTP server that can serve files compresszed using gzip:
+ HTTP server that can serve files compressed using gzip:

  ```ruby
  def serve_compressed_file(socket, file)
@@ -117,10 +127,10 @@ def serve_compressed_file(socket, file)
  end
  ```

- In the above example, we have read the file contents into a Ruby string, then
- passed the contents to `Zlib.gzip`, which returned the compressed contents in
- another Ruby string, then wrote the compressed data to the socket. We can see
- how this can lead to large allocations of memory (if the file is large), and
+ In the above example, we read the file contents into a Ruby string, then pass
+ the contents to `Zlib.gzip`, which returns the compressed contents in another
+ Ruby string, then write the compressed data to the socket. We can see how this
+ can lead to lots of memory allocations (especially if the file is large), and
  more pressure on the Ruby GC. How can we improve this?

  One way would be to utilise Zlib's `GzipWriter` class:
@@ -165,7 +175,7 @@ through some object that parses the data, or otherwise manipulates it. Normally,
  we would write a loop that repeatedly reads the data from the source, then
  passes it to the parser object. Imagine we have data transmitted using the
  `MessagePack` format that we need to convert back into its original form. We
- might do something like this:
+ might do something like the following:

  ```ruby
  def with_message_pack_data_from_io(io, &block)
@@ -215,10 +225,89 @@ With `IO#feed_loop` we get to write even less code, and as with `IO#read_loop`,
  `IO#feed_loop` is implemented at the C-extension level using a tight loop that
  maximizes performance.
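The feed-loop pattern itself can be sketched in plain Ruby (an illustration of the pattern only, not Polyphony's implementation, which runs the equivalent loop in C). `LineParser` here is a hypothetical stand-in for a parser such as a MessagePack unpacker.

```ruby
require 'stringio'

# A parser that buffers raw bytes and emits complete messages (here,
# lines) as they become available.
class LineParser
  def initialize
    @buf = +''
  end

  def feed_each(data)
    @buf << data
    while (line = @buf.slice!(/.*\n/))
      yield line.chomp
    end
  end
end

# The feed loop: read raw data, feed it to the parser, handle messages.
def feed_loop(io, parser, &block)
  while (data = io.read(8))
    parser.feed_each(data, &block)
  end
end

messages = []
feed_loop(StringIO.new("foo\nbar\nbaz\n"), LineParser.new) { |m| messages << m }
messages # => ["foo", "bar", "baz"]
```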

+ ## Fast and easy chunked transfer-encoding in HTTP/1
+
+ [Chunked transfer
+ encoding](https://en.wikipedia.org/wiki/Chunked_transfer_encoding) is a great
+ way to serve HTTP responses of arbitrary size, because we don't need to know
+ their size in advance, which means we don't necessarily need to hold them in
+ memory, or perform expensive fstat calls to get file metadata. Sending HTTP
+ responses in chunked transfer encoding is simple enough:
+
+ ```ruby
+ def send_chunked_response_from_io(socket, io)
+ while true
+ chunk = io.read(MAX_CHUNK_SIZE) || '' # IO#read returns nil at EOF
+ socket << "#{chunk.bytesize.to_s(16)}\r\n#{chunk}\r\n"
+ break if chunk.empty?
+ end
+ end
+ ```
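The wire format produced by this loop can be exercised end to end in plain Ruby (a self-contained sketch; the helper names `encode_chunked` and `decode_chunked` are made up for illustration, `MAX_CHUNK_SIZE` is chosen arbitrarily, and `StringIO` stands in for the file and socket):

```ruby
require 'stringio'

MAX_CHUNK_SIZE = 8

# Frame an IO's contents with chunked transfer encoding (body framing
# only; HTTP headers omitted), mirroring the loop above.
def encode_chunked(io)
  out = +''
  while true
    chunk = io.read(MAX_CHUNK_SIZE) || ''
    out << "#{chunk.bytesize.to_s(16)}\r\n#{chunk}\r\n"
    break if chunk.empty?
  end
  out
end

# Decode the frames back into the original bytes.
def decode_chunked(data)
  io = StringIO.new(data)
  out = +''
  while true
    size = io.readline.chomp.to_i(16)
    chunk = io.read(size)
    io.read(2) # consume the trailing CRLF
    break if size == 0
    out << chunk
  end
  out
end

encoded = encode_chunked(StringIO.new('hello chunked world'))
decode_chunked(encoded) # => "hello chunked world"
```

Note the terminating `0\r\n\r\n` frame: the final empty chunk is what tells the client the response body is complete.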
+
+ Note how we read the chunk into memory and then send it on to the client. Would
+ it be possible to splice the data instead? Let's see how that would look:
+
+ ```ruby
+ def send_chunked_response_from_io(socket, io)
+ pipe = Polyphony::Pipe.new
+ while true
+ bytes_spliced = IO.splice(io, pipe, MAX_CHUNK_SIZE)
+ socket << "#{bytes_spliced.to_s(16)}\r\n"
+ IO.splice(pipe, socket, bytes_spliced) if bytes_spliced > 0
+ socket << "\r\n"
+ break if bytes_spliced == 0
+ end
+ end
+ ```
+
+ In the code above, while we avoid having to read chunks of the source data into
+ Ruby strings, we now perform 3 I/O operations for each chunk: writing the chunk
+ size, splicing the data from the pipe (the kernel buffer), and finally writing
+ the `"\r\n"` delimiter. We can probably write some more complex logic to reduce
+ this to 2 operations (coalescing the two write operations into one), but still
+ this implementation involves a lot of back and forth between our code, the
+ Polyphony I/O backend, and the operating system.
+
+ Fortunately, Polyphony provides a special API for sending HTTP chunked
+ responses:
+
+ ```ruby
+ def send_chunked_response_from_io(socket, io)
+ IO.http1_splice_chunked(io, socket, MAX_CHUNK_SIZE)
+ end
+ ```
+
+ A single method call replaces the whole mechanism we devised above, and in
+ addition Polyphony makes sure to perform it with the minimum possible number of
+ I/O operations!
+
+ ## Sending compressed data using chunked transfer encoding
+
+ We can now combine the different APIs discussed above to create even more
+ complex behaviour. Let's see how we can send an HTTP response using compressed
+ content encoding and chunked transfer encoding:
+
+ ```ruby
+ def send_compressed_chunked_response_from_io(socket, io)
+ pipe = Polyphony::Pipe.new
+ spin { IO.gzip(io, pipe) }
+ IO.http1_splice_chunked(pipe, socket, MAX_CHUNK_SIZE)
+ end
+ ```
+
+ The code above looks simple enough, but it actually packs a lot of power in just
+ 3 lines of code: we create a pipe, then spin up a fiber that compresses data
+ from `io` into the pipe. We then serve data from the pipe to the socket using
+ chunked transfer encoding. As discussed above, we do this without actually
+ allocating any Ruby strings for holding the data, we take maximum advantage of
+ kernel buffers (a.k.a. pipes) and we perform the two operations - compressing
+ the data and sending it to the client - concurrently.
+
  ## Conclusion

  In this article we have looked at some of the advanced I/O functionality
- provided by Polyphony, which lets us write less code, have it run faster, and
- minimize memory allocations and pressure on the Ruby GC. Feel free to browse the
- [IO examples](https://github.com/digital-fabric/polyphony/tree/master/examples/io)
+ provided by Polyphony, which lets us write less code, have it run faster, have
+ it run concurrently, and minimize memory allocations and pressure on the Ruby
+ GC. Feel free to browse the [IO
+ examples](https://github.com/digital-fabric/polyphony/tree/master/examples/io)
  included in Polyphony.
@@ -2,5 +2,5 @@

  module Polyphony
  # @!visibility private
- VERSION = '1.1'
+ VERSION = '1.1.1'
  end
data/polyphony.gemspec CHANGED
@@ -8,7 +8,7 @@ Gem::Specification.new do |s|
  s.author = 'Sharon Rosner'
  s.email = 'sharon@noteflakes.com'
  s.files = `git ls-files --recurse-submodules`.split.reject { |fn| fn =~ /liburing\/man/ }
- s.homepage = 'https://digital-fabric.github.io/polyphony'
+ s.homepage = 'https://github.com/digital-fabric/polyphony'
  s.metadata = {
  "source_code_uri" => "https://github.com/digital-fabric/polyphony",
  "documentation_uri" => "https://www.rubydoc.info/gems/polyphony",
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: polyphony
  version: !ruby/object:Gem::Version
- version: '1.1'
+ version: 1.1.1
  platform: ruby
  authors:
  - Sharon Rosner
@@ -647,7 +647,7 @@ files:
  - vendor/liburing/test/version.c
  - vendor/liburing/test/wakeup-hang.c
  - vendor/liburing/test/xattr.c
- homepage: https://digital-fabric.github.io/polyphony
+ homepage: https://github.com/digital-fabric/polyphony
  licenses:
  - MIT
  metadata: