h1p 0.1 → 0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 4fb3573e65df46350986981759a454c8bc2c4166901a46dd827fdc17a27b08b5
4
- data.tar.gz: afc10ead82e3286f61fa2bee1739499f275b75a386ed3e3401988805470c9b3a
3
+ metadata.gz: a27a3323482c8a0cec845e60962516b755338ccc54d0ba76bf6716bbb449a8ff
4
+ data.tar.gz: 22613ade68e261e3fca56cafcab410b1bed4587efe8c1da4ef52d1d335bc7efd
5
5
  SHA512:
6
- metadata.gz: 72ba304cda888fcfb1c8afaee1a03b94ddb15fde0110c303886e4036125fb5b464d214ebb731ea91dac792070c9cc3d5eccb789f667c8a590fae01afb88d5665
7
- data.tar.gz: 31cd6c6d7bb6695d338596b56be808cd1fed99cc76f11ee5f38fbdf7ff5d1206960a7134932ac82e9ca9c19116a20ee99db8f2a002fb76c1997134ac4d432dae
6
+ metadata.gz: 36cb9020745437627ad3dca5f83de6f81f8ad6e04075909db192a509b9f022960c5a7768c1f1ab8fdb807ae0d7b54e6febbc76581e68475c5f27057ff5b7aeb5
7
+ data.tar.gz: 20943cfead5e0236e60cb3a0d70057c69a91bbde45973f0f4ff87f38a1866523c1a02132254eebc13b8c5e2c4f75cd8e5791246257a792f63391cff88a6e1011
data/CHANGELOG.md CHANGED
@@ -1,4 +1,15 @@
1
- ## 0.42 2021-08-16
1
+ ## 0.4 2022-02-28
2
+
3
+ - Rename `__parser_read_method__` to `__read_method__`
4
+
5
+ ## 0.3 2022-02-03
6
+
7
+ - Add support for parsing HTTP responses (#1)
8
+ - Put state directly in parser struct
9
+
10
+ ## 0.2 2021-08-20
11
+
12
+ - Use uppercase for HTTP method
2
13
 
3
14
  ## 0.1 2021-08-19
4
15
 
data/Gemfile.lock CHANGED
@@ -1,33 +1,24 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- h1p (0.1)
4
+ h1p (0.4)
5
5
 
6
6
  GEM
7
7
  remote: https://rubygems.org/
8
8
  specs:
9
- ansi (1.5.0)
10
- builder (3.2.4)
11
- minitest (5.11.3)
12
- minitest-reporters (1.4.3)
13
- ansi
14
- builder
15
- minitest (>= 5.0)
16
- ruby-progressbar
17
- rake (12.3.3)
9
+ minitest (5.14.4)
10
+ rake (13.0.6)
18
11
  rake-compiler (1.1.1)
19
12
  rake
20
- ruby-progressbar (1.11.0)
21
13
 
22
14
  PLATFORMS
23
15
  ruby
24
16
 
25
17
  DEPENDENCIES
26
18
  h1p!
27
- minitest (~> 5.11.3)
28
- minitest-reporters (~> 1.4.2)
29
- rake (~> 12.3.3)
19
+ minitest (~> 5.14.4)
20
+ rake (~> 13.0.6)
30
21
  rake-compiler (= 1.1.1)
31
22
 
32
23
  BUNDLED WITH
33
- 2.1.4
24
+ 2.2.26
data/README.md CHANGED
@@ -1,5 +1,236 @@
1
- # h1p
1
+ # H1P - a blocking HTTP/1 parser for Ruby
2
2
 
3
- h1p
3
+ [![Gem Version](https://badge.fury.io/rb/h1p.svg)](http://rubygems.org/gems/h1p)
4
+ [![H1P Test](https://github.com/digital-fabric/h1p/workflows/Tests/badge.svg)](https://github.com/digital-fabric/h1p/actions?query=workflow%3ATests)
5
+ [![MIT licensed](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/digital-fabric/h1p/blob/master/LICENSE)
4
6
 
5
- HTTP 1.1 parser for Ruby
7
+ H1P is a blocking/synchronous HTTP/1 parser for Ruby with a simple and intuitive
8
+ API. Its design lends itself to writing HTTP servers in a sequential style. As
9
+ such, it might prove useful in conjunction with the new fiber scheduler
10
+ introduced in Ruby 3.0, but is also useful with a normal thread-based server
11
+ (see
12
+ [example](https://github.com/digital-fabric/h1p/blob/main/examples/http_server.rb)).
13
+ H1P was originally written as part of
14
+ [Tipi](https://github.com/digital-fabric/tipi), a web server running on top of
15
+ [Polyphony](https://github.com/digital-fabric/polyphony).
16
+
17
+ > H1P is still a very young project and as such should be used with caution. It
18
+ > has not undergone any significant conformance or security testing, and its API
19
+ > is not yet stable.
20
+
21
+ ## Features
22
+
23
+ - Simple, blocking/synchronous API
24
+ - Zero dependencies
25
+ - Transport-agnostic
26
+ - Parses both HTTP requests and HTTP responses
27
+ - Support for chunked encoding
28
+ - Support for both `LF` and `CRLF` line breaks
29
+ - Track total incoming traffic
30
+
31
+ ## Installing
32
+
33
+ If you're using Bundler, just add it to your `Gemfile`:
34
+
35
+ ```ruby
36
+ source 'https://rubygems.org'
37
+
38
+ gem 'h1p'
39
+ ```
40
+
41
+ You can then run `bundle install` to install it. Otherwise, just run `gem install h1p`.
42
+
43
+ ## Usage
44
+
45
+ Start by creating an instance of `H1P::Parser`, passing a connection instance and the parsing mode:
46
+
47
+ ```ruby
48
+ require 'h1p'
49
+
50
+ parser = H1P::Parser.new(conn, :server)
51
+ ```
52
+
53
+ In order to parse HTTP responses, change the mode to `:client`:
54
+
55
+ ```ruby
56
+ parser = H1P::Parser.new(conn, :client)
57
+ ```
58
+
59
+ To read the next message from the connection, call `#parse_headers`:
60
+
61
+ ```ruby
62
+ loop do
63
+ headers = parser.parse_headers
64
+ break unless headers
65
+
66
+ handle_request(headers)
67
+ end
68
+ ```
69
+
70
+ The `#parse_headers` method returns a single hash containing the different HTTP
71
+ headers. In case the client has closed the connection, `#parse_headers` will
72
+ return `nil` (see the guard clause above).
73
+
74
+ In addition to the header keys and values, the resulting hash also contains the
75
+ following "pseudo-headers" (in server mode):
76
+
77
+ - `:method`: the HTTP method (in upper case)
78
+ - `:path`: the request target
79
+ - `:protocol`: the protocol used (either `'http/1.0'` or `'http/1.1'`)
80
+ - `:rx`: the total bytes read by the parser
81
+
82
+ In client mode, the following pseudo-headers will be present:
83
+
84
+ - `:protocol`: the protocol used (either `'http/1.0'` or `'http/1.1'`)
85
+ - `:status`: the HTTP status as an integer
86
+ - `:status_message`: the HTTP status message
87
+ - `:rx`: the total bytes read by the parser
88
+
89
+
90
+ The header keys are always lower-cased. Consider the following HTTP request:
91
+
92
+ ```
93
+ GET /foo HTTP/1.1
94
+ Host: example.com
95
+ User-Agent: curl/7.74.0
96
+ Accept: */*
97
+
98
+ ```
99
+
100
+ The request will be parsed into the following Ruby hash:
101
+
102
+ ```ruby
103
+ {
104
+ ":method" => "GET",
105
+ ":path" => "/foo",
106
+ ":protocol" => "http/1.1",
107
+ "host" => "example.com",
108
+ "user-agent" => "curl/7.74.0",
109
+ "accept" => "*/*",
110
+ ":rx" => 78
111
+ }
112
+ ```
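As a quick sanity check on the `:rx` value above, the byte count can be reproduced in plain Ruby. This sketch assumes the request was sent with `CRLF` line endings (with bare `LF` the count would be smaller); the `request` variable exists only for illustration.

```ruby
# The example request, spelled out with explicit CRLF line endings.
request = "GET /foo HTTP/1.1\r\n" \
          "Host: example.com\r\n" \
          "User-Agent: curl/7.74.0\r\n" \
          "Accept: */*\r\n" \
          "\r\n"

# :rx counts every byte read, including line endings and the final blank line.
puts request.bytesize #=> 78
```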
113
+
114
+ Multiple headers with the same key will be coalesced into a single key-value
115
+ where the value is an array containing the corresponding values. For example,
116
+ multiple `Cookie` headers will appear in the hash as a single `"cookie"` entry,
117
+ e.g. `{ "cookie" => ['a=1', 'b=2'] }`
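The shape of the resulting hash can be illustrated with a small plain-Ruby sketch. This is not how H1P is implemented; `coalesce_headers` is a hypothetical helper that only mimics the lower-casing and coalescing behaviour described above.

```ruby
# Hypothetical helper (not part of H1P): builds a headers hash from
# [key, value] pairs, lower-casing keys and coalescing repeated keys.
def coalesce_headers(pairs)
  pairs.each_with_object({}) do |(key, value), headers|
    key = key.downcase
    if headers.key?(key)
      # A repeated key turns the entry into an array of all values seen so far.
      headers[key] = Array(headers[key]) << value
    else
      headers[key] = value
    end
  end
end

headers = coalesce_headers([
  ['Host', 'example.com'],
  ['Cookie', 'a=1'],
  ['Cookie', 'b=2']
])
p headers
```

Note that in this sketch a header that appears only once stays a plain string, so code consuming the hash may want to wrap values with `Array(...)` before iterating.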
118
+
119
+ ### Handling of invalid messages
120
+
121
+ When an invalid message is encountered, the parser will raise a `H1P::Error`
122
+ exception. An incoming message may be considered invalid if an invalid character
123
+ has been encountered at any point in parsing the message, or if any of the
124
+ tokens have an invalid length. You can consult the limits used by the parser
125
+ [here](https://github.com/digital-fabric/h1p/blob/main/ext/h1p/limits.rb).
126
+
127
+ ### Reading the message body
128
+
129
+ To read the message body use `#read_body`:
130
+
131
+ ```ruby
132
+ # read entire body
133
+ body = parser.read_body
134
+ ```
135
+
136
+ The H1P parser knows how to read both message bodies with a specified
137
+ `Content-Length` and request bodies in chunked encoding. The method call will
138
+ return when the entire body has been read. If the body is incomplete or has
139
+ invalid formatting, the parser will raise a `H1P::Error` exception.
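For readers unfamiliar with chunked transfer encoding, the following plain-Ruby sketch shows roughly what reading a chunked body involves. It is an illustration only, not H1P's implementation: `read_chunked_body` is a hypothetical helper, it reads from a `StringIO` rather than through a parser, and it ignores trailers.

```ruby
require 'stringio'

# Hypothetical helper (not part of H1P): decodes a chunked body from an
# IO-like object that responds to #gets and #read.
def read_chunked_body(io)
  body = +''
  loop do
    size_line = io.gets
    raise EOFError, 'missing chunk size' unless size_line

    # The chunk size is hexadecimal; anything after ';' is a chunk extension.
    size = Integer(size_line.strip.split(';').first, 16)
    break if size.zero?

    chunk = io.read(size)
    raise EOFError, 'incomplete chunk' if chunk.nil? || chunk.bytesize < size

    body << chunk
    io.gets # consume the line break that follows the chunk data
  end
  body
end

io = StringIO.new("3\r\nfoo\r\n6\r\nbarbaz\r\n0\r\n\r\n")
body = read_chunked_body(io)
p body #=> "foobarbaz"
```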
140
+
141
+ You can also read a single chunk of the body by calling `#read_body_chunk`:
142
+
143
+ ```ruby
144
+ # read a body chunk
145
+ chunk = parser.read_body_chunk(false)
146
+
147
+ # read chunk only from buffer:
148
+ chunk = parser.read_body_chunk(true)
149
+ ```
150
+
151
+ If no more chunks are available, `#read_body_chunk` will return `nil`. To test
152
+ whether the request is complete, you can call `#complete?`:
153
+
154
+ ```ruby
155
+ headers = parser.parse_headers
156
+ unless parser.complete?
157
+ body = parser.read_body
158
+ end
159
+ ```
160
+
161
+ The `#read_body` and `#read_body_chunk` methods will return `nil` if no body is
162
+ expected (based on the received headers).
163
+
164
+ ## Parsing from arbitrary transports
165
+
166
+ The H1P parser was built to read from any arbitrary transport or source, as long
167
+ as it conforms to one of two alternative interfaces:
168
+
169
+ - An object implementing a `__read_method__` method, which returns any of
170
+ the following values:
171
+
172
+ - `:stock_readpartial` - to be used for instances of `IO`, `Socket`,
173
+ `TCPSocket`, `SSLSocket` etc.
174
+ - `:backend_read` - for use in Polyphony-based servers.
175
+ - `:backend_recv` - for use in Polyphony-based servers.
176
+ - `:readpartial` - for use in Polyphony-based servers.
177
+
178
+ - An object implementing a `call` method, such as a `Proc` or any other callable. The
179
+ call is given a single argument signifying the maximum number of bytes to
180
+ read, and is expected to return either a string with the read data, or `nil`
181
+ if no more data is available. The callable can be passed as an argument or as
182
+ a block. Here's an example for parsing from a callable:
183
+
184
+ ```ruby
185
+ data = ['GET ', '/foo', " HTTP/1.1\r\n", "\r\n"]
187
+ parser = H1P::Parser.new { data.shift }
188
+ parser.parse_headers
189
+ #=> {":method"=>"GET", ":path"=>"/foo", ":protocol"=>"http/1.1", ":rx"=>21}
190
+ ```
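The callable in the example above ignores its argument. A callable that honours the maximum byte count and returns `nil` at end of input might look like the sketch below. `reader_for` is a hypothetical helper, not part of H1P, and a `StringIO` stands in for a real transport.

```ruby
require 'stringio'

# Hypothetical helper (not part of H1P): wraps an IO-like object in a callable
# that takes the maximum number of bytes to read and returns nil at EOF.
def reader_for(io)
  proc do |max_len|
    io.readpartial(max_len)
  rescue EOFError
    nil
  end
end

reader = reader_for(StringIO.new("GET /foo HTTP/1.1\r\n\r\n"))
first = reader.call(4)     # at most 4 bytes
rest  = reader.call(4096)  # whatever is left, up to 4096 bytes
eof   = reader.call(4096)  # nil once the source is exhausted
p first, rest, eof
```

A callable built this way matches the interface described above (one argument, returns a string or `nil`), so it should be usable wherever the parser accepts a callable.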
191
+
192
+ ## Design
193
+
194
+ The H1P parser design is based on the following principles:
195
+
196
+ - Implement a blocking API for use with a sequential programming style.
197
+ - Minimize copying of data between buffers.
198
+ - Parse each piece of data only once.
199
+ - Minimize object and buffer allocations.
200
+ - Minimize the API surface area.
201
+
202
+ One of the unique aspects of H1P is that instead of the server needing to feed
203
+ data to the parser, the parser itself reads data from its source whenever it
204
+ needs more of it. If no data is yet available, the parser blocks until more data
205
+ is received.
206
+
207
+ The different parts of the request are parsed one byte at a time, and once each
208
+ token is considered complete, it is copied from the buffer into a new string, to
209
+ be stored in the headers hash.
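The pull-based approach can be sketched in a few lines of plain Ruby. This toy is not H1P's parser: `PullLineReader` is a hypothetical class that scans for whole lines rather than parsing byte by byte, but it shows the reader asking its source for more data only when its buffer runs dry.

```ruby
# Hypothetical toy class (not part of H1P): pulls data from a callable source
# on demand and hands back one line at a time.
class PullLineReader
  def initialize(&source)
    @source = source
    @buffer = +''
  end

  # Returns the next line without its line break, or nil if the source is
  # exhausted before a complete line is available.
  def read_line
    until (idx = @buffer.index("\n"))
      data = @source.call(4096) # ask the source for more only when needed
      return nil unless data

      @buffer << data
    end
    @buffer.slice!(0..idx).chomp
  end
end

chunks = ['GET ', '/foo', " HTTP/1.1\r\n", "Host: example.com\r\n"]
reader = PullLineReader.new { |_max_len| chunks.shift }

first  = reader.read_line
second = reader.read_line
third  = reader.read_line
p first, second, third
```

With a real socket as the source, the call into the source would block until data arrives, which is what lets server code be written in a sequential style.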
210
+
211
+ ## Performance
212
+
213
+ The included benchmark (against
214
+ [http_parser.rb](https://github.com/tmm1/http_parser.rb), based on the *old*
215
+ [node.js HTTP parser](https://github.com/nodejs/http-parser)) shows the H1P
216
+ parser to be about 10-20% slower than http_parser.rb.
217
+
218
+ However, in a fiber-based environment such as
219
+ [Polyphony](https://github.com/digital-fabric/polyphony), H1P is slightly
220
+ faster, as the overhead of dealing with pipelined requests (which will cause
221
+ `http_parser.rb` to emit callbacks multiple times) significantly affects its
222
+ performance.
223
+
224
+ ## Roadmap
225
+
226
+ Here are some of the features and enhancements planned for H1P:
227
+
228
+ - Add conformance and security tests
229
+ - Add ability to splice the message body into an arbitrary fd
230
+ (Polyphony-specific)
231
+ - Improve performance
232
+
233
+ ## Contributing
234
+
235
+ Issues and pull requests will be gladly accepted. If you have found this gem
236
+ useful, please let me know.
data/Rakefile CHANGED
@@ -12,5 +12,5 @@ task :recompile => [:clean, :compile]
12
12
  task :default => [:compile, :test]
13
13
 
14
14
  task :test do
15
- exec 'ruby test/test_h1p.rb'
15
+ exec 'ruby test/run.rb'
16
16
  end
@@ -1,8 +1,18 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'bundler/setup'
3
+ HTTP_REQUEST = "GET /foo HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\nUser-Agent: foobar\r\n\r\n"
4
4
 
5
- HTTP_REQUEST = "GET /foo HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"
5
+ # HTTP_REQUEST =
6
+ # "GET /wp-content/uploads/2010/03/hello-kitty-darth-vader-pink.jpg HTTP/1.1\r\n" +
7
+ # "Host: www.kittyhell.com\r\n" +
8
+ # "User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; ja-JP-mac; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 Pathtraq/0.9\r\n" +
9
+ # "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n" +
10
+ # "Accept-Language: ja,en-us;q=0.7,en;q=0.3\r\n" +
11
+ # "Accept-Encoding: gzip,deflate\r\n" +
12
+ # "Accept-Charset: Shift_JIS,utf-8;q=0.7,*;q=0.7\r\n" +
13
+ # "Keep-Alive: 115\r\n" +
14
+ # "Connection: keep-alive\r\n" +
15
+ # "Cookie: wp_ozh_wsa_visits=2; wp_ozh_wsa_visit_lasttime=xxxxxxxxxx; __utma=xxxxxxxxx.xxxxxxxxxx.xxxxxxxxxx.xxxxxxxxxx.xxxxxxxxxx.x; __utmz=xxxxxxxxx.xxxxxxxxxx.x.x.utmccn=(referral)|utmcsr=reader.livedoor.com|utmcct=/reader/|utmcmd=referral\r\n\r\n"
6
16
 
7
17
  def measure_time_and_allocs
8
18
  4.times { GC.start }
@@ -26,15 +36,17 @@ end
26
36
  def benchmark_other_http1_parser(iterations)
27
37
  STDOUT << "http_parser.rb: "
28
38
  require 'http_parser.rb'
29
-
39
+
30
40
  i, o = IO.pipe
31
41
  parser = Http::Parser.new
32
42
  done = false
33
43
  headers = nil
44
+ rx = 0
34
45
  parser.on_headers_complete = proc do |h|
35
46
  headers = h
36
47
  headers[':method'] = parser.http_method
37
48
  headers[':path'] = parser.request_url
49
+ headers[':rx'] = rx
38
50
  end
39
51
  parser.on_message_complete = proc { done = true }
40
52
 
@@ -42,8 +54,10 @@ def benchmark_other_http1_parser(iterations)
42
54
  iterations.times do
43
55
  o << HTTP_REQUEST
44
56
  done = false
57
+ rx = 0
45
58
  while !done
46
59
  msg = i.readpartial(4096)
60
+ rx += msg.bytesize
47
61
  parser << msg
48
62
  end
49
63
  end
@@ -52,11 +66,10 @@ def benchmark_other_http1_parser(iterations)
52
66
  end
53
67
 
54
68
  def benchmark_tipi_http1_parser(iterations)
55
- STDOUT << "tipi parser: "
56
- require_relative '../lib/tipi_ext'
69
+ STDOUT << "H1P parser: "
70
+ require_relative '../lib/h1p'
57
71
  i, o = IO.pipe
58
- reader = proc { |len| i.readpartial(len) }
59
- parser = Tipi::HTTP1Parser.new(reader)
72
+ parser = H1P::Parser.new(i)
60
73
 
61
74
  elapsed, allocated = measure_time_and_allocs do
62
75
  iterations.times do
@@ -78,8 +91,8 @@ def fork_benchmark(method, iterations)
78
91
  Process.wait(pid)
79
92
  end
80
93
 
81
- x = 500000
82
- # fork_benchmark(:benchmark_other_http1_parser, x)
83
- # fork_benchmark(:benchmark_tipi_http1_parser, x)
94
+ x = 100000
95
+ fork_benchmark(:benchmark_other_http1_parser, x)
96
+ fork_benchmark(:benchmark_tipi_http1_parser, x)
84
97
 
85
- benchmark_tipi_http1_parser(x)
98
+ # benchmark_tipi_http1_parser(x)
@@ -0,0 +1,101 @@
1
+ # frozen_string_literal: true
2
+
3
+ HTTP_REQUEST = "GET /foo HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\nUser-Agent: foobar\r\n\r\n" +
4
+ "GET /bar HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\nUser-Agent: foobar\r\n\r\n"
5
+
6
+ def measure_time_and_allocs
7
+ 4.times { GC.start }
8
+ GC.disable
9
+
10
+ t0 = Time.now
11
+ a0 = object_count
12
+ yield
13
+ t1 = Time.now
14
+ a1 = object_count
15
+ [t1 - t0, a1 - a0]
16
+ ensure
17
+ GC.enable
18
+ end
19
+
20
+ def object_count
21
+ count = ObjectSpace.count_objects
22
+ count[:TOTAL] - count[:FREE]
23
+ end
24
+
25
+ def benchmark_other_http1_parser(iterations)
26
+ STDOUT << "http_parser.rb: "
27
+ require 'http_parser.rb'
28
+
29
+ i, o = IO.pipe
30
+ parser = Http::Parser.new
31
+ done = false
32
+ queue = nil
33
+ rx = 0
34
+ req_count = 0
35
+ parser.on_headers_complete = proc do |h|
36
+ h[':method'] = parser.http_method
37
+ h[':path'] = parser.request_url
38
+ h[':rx'] = rx
39
+ queue << h
40
+ end
41
+ parser.on_message_complete = proc { done = true }
42
+
43
+ writer = Thread.new do
44
+ iterations.times { o << HTTP_REQUEST }
45
+ o.close
46
+ end
47
+
48
+ elapsed, allocated = measure_time_and_allocs do
49
+ queue = []
50
+ done = false
51
+ rx = 0
52
+ loop do
53
+ data = i.readpartial(4096) rescue nil
54
+ break unless data
55
+
56
+ rx += data.bytesize
57
+ parser << data
58
+ while (req = queue.shift)
59
+ req_count += 1
60
+ end
61
+ end
62
+ end
63
+ puts(format('count: %d, elapsed: %f, allocated: %d (%f/req), rate: %f ips', req_count, elapsed, allocated, allocated.to_f / iterations, iterations / elapsed))
64
+ end
65
+
66
+ def benchmark_h1p_parser(iterations)
67
+ STDOUT << "H1P parser: "
68
+ require_relative '../lib/h1p'
69
+ i, o = IO.pipe
70
+ parser = H1P::Parser.new(i)
71
+ req_count = 0
72
+
73
+ writer = Thread.new do
74
+ iterations.times { o << HTTP_REQUEST }
75
+ o.close
76
+ end
77
+
78
+ elapsed, allocated = measure_time_and_allocs do
79
+ while (headers = parser.parse_headers)
80
+ req_count += 1
81
+ end
82
+ end
83
+ puts(format('count: %d, elapsed: %f, allocated: %d (%f/req), rate: %f ips', req_count, elapsed, allocated, allocated.to_f / iterations, iterations / elapsed))
84
+ end
85
+
86
+ def fork_benchmark(method, iterations)
87
+ pid = fork do
88
+ send(method, iterations)
89
+ rescue Exception => e
90
+ p e
91
+ p e.backtrace
92
+ exit!
93
+ end
94
+ Process.wait(pid)
95
+ end
96
+
97
+ x = 100000
98
+ fork_benchmark(:benchmark_other_http1_parser, x)
99
+ fork_benchmark(:benchmark_h1p_parser, x)
100
+
101
+ # benchmark_h1p_parser(x)
@@ -0,0 +1,10 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'bundler/setup'
4
+ require 'h1p'
5
+
6
+ data = ['GET ', '/foo', " HTTP/1.1\r\n", "\r\n"]
7
+ parser = H1P::Parser.new(proc { data.shift }, :server)
8
+
9
+ headers = parser.parse_headers
10
+ p headers
@@ -1,41 +1,39 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  require 'bundler/setup'
4
- require 'tipi'
5
-
6
- opts = {
7
- reuse_addr: true,
8
- dont_linger: true
9
- }
4
+ require 'h1p'
10
5
 
11
6
  puts "pid: #{Process.pid}"
12
- puts 'Listening on port 10080...'
7
+ puts 'Listening on port 1234...'
13
8
 
14
- # GC.disable
15
- # Thread.current.backend.idle_gc_period = 60
9
+ trap('SIGINT') { exit! }
16
10
 
17
- spin_loop(interval: 10) { p Thread.backend.stats }
11
+ def handle_client(conn)
12
+ Thread.new do
13
+ parser = H1P::Parser.new(conn, :server)
14
+ loop do
15
+ headers = parser.parse_headers
16
+ break unless headers
18
17
 
19
- spin_loop(interval: 10) do
20
- GC.compact
21
- end
18
+ req_body = parser.read_body
19
+
20
+ p headers: headers
21
+ p body: req_body
22
22
 
23
- spin do
24
- Tipi.serve('0.0.0.0', 10080, opts) do |req|
25
- if req.path == '/stream'
26
- req.send_headers('Foo' => 'Bar')
27
- sleep 1
28
- req.send_chunk("foo\n")
29
- sleep 1
30
- req.send_chunk("bar\n")
31
- req.finish
32
- elsif req.path == '/upload'
33
- body = req.read
34
- req.respond("Body: #{body.inspect} (#{body.bytesize} bytes)")
35
- else
36
- req.respond("Hello world!\n")
23
+ resp = 'Hello, world!'
24
+ conn << "HTTP/1.1 200 OK\r\nContent-Length: #{resp.bytesize}\r\n\r\n#{resp}"
25
+ rescue H1P::Error => e
26
+ puts "Invalid request: #{e.message}"
27
+ ensure
28
+ conn.close
29
+ break
37
30
  end
38
- # p req.transfer_counts
39
31
  end
40
- p 'done...'
41
- end.await
32
+ end
33
+
34
+ require 'socket'
35
+ server = TCPServer.new('0.0.0.0', 1234)
36
+ loop do
37
+ conn = server.accept
38
+ handle_client(conn)
39
+ end