h1p 0.1 → 0.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 4fb3573e65df46350986981759a454c8bc2c4166901a46dd827fdc17a27b08b5
4
- data.tar.gz: afc10ead82e3286f61fa2bee1739499f275b75a386ed3e3401988805470c9b3a
3
+ metadata.gz: a27a3323482c8a0cec845e60962516b755338ccc54d0ba76bf6716bbb449a8ff
4
+ data.tar.gz: 22613ade68e261e3fca56cafcab410b1bed4587efe8c1da4ef52d1d335bc7efd
5
5
  SHA512:
6
- metadata.gz: 72ba304cda888fcfb1c8afaee1a03b94ddb15fde0110c303886e4036125fb5b464d214ebb731ea91dac792070c9cc3d5eccb789f667c8a590fae01afb88d5665
7
- data.tar.gz: 31cd6c6d7bb6695d338596b56be808cd1fed99cc76f11ee5f38fbdf7ff5d1206960a7134932ac82e9ca9c19116a20ee99db8f2a002fb76c1997134ac4d432dae
6
+ metadata.gz: 36cb9020745437627ad3dca5f83de6f81f8ad6e04075909db192a509b9f022960c5a7768c1f1ab8fdb807ae0d7b54e6febbc76581e68475c5f27057ff5b7aeb5
7
+ data.tar.gz: 20943cfead5e0236e60cb3a0d70057c69a91bbde45973f0f4ff87f38a1866523c1a02132254eebc13b8c5e2c4f75cd8e5791246257a792f63391cff88a6e1011
data/CHANGELOG.md CHANGED
@@ -1,4 +1,15 @@
1
- ## 0.42 2021-08-16
1
+ ## 0.4 2022-02-28
2
+
3
+ - Rename `__parser_read_method__` to `__read_method__`
4
+
5
+ ## 0.3 2022-02-03
6
+
7
+ - Add support for parsing HTTP responses (#1)
8
+ - Put state directly in parser struct
9
+
10
+ ## 0.2 2021-08-20
11
+
12
+ - Use uppercase for HTTP method
2
13
 
3
14
  ## 0.1 2021-08-19
4
15
 
data/Gemfile.lock CHANGED
@@ -1,33 +1,24 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- h1p (0.1)
4
+ h1p (0.4)
5
5
 
6
6
  GEM
7
7
  remote: https://rubygems.org/
8
8
  specs:
9
- ansi (1.5.0)
10
- builder (3.2.4)
11
- minitest (5.11.3)
12
- minitest-reporters (1.4.3)
13
- ansi
14
- builder
15
- minitest (>= 5.0)
16
- ruby-progressbar
17
- rake (12.3.3)
9
+ minitest (5.14.4)
10
+ rake (13.0.6)
18
11
  rake-compiler (1.1.1)
19
12
  rake
20
- ruby-progressbar (1.11.0)
21
13
 
22
14
  PLATFORMS
23
15
  ruby
24
16
 
25
17
  DEPENDENCIES
26
18
  h1p!
27
- minitest (~> 5.11.3)
28
- minitest-reporters (~> 1.4.2)
29
- rake (~> 12.3.3)
19
+ minitest (~> 5.14.4)
20
+ rake (~> 13.0.6)
30
21
  rake-compiler (= 1.1.1)
31
22
 
32
23
  BUNDLED WITH
33
- 2.1.4
24
+ 2.2.26
data/README.md CHANGED
@@ -1,5 +1,236 @@
1
- # h1p
1
+ # H1P - a blocking HTTP/1 parser for Ruby
2
2
 
3
- h1p
3
+ [![Gem Version](https://badge.fury.io/rb/h1p.svg)](http://rubygems.org/gems/h1p)
4
+ [![H1P Test](https://github.com/digital-fabric/h1p/workflows/Tests/badge.svg)](https://github.com/digital-fabric/h1p/actions?query=workflow%3ATests)
5
+ [![MIT licensed](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/digital-fabric/h1p/blob/master/LICENSE)
4
6
 
5
- HTTP 1.1 parser for Ruby
7
+ H1P is a blocking/synchronous HTTP/1 parser for Ruby with a simple and intuitive
8
+ API. Its design lends itself to writing HTTP servers in a sequential style. As
9
+ such, it might prove useful in conjunction with the new fiber scheduler
10
+ introduced in Ruby 3.0, but is also useful with a normal thread-based server
11
+ (see
12
+ [example](https://github.com/digital-fabric/h1p/blob/main/examples/http_server.rb)).
13
+ H1P was originally written as part of
14
+ [Tipi](https://github.com/digital-fabric/tipi), a web server running on top of
15
+ [Polyphony](https://github.com/digital-fabric/polyphony).
16
+
17
+ > H1P is still a very young project and as such should be used with caution. It
18
+ > has not undergone any significant conformance or security testing, and its API
19
+ > is not yet stable.
20
+
21
+ ## Features
22
+
23
+ - Simple, blocking/synchronous API
24
+ - Zero dependencies
25
+ - Transport-agnostic
26
+ - Parses both HTTP request and HTTP response
27
+ - Support for chunked encoding
28
+ - Support for both `LF` and `CRLF` line breaks
29
+ - Track total incoming traffic
30
+
31
+ ## Installing
32
+
33
+ If you're using Bundler, just add it to your `Gemfile`:
34
+
35
+ ```ruby
36
+ source 'https://rubygems.org'
37
+
38
+ gem 'h1p'
39
+ ```
40
+
41
+ You can then run `bundle install` to install it. Otherwise, just run `gem install h1p`.
42
+
43
+ ## Usage
44
+
45
+ Start by creating an instance of `H1P::Parser`, passing a connection instance and the parsing mode:
46
+
47
+ ```ruby
48
+ require 'h1p'
49
+
50
+ parser = H1P::Parser.new(conn, :server)
51
+ ```
52
+
53
+ In order to parse HTTP responses, change the mode to `:client`:
54
+
55
+ ```ruby
56
+ parser = H1P::Parser.new(conn, :client)
57
+ ```
58
+
59
+ To read the next message from the connection, call `#parse_headers`:
60
+
61
+ ```ruby
62
+ loop do
63
+ headers = parser.parse_headers
64
+ break unless headers
65
+
66
+ handle_request(headers)
67
+ end
68
+ ```
69
+
70
+ The `#parse_headers` method returns a single hash containing the different HTTP
71
+ headers. In case the client has closed the connection, `#parse_headers` will
72
+ return `nil` (see the guard clause above).
73
+
74
+ In addition to the header keys and values, the resulting hash also contains the
75
+ following "pseudo-headers" (in server mode):
76
+
77
+ - `:method`: the HTTP method (in upper case)
78
+ - `:path`: the request target
79
+ - `:protocol`: the protocol used (either `'http/1.0'` or `'http/1.1'`)
80
+ - `:rx`: the total bytes read by the parser
81
+
82
+ In client mode, the following pseudo-headers will be present:
83
+
84
+ - `:protocol`: the protocol used (either `'http/1.0'` or `'http/1.1'`)
85
+ - `:status`: the HTTP status as an integer
86
+ - `:status_message`: the HTTP status message
87
+ - `:rx`: the total bytes read by the parser
88
+
89
+
90
+ The header keys are always lower-cased. Consider the following HTTP request:
91
+
92
+ ```
93
+ GET /foo HTTP/1.1
94
+ Host: example.com
95
+ User-Agent: curl/7.74.0
96
+ Accept: */*
97
+
98
+ ```
99
+
100
+ The request will be parsed into the following Ruby hash:
101
+
102
+ ```ruby
103
+ {
104
+ ":method" => "GET",
105
+ ":path" => "/foo",
106
+ ":protocol" => "http/1.1",
107
+ "host" => "example.com",
108
+ "user-agent" => "curl/7.74.0",
109
+ "accept" => "*/*",
110
+ ":rx" => 78
111
+ }
112
+ ```
113
+
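As a quick sanity check on the `:rx` value shown above, the byte count can be reproduced with plain Ruby. This is only a sketch, assuming the request was sent with `CRLF` line endings; it does not use H1P at all.

```ruby
# The example request, written out with explicit CRLF line endings.
request = "GET /foo HTTP/1.1\r\n" \
          "Host: example.com\r\n" \
          "User-Agent: curl/7.74.0\r\n" \
          "Accept: */*\r\n" \
          "\r\n"

puts request.bytesize                       # 78, matching ":rx" above
puts request.gsub("\r\n", "\n").bytesize    # 73 if the sender used LF only
```

Since the parser accepts both `LF` and `CRLF` line breaks, the same request may produce a different `:rx` value depending on how the sender terminates its lines.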
114
+ Multiple headers with the same key will be coalesced into a single key-value
115
+ where the value is an array containing the corresponding values. For example,
116
+ multiple `Cookie` headers will appear in the hash as a single `"cookie"` entry,
117
+ e.g. `{ "cookie" => ['a=1', 'b=2'] }`
118
+
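The coalescing rule described above can be sketched in plain Ruby. This is an illustration only: `coalesce_headers` is a hypothetical helper, not part of the H1P API, and the actual parser does this work inside its C extension.

```ruby
# Hypothetical helper illustrating the coalescing rule; not H1P's own code.
def coalesce_headers(pairs)
  pairs.each_with_object({}) do |(key, value), headers|
    key = key.downcase # header keys are always lower-cased
    if headers.key?(key)
      # Promote an existing single value to an array, then append.
      headers[key] = Array(headers[key]) << value
    else
      headers[key] = value
    end
  end
end

headers = coalesce_headers([
  ['Host', 'example.com'],
  ['Cookie', 'a=1'],
  ['Cookie', 'b=2']
])
p headers
```

A consequence of this rule is that a header value may be either a string or an array, so code that consumes the hash can wrap values with `Array(...)` when it needs uniform handling.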
119
+ ### Handling of invalid messages
120
+
121
+ When an invalid message is encountered, the parser will raise a `H1P::Error`
122
+ exception. An incoming message may be considered invalid if an invalid character
123
+ has been encountered at any point in parsing the message, or if any of the
124
+ tokens have an invalid length. You can consult the limits used by the parser
125
+ [here](https://github.com/digital-fabric/h1p/blob/main/ext/h1p/limits.rb).
126
+
127
+ ### Reading the message body
128
+
129
+ To read the message body use `#read_body`:
130
+
131
+ ```ruby
132
+ # read entire body
133
+ body = parser.read_body
134
+ ```
135
+
136
+ The H1P parser knows how to read both message bodies with a specified
137
+ `Content-Length` and request bodies in chunked encoding. The method call will
138
+ return when the entire body has been read. If the body is incomplete or has
139
+ invalid formatting, the parser will raise a `H1P::Error` exception.
140
+
141
+ You can also read a single chunk of the body by calling `#read_body_chunk`:
142
+
143
+ ```ruby
144
+ # read a body chunk
145
+ chunk = parser.read_body_chunk(false)
146
+
147
+ # read chunk only from buffer:
148
+ chunk = parser.read_body_chunk(true)
149
+ ```
150
+
151
+ If no more chunks are available, `#read_body_chunk` will return `nil`. To test
152
+ whether the request is complete, you can call `#complete?`:
153
+
154
+ ```ruby
155
+ headers = parser.parse_headers
156
+ unless parser.complete?
157
+ body = parser.read_body
158
+ end
159
+ ```
160
+
161
+ The `#read_body` and `#read_body_chunk` methods will return `nil` if no body is
162
+ expected (based on the received headers).
163
+
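For readers unfamiliar with chunked encoding, the framing that `#read_body` has to undo can be sketched in plain Ruby. This is a simplified illustration, not H1P's implementation: `decode_chunked` is a hypothetical helper, it assumes `CRLF` line endings, and it ignores chunk extensions and trailers.

```ruby
require 'stringio'

# Hypothetical helper showing chunked framing; not part of the H1P API.
# Each chunk is "<hex size>\r\n<data>\r\n"; a zero-size chunk ends the body.
def decode_chunked(io)
  body = +''
  loop do
    size_line = io.gets("\r\n") or raise EOFError, 'incomplete chunked body'
    size = Integer(size_line.strip, 16)
    break if size.zero?

    chunk = io.read(size)
    raise EOFError, 'incomplete chunk' unless chunk && chunk.bytesize == size

    body << chunk
    io.read(2) # consume the CRLF that follows the chunk data
  end
  body
end

encoded = "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
p decode_chunked(StringIO.new(encoded)) # "Hello, world"
```

Raising on a truncated chunk mirrors the behaviour described above, where an incomplete or malformed body results in an exception rather than a partial result.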
164
+ ## Parsing from arbitrary transports
165
+
166
+ The H1P parser was built to read from any arbitrary transport or source, as long
167
+ as it conforms to one of two alternative interfaces:
168
+
169
+ - An object implementing a `__read_method__` method, which returns any of
170
+ the following values:
171
+
172
+ - `:stock_readpartial` - to be used for instances of `IO`, `Socket`,
173
+ `TCPSocket`, `SSLSocket` etc.
174
+ - `:backend_read` - for use in Polyphony-based servers.
175
+ - `:backend_recv` - for use in Polyphony-based servers.
176
+ - `:readpartial` - for use in Polyphony-based servers.
177
+
178
+ - An object implementing a `call` method, such as a `Proc` or any other. The
179
+ call is given a single argument signifying the maximum number of bytes to
180
+ read, and is expected to return either a string with the read data, or `nil`
181
+ if no more data is available. The callable can be passed as an argument or as
182
+ a block. Here's an example for parsing from a callable:
183
+
184
+ ```ruby
185
+ data = ['GET ', '/foo', " HTTP/1.1\r\n", "\r\n"]
187
+ parser = H1P::Parser.new { data.shift }
188
+ parser.parse_headers
189
+ #=> {":method"=>"GET", ":path"=>"/foo", ":protocol"=>"http/1.1", ":rx"=>21}
190
+ ```
191
+
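The callable contract described above (one argument giving the maximum number of bytes, returning a string or `nil`) can be exercised on its own, without the parser. The following is a minimal sketch using a `StringIO` as a stand-in source; the names `source` and `reader` are illustrative, and passing `reader` to `H1P::Parser.new` is assumed to work per the description above but is not exercised here.

```ruby
require 'stringio'

source = StringIO.new("GET /foo HTTP/1.1\r\nHost: example.com\r\n\r\n")

# A callable honoring the contract: it receives the maximum number of bytes
# to read and returns a String, or nil once the source is exhausted.
reader = ->(max_len) { source.read(max_len) }

chunks = []
while (chunk = reader.call(16))
  chunks << chunk
end
p chunks.map(&:bytesize) # [16, 16, 8]
```

Unlike the `data.shift` example, this callable respects the length argument, which matters if the source can return more data than the caller asked for.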
192
+ ## Design
193
+
194
+ The H1P parser design is based on the following principles:
195
+
196
+ - Implement a blocking API for use with a sequential programming style.
197
+ - Minimize copying of data between buffers.
198
+ - Parse each piece of data only once.
199
+ - Minimize object and buffer allocations.
200
+ - Minimize the API surface area.
201
+
202
+ One of the unique aspects of H1P is that instead of the server needing to feed
203
+ data to the parser, the parser itself reads data from its source whenever it
204
+ needs more of it. If no data is yet available, the parser blocks until more data
205
+ is received.
206
+
207
+ The different parts of the request are parsed one byte at a time, and once each
208
+ token is considered complete, it is copied from the buffer into a new string, to
209
+ be stored in the headers hash.
210
+
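The pull-based design described above can be illustrated with a toy line reader. This is a sketch of the general idea only: `PullLineReader` is a hypothetical class, far simpler than the actual C implementation, and it silently drops a trailing partial line at end of input.

```ruby
# Toy pull-style reader: the consumer asks for a line, and the reader pulls
# more data from its source only when its buffer has no complete line.
class PullLineReader
  def initialize(&source)
    @source = source
    @buffer = +''
  end

  # Returns the next line without its terminator, or nil at end of input.
  def next_line
    until (idx = @buffer.index("\n"))
      data = @source.call(4096) # blocks here if the source blocks
      return nil unless data

      @buffer << data
    end
    @buffer.slice!(0..idx).chomp
  end
end

pieces = ['GET ', '/foo', " HTTP/1.1\r\n", "Host: a\r\n", "\r\n"]
reader = PullLineReader.new { |_max_len| pieces.shift }
p reader.next_line # "GET /foo HTTP/1.1"
p reader.next_line # "Host: a"
```

Because the reader drives the source, the calling code stays sequential: it simply asks for the next piece and resumes once data has arrived.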
211
+ ## Performance
212
+
213
+ The included benchmark (against
214
+ [http_parser.rb](https://github.com/tmm1/http_parser.rb), based on the *old*
215
+ [node.js HTTP parser](https://github.com/nodejs/http-parser)) shows the H1P
216
+ parser to be about 10-20% slower than http_parser.rb.
217
+
218
+ However, in a fiber-based environment such as
219
+ [Polyphony](https://github.com/digital-fabric/polyphony), H1P is slightly
220
+ faster, as the overhead of dealing with pipelined requests (which will cause
221
+ `http_parser.rb` to emit callbacks multiple times) significantly affects its
222
+ performance.
223
+
224
+ ## Roadmap
225
+
226
+ Here are some of the features and enhancements planned for H1P:
227
+
228
+ - Add conformance and security tests
229
+ - Add ability to splice the message body into an arbitrary fd
230
+ (Polyphony-specific)
231
+ - Improve performance
232
+
233
+ ## Contributing
234
+
235
+ Issues and pull requests will be gladly accepted. If you have found this gem
236
+ useful, please let me know.
data/Rakefile CHANGED
@@ -12,5 +12,5 @@ task :recompile => [:clean, :compile]
12
12
  task :default => [:compile, :test]
13
13
 
14
14
  task :test do
15
- exec 'ruby test/test_h1p.rb'
15
+ exec 'ruby test/run.rb'
16
16
  end
@@ -1,8 +1,18 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'bundler/setup'
3
+ HTTP_REQUEST = "GET /foo HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\nUser-Agent: foobar\r\n\r\n"
4
4
 
5
- HTTP_REQUEST = "GET /foo HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"
5
+ # HTTP_REQUEST =
6
+ # "GET /wp-content/uploads/2010/03/hello-kitty-darth-vader-pink.jpg HTTP/1.1\r\n" +
7
+ # "Host: www.kittyhell.com\r\n" +
8
+ # "User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; ja-JP-mac; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 Pathtraq/0.9\r\n" +
9
+ # "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n" +
10
+ # "Accept-Language: ja,en-us;q=0.7,en;q=0.3\r\n" +
11
+ # "Accept-Encoding: gzip,deflate\r\n" +
12
+ # "Accept-Charset: Shift_JIS,utf-8;q=0.7,*;q=0.7\r\n" +
13
+ # "Keep-Alive: 115\r\n" +
14
+ # "Connection: keep-alive\r\n" +
15
+ # "Cookie: wp_ozh_wsa_visits=2; wp_ozh_wsa_visit_lasttime=xxxxxxxxxx; __utma=xxxxxxxxx.xxxxxxxxxx.xxxxxxxxxx.xxxxxxxxxx.xxxxxxxxxx.x; __utmz=xxxxxxxxx.xxxxxxxxxx.x.x.utmccn=(referral)|utmcsr=reader.livedoor.com|utmcct=/reader/|utmcmd=referral\r\n\r\n"
6
16
 
7
17
  def measure_time_and_allocs
8
18
  4.times { GC.start }
@@ -26,15 +36,17 @@ end
26
36
  def benchmark_other_http1_parser(iterations)
27
37
  STDOUT << "http_parser.rb: "
28
38
  require 'http_parser.rb'
29
-
39
+
30
40
  i, o = IO.pipe
31
41
  parser = Http::Parser.new
32
42
  done = false
33
43
  headers = nil
44
+ rx = 0
34
45
  parser.on_headers_complete = proc do |h|
35
46
  headers = h
36
47
  headers[':method'] = parser.http_method
37
48
  headers[':path'] = parser.request_url
49
+ headers[':rx'] = rx
38
50
  end
39
51
  parser.on_message_complete = proc { done = true }
40
52
 
@@ -42,8 +54,10 @@ def benchmark_other_http1_parser(iterations)
42
54
  iterations.times do
43
55
  o << HTTP_REQUEST
44
56
  done = false
57
+ rx = 0
45
58
  while !done
46
59
  msg = i.readpartial(4096)
60
+ rx += msg.bytesize
47
61
  parser << msg
48
62
  end
49
63
  end
@@ -52,11 +66,10 @@ def benchmark_other_http1_parser(iterations)
52
66
  end
53
67
 
54
68
  def benchmark_tipi_http1_parser(iterations)
55
- STDOUT << "tipi parser: "
56
- require_relative '../lib/tipi_ext'
69
+ STDOUT << "H1P parser: "
70
+ require_relative '../lib/h1p'
57
71
  i, o = IO.pipe
58
- reader = proc { |len| i.readpartial(len) }
59
- parser = Tipi::HTTP1Parser.new(reader)
72
+ parser = H1P::Parser.new(i)
60
73
 
61
74
  elapsed, allocated = measure_time_and_allocs do
62
75
  iterations.times do
@@ -78,8 +91,8 @@ def fork_benchmark(method, iterations)
78
91
  Process.wait(pid)
79
92
  end
80
93
 
81
- x = 500000
82
- # fork_benchmark(:benchmark_other_http1_parser, x)
83
- # fork_benchmark(:benchmark_tipi_http1_parser, x)
94
+ x = 100000
95
+ fork_benchmark(:benchmark_other_http1_parser, x)
96
+ fork_benchmark(:benchmark_tipi_http1_parser, x)
84
97
 
85
- benchmark_tipi_http1_parser(x)
98
+ # benchmark_tipi_http1_parser(x)
@@ -0,0 +1,101 @@
1
+ # frozen_string_literal: true
2
+
3
+ HTTP_REQUEST = "GET /foo HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\nUser-Agent: foobar\r\n\r\n" +
4
+ "GET /bar HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\nUser-Agent: foobar\r\n\r\n"
5
+
6
+ def measure_time_and_allocs
7
+ 4.times { GC.start }
8
+ GC.disable
9
+
10
+ t0 = Time.now
11
+ a0 = object_count
12
+ yield
13
+ t1 = Time.now
14
+ a1 = object_count
15
+ [t1 - t0, a1 - a0]
16
+ ensure
17
+ GC.enable
18
+ end
19
+
20
+ def object_count
21
+ count = ObjectSpace.count_objects
22
+ count[:TOTAL] - count[:FREE]
23
+ end
24
+
25
+ def benchmark_other_http1_parser(iterations)
26
+ STDOUT << "http_parser.rb: "
27
+ require 'http_parser.rb'
28
+
29
+ i, o = IO.pipe
30
+ parser = Http::Parser.new
31
+ done = false
32
+ queue = nil
33
+ rx = 0
34
+ req_count = 0
35
+ parser.on_headers_complete = proc do |h|
36
+ h[':method'] = parser.http_method
37
+ h[':path'] = parser.request_url
38
+ h[':rx'] = rx
39
+ queue << h
40
+ end
41
+ parser.on_message_complete = proc { done = true }
42
+
43
+ writer = Thread.new do
44
+ iterations.times { o << HTTP_REQUEST }
45
+ o.close
46
+ end
47
+
48
+ elapsed, allocated = measure_time_and_allocs do
49
+ queue = []
50
+ done = false
51
+ rx = 0
52
+ loop do
53
+ data = i.readpartial(4096) rescue nil
54
+ break unless data
55
+
56
+ rx += data.bytesize
57
+ parser << data
58
+ while (req = queue.shift)
59
+ req_count += 1
60
+ end
61
+ end
62
+ end
63
+ puts(format('count: %d, elapsed: %f, allocated: %d (%f/req), rate: %f ips', req_count, elapsed, allocated, allocated.to_f / iterations, iterations / elapsed))
64
+ end
65
+
66
+ def benchmark_h1p_parser(iterations)
67
+ STDOUT << "H1P parser: "
68
+ require_relative '../lib/h1p'
69
+ i, o = IO.pipe
70
+ parser = H1P::Parser.new(i)
71
+ req_count = 0
72
+
73
+ writer = Thread.new do
74
+ iterations.times { o << HTTP_REQUEST }
75
+ o.close
76
+ end
77
+
78
+ elapsed, allocated = measure_time_and_allocs do
79
+ while (headers = parser.parse_headers)
80
+ req_count += 1
81
+ end
82
+ end
83
+ puts(format('count: %d, elapsed: %f, allocated: %d (%f/req), rate: %f ips', req_count, elapsed, allocated, allocated.to_f / iterations, iterations / elapsed))
84
+ end
85
+
86
+ def fork_benchmark(method, iterations)
87
+ pid = fork do
88
+ send(method, iterations)
89
+ rescue Exception => e
90
+ p e
91
+ p e.backtrace
92
+ exit!
93
+ end
94
+ Process.wait(pid)
95
+ end
96
+
97
+ x = 100000
98
+ fork_benchmark(:benchmark_other_http1_parser, x)
99
+ fork_benchmark(:benchmark_h1p_parser, x)
100
+
101
+ # benchmark_h1p_parser(x)
@@ -0,0 +1,10 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'bundler/setup'
4
+ require 'h1p'
5
+
6
+ data = ['GET ', '/foo', " HTTP/1.1\r\n", "\r\n"]
7
+ parser = H1P::Parser.new(proc { data.shift }, :server)
8
+
9
+ headers = parser.parse_headers
10
+ p headers
@@ -1,41 +1,39 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  require 'bundler/setup'
4
- require 'tipi'
5
-
6
- opts = {
7
- reuse_addr: true,
8
- dont_linger: true
9
- }
4
+ require 'h1p'
10
5
 
11
6
  puts "pid: #{Process.pid}"
12
- puts 'Listening on port 10080...'
7
+ puts 'Listening on port 1234...'
13
8
 
14
- # GC.disable
15
- # Thread.current.backend.idle_gc_period = 60
9
+ trap('SIGINT') { exit! }
16
10
 
17
- spin_loop(interval: 10) { p Thread.backend.stats }
11
+ def handle_client(conn)
12
+ Thread.new do
13
+ parser = H1P::Parser.new(conn, :server)
14
+ loop do
15
+ headers = parser.parse_headers
16
+ break unless headers
18
17
 
19
- spin_loop(interval: 10) do
20
- GC.compact
21
- end
18
+ req_body = parser.read_body
19
+
20
+ p headers: headers
21
+ p body: req_body
22
22
 
23
- spin do
24
- Tipi.serve('0.0.0.0', 10080, opts) do |req|
25
- if req.path == '/stream'
26
- req.send_headers('Foo' => 'Bar')
27
- sleep 1
28
- req.send_chunk("foo\n")
29
- sleep 1
30
- req.send_chunk("bar\n")
31
- req.finish
32
- elsif req.path == '/upload'
33
- body = req.read
34
- req.respond("Body: #{body.inspect} (#{body.bytesize} bytes)")
35
- else
36
- req.respond("Hello world!\n")
23
+ resp = 'Hello, world!'
24
+ conn << "HTTP/1.1 200 OK\r\nContent-Length: #{resp.bytesize}\r\n\r\n#{resp}"
25
+ rescue H1P::Error => e
26
+ puts "Invalid request: #{e.message}"
27
+ ensure
28
+ conn.close
29
+ break
37
30
  end
38
- # p req.transfer_counts
39
31
  end
40
- p 'done...'
41
- end.await
32
+ end
33
+
34
+ require 'socket'
35
+ server = TCPServer.new('0.0.0.0', 1234)
36
+ loop do
37
+ conn = server.accept
38
+ handle_client(conn)
39
+ end