em-pg-client 0.2.1 → 0.3.0

@@ -0,0 +1 @@
+ lib/pg/em/client/*.rb lib/pg/em/*.rb lib/pg/*.rb - BENCHMARKS.md LICENSE HISTORY.md
@@ -1,26 +1,30 @@
- == Benchmarks
+ Benchmarks
+ ----------
 
- I've done some benchmark tests[link:benchmarks/em_pg.rb] to compare fully async and blocking em-pg drivers.
+ I've done some benchmark {file:benchmarks/em_pg.rb tests} to compare fully async and blocking em-pg drivers.
 
  The goal of the test is simply to retrieve (~80000) rows from table with a lot of text data, in chunks, using parallel connections.
  The parallel method uses synchrony for simplicity.
 
- * +single+ is (eventmachine-less) job for retrieving a whole data table in
+ * `single` is (eventmachine-less) job for retrieving a whole data table in
    one simple query "select * from resources"
- * +parallel+ chunk_row_count / concurrency uses em-pg-client for retrieving
-   result in chunks by +chunk_row_count+ rows and using +concurrency+ parallel
+ * `parallel` chunk_row_count / concurrency uses em-pg-client for retrieving
+   result in chunks by `chunk_row_count` rows and using `concurrency` parallel
    connections
- * +blocking+ chunk_row_count / concurrency is similar to +parallel+ except
+ * `blocking` chunk_row_count / concurrency is similar to `parallel` except
    that it uses special patched version of library that uses blocking
    PGConnection methods
 
- == Environment
+ Environment
+ -----------
 
  The machine used for test is Linux CentOS 2.6.18-194.32.1.el5xen #1 SMP with Quad Core Xeon X3360 @ 2.83GHz, 4GB RAM.
  Postgres version used: 9.0.3.
 
- == The results:
+ The results:
+ ------------
 
+ ```
  >> benchmark 1000
  user system total real
  single: 80.970000 0.350000 81.320000 (205.592592)
@@ -34,10 +38,11 @@ Postgres version used: 9.0.3.
  blocking 5000/5: 79.930000 1.810000 81.740000 (223.342432)
  blocking 2000/10: 76.990000 2.820000 79.810000 (225.347169)
  blocking 1000/20: 78.790000 3.230000 82.020000 (225.949107)
+ ```
 
  As we can see the gain from using asynchronous pg client while
- using +parallel+ queries is noticeable (up to ~30%).
+ using `parallel` queries is noticeable (up to ~30%).
 
- The +blocking+ client however doesn't gain much from parallel execution.
+ The `blocking` client however doesn't gain much from parallel execution.
  This was expected because it freezes eventmachine until the whole
  dataset is consumed by the client.
@@ -1,20 +1,43 @@
+ 0.3.0
+
+ - dedicated asynchronous connection pool
+ - works on windows (with ruby 2.0+): uses PGConn#socket_io object instead of
+   #socket file descriptor
+ - socket watch handler is not being detached between command calls
+ - no more separate em and em-synchrony client
+ - api changes: async_exec and async_query command are now fiber-synchronized
+ - api changes: other async_* methods are removed or deprecated
+ - api changes: asynchronous methods renamed to *_defer
+ - transaction() helper method that can be called recursively
+ - requirements updated: eventmachine ~> 1.0.0, pg >= 0.17.0, ruby >= 1.9.2
+ - spec: more auto re-connect test cases
+ - spec: more tests for connection establishing
+ - comply with pg: do not close the client on connection failure
+ - comply with pg: asynchronous connect_timeout falls back to environment variable
+ - fix: auto re-connect raises an error if the failed connection had unfinished
+   transaction state
+ - yardoc docs
+
  0.2.1
+
  - support for pg >= 0.14 native PG::Result#check
  - support for pg >= 0.14 native PG::Connection#set_default_encoding
  - fix: connection option Hash argument was modified by Client.new and Client.async_connect
 
  0.2.0
+
  - disabled async_autoreconnect by default unless on_autoreconnect is set
  - async_connect sets #internal_encoding to Encoding.default_internal
  - fix: finish connection on async connect_timeout
  - nice errors generated on missing dependencies
  - blocking #reset() should clear async_command_aborted flag
  - less calls to #is_busy in Watcher#notify_readable
- - #async_describe_portal + specs
- - #async_describe_prepared + specs
+ - async_describe_portal() + specs
+ - async_describe_prepared() + specs
 
  0.2.0.pre.3
- - #status() returns CONNECTION_BAD for connections with expired query
+
+ - status() returns CONNECTION_BAD for connections with expired query
  - spec: query timeout expiration
  - non-blocking result processing for multiple data query statements sent at once
  - refine code in em-synchrony/pg
@@ -23,20 +46,23 @@
  - spec: autoreconnect
 
  0.2.0.pre.2
+
  - errors from consume_input fails deferrable
- - query_timeout now measures only network response timeout,
+ - query_timeout now measures only network response timeout,
    so it's not fired for large datasets
 
  0.2.0.pre.1
+
  - added query_timeout feature for async query commands
  - added connect_timeout property for async connect/reset
  - fix: async_autoreconnect for tcp/ip connections
  - fix: async_* does not raise errors; errors handled by deferrable
  - rework async_autoreconnect in fully async manner
- - added async_connect() and #async_reset()
+ - added async_connect() and async_reset()
  - API change: on_reconnect -> on_autoreconnect
 
  0.1.1
+
  - added on_reconnect callback
  - docs updated
  - added development dependency for eventmachine >= 1.0.0.beta.1
@@ -44,4 +70,5 @@
  - added error checking to eventmachine specs
 
  0.1.0
+
  - first release
data/LICENSE ADDED
@@ -0,0 +1,21 @@
+ Copyright (c) 2013 Rafal Michalski (rafal at yeondir dot com)
+
+ MIT License
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
@@ -0,0 +1,392 @@
+ em-pg-client
+ ============
+
+ Author: Rafał Michalski (rafal at yeondir dot com)
+
+ * http://github.com/royaltm/ruby-em-pg-client
+
+ Description
+ -----------
+
+ __em-pg-client__ is the Ruby EventMachine driver interface to the
+ PostgreSQL RDBMS. It is based on [ruby-pg](https://bitbucket.org/ged/ruby-pg).
+
+ __em-pg-client__ provides {PG::EM::Client} class which inherits
+ [PG::Connection](http://deveiate.org/code/pg/PG/Connection.html).
+ You can work with {PG::EM::Client} almost the same way you would work
+ with PG::Connection.
+
+ The real difference begins when you turn the EventMachine reactor on.
+
+ ```ruby
+ require 'pg/em'
+
+ pg = PG::EM::Client.new dbname: 'test'
+
+ # no async
+ pg.query('select * from foo') do |result|
+   puts Array(result).inspect
+ end
+
+ # asynchronous
+ EM.run do
+   Fiber.new do
+     pg.query('select * from foo') do |result|
+       puts Array(result).inspect
+     end
+     EM.stop
+   end.resume
+ end
+
+ # asynchronous + deferrable
+ EM.run do
+   df = pg.query_defer('select * from foo')
+   df.callback { |result|
+     puts Array(result).inspect
+     EM.stop
+   }
+   df.errback {|ex|
+     raise ex
+   }
+   puts "sent"
+ end
+ ```
+
+ Features
+ --------
+
+ * Non-blocking / fully asynchronous processing with EventMachine.
+ * Event reactor auto-detecting, asynchronous fiber-synchronized command methods
+   (the same code can be used regardless of the EventMachine reactor state)
+ * Asynchronous EM-style (deferrable returning) command methods.
+ * Fully asynchronous automatic re-connects on connection failures
+   (e.g.: RDBMS restarts, network failures).
+ * Minimal changes to [PG::Connection](http://deveiate.org/code/pg/PG/Connection.html) API.
+ * Configurable timeouts (connect or execute) of asynchronous processing.
+ * Dedicated connection pool with dynamic size, supporting asynchronous
+   processing and transactions.
+ * [Sequel Adapter](https://github.com/fl00r/em-pg-sequel) by Peter Yanovich.
+ * Works on windows (requires ruby 2.0) (issue #7).
+
+ Requirements
+ ------------
+
+ * ruby >= 1.9.2 (tested: 2.1.0, 2.0.0-p353, 1.9.3-p374, 1.9.2-p320)
+ * https://bitbucket.org/ged/ruby-pg >= 0.17.0
+ * [PostgreSQL](http://www.postgresql.org/ftp/source/) RDBMS >= 8.3
+ * http://rubyeventmachine.com >= 1.0.0
+ * [EM-Synchrony](https://github.com/igrigorik/em-synchrony)
+   (optional - not needed for any of the client functionality,
+   just wrap your code in a fiber)
+
+ Install
+ -------
+
+ ```
+ $ [sudo] gem install em-pg-client
+ ```
+
+ #### Gemfile
+
+ ```ruby
+ gem "em-pg-client", "~> 0.3.0"
+ ```
+
+ #### Github
+
+ ```
+ git clone git://github.com/royaltm/ruby-em-pg-client.git
+ ```
+
+ Usage
+ -----
+
+ ### PG::Connection commands adapted to the EventMachine
+
+ #### Asynchronous, the EventMachine style:
+
+ * `Client.connect_defer` (singleton method)
+ * `reset_defer`
+ * `exec_defer` (alias: `query_defer`)
+ * `prepare_defer`
+ * `exec_prepared_defer`
+ * `describe_prepared_defer`
+ * `describe_portal_defer`
+
+ For arguments of these methods consult their original (without the `_defer`
+ suffix) counterparts in the
+ [PG::Connection](http://deveiate.org/code/pg/PG/Connection.html) manual.
+
+ Use `callback` with a block on the returned deferrable object to receive the
+ result. In case of `connect_defer` and `reset_defer` the result is an instance
+ of the {PG::EM::Client}. The received client is in connected state and ready
+ for the queries. Otherwise an instance of the
+ [PG::Result](http://deveiate.org/code/pg/PG/Result.html) is received. You may
+ `clear` the obtained result object or leave it to `gc`.
+
+ To detect an error in the executed command call `errback` on the deferrable
+ with a block. You should expect an instance of the raised `Exception`
+ (usually PG::Error) as the block argument.
+
+ #### Reactor sensing methods, EM-Synchrony style:
+
+ * `Client.new` (singleton, alias: `connect`, `open`, `setdb`, `setdblogin`)
+ * `reset`
+ * `exec` (alias: `query`, `async_exec`, `async_query`)
+ * `prepare`
+ * `exec_prepared`
+ * `describe_prepared`
+ * `describe_portal`
+
+ The above methods call `*_defer` counterparts of themselves and `yield`
+ from the current fiber awaiting for the result. The PG::Result instance
+ (or PG::EM::Client for `new`) is then returned to the caller.
+ If a code block is given, it will be passed the result as an argument.
+ In that case the value of the block is returned instead and the result is
+ being cleared (or in case of `new` - client is being closed) after block
+ terminates.
+
+ These methods check if EventMachine's reactor is running and the current fiber
+ is not a root fiber. Otherwise the parent (thread-blocking) PG::Connection
+ methods are being called.
+
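Note (not part of the diff): a minimal sketch of the fiber-synchronized pattern the README describes, using only the Ruby standard library. `ToyDeferrable` and `ToyReactor` are hypothetical stand-ins for an EventMachine deferrable and the reactor; they are not the gem's classes, and the sketch leaves out the "is the reactor running / is this the root fiber" check.

```ruby
require 'fiber' # needed for Fiber.current on older Rubies

# Hypothetical stand-in for an EventMachine deferrable.
class ToyDeferrable
  def callback(&blk)
    @callback = blk
    self
  end

  def errback(&blk)
    @errback = blk
    self
  end

  def succeed(value)
    @callback.call(value) if @callback
  end

  def fail(error)
    @errback.call(error) if @errback
  end
end

# Hypothetical stand-in for the reactor: a queue of pending completions.
class ToyReactor
  def initialize
    @pending = []
  end

  # Deferrable style: returns at once, the outcome arrives on a later tick.
  def query_defer(sql)
    df = ToyDeferrable.new
    @pending << lambda do
      if sql.start_with?('select')
        df.succeed("rows for: #{sql}")
      else
        df.fail(ArgumentError.new("syntax error: #{sql}"))
      end
    end
    df
  end

  # Fiber-synchronized style: park the calling fiber until the deferrable fires.
  def query(sql)
    fiber = Fiber.current
    df = query_defer(sql)
    df.callback { |res| fiber.resume(res) }
    df.errback  { |err| fiber.resume(err) }
    outcome = Fiber.yield
    raise outcome if outcome.is_a?(Exception)
    outcome
  end

  # Drain the queue, completing one pending command per iteration.
  def run
    @pending.shift.call until @pending.empty?
  end
end

reactor = ToyReactor.new
log = []
Fiber.new do
  log << reactor.query('select 1')
  begin
    reactor.query('smellect 1')
  rescue ArgumentError => e
    log << "error: #{e.message}"
  end
end.resume
reactor.run
puts log
```

The calling code reads like blocking code, yet control returns to the toy reactor at every `Fiber.yield`, which is the property that lets several fibers share one event loop.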
+ You can call asynchronous, fiber aware and blocking methods without finishing
+ the connection. You only need to start/stop EventMachine in between the
+ asynchronous calls.
+
+ Although the [em-synchrony](https://github.com/igrigorik/em-synchrony/)
+ provides very nice set of tools for the untangled EventMachine, you don't
+ really require it to fully benefit from the PG::EM::Client. Just wrap your
+ asynchronous code in a fiber:
+
+     Fiber.new { ... }.resume
+
+ #### Special options
+
+ There are four special connection options and one of them is a standard `pg`
+ option used by the async methods. You may pass them as one of the __hash__
+ options to {PG::EM::Client.new} or {PG::EM::Client.connect_defer} or simply
+ use the accessor methods to change them on the fly.
+
+ The options are:
+
+ * `connect_timeout`
+ * `query_timeout`
+ * `async_autoreconnect`
+ * `on_autoreconnect`
+
+ Only `connect_timeout` is a standard `libpq` option, although changing it with
+ the accessor method affects asynchronous functions only.
+ See {PG::EM::Client} for more details.
+
+ #### Handling errors
+
+ Exactly like in `pg`:
+
+ ```ruby
+ EM.synchrony do
+   begin
+     pg.query('smellect 1')
+   rescue => e
+     puts "error: #{e.inspect}"
+   end
+   EM.stop
+ end
+ ```
+
+ with *_defer methods:
+
+ ```ruby
+ EM.run do
+   pg.query_defer('smellect 1') do |ret|
+     if ret.is_a?(Exception)
+       puts "PSQL error: #{ret.inspect}"
+     end
+   end
+ end
+ ```
+
+ or
+
+ ```ruby
+ EM.run do
+   pg.query_defer('smellect 1').callback do |ret|
+     puts "do something with #{ret}"
+   end.errback do |err|
+     puts "PSQL error: #{err.inspect}"
+   end
+ end
+ ```
+
+ ### Auto re-connecting in asynchronous mode
+
+ Connection reset is done in a non-blocking manner using `reset_defer` internally.
+
+ ```ruby
+ EM.run do
+   Fiber.new do
+     pg = PG::EM::Client.new async_autoreconnect: true
+
+     try_query = lambda do
+       pg.query('select * from foo') do |result|
+         puts Array(result).inspect
+       end
+     end
+
+     try_query.call
+     system 'pg_ctl stop -m fast'
+     system 'pg_ctl start -w'
+     try_query.call
+
+     EM.stop
+   end.resume
+ end
+ ```
+
+ to enable this feature call:
+
+ ```ruby
+ pg.async_autoreconnect = true
+ ```
+
+ Additionally the `on_autoreconnect` callback may be set on the connection.
+ It's being invoked after successful connection restart, just before the
+ pending command is sent again to the server.
+
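Note (not part of the diff): a stdlib-only sketch of the ordering described above, where the hook runs after the connection is re-established and before the pending command is re-sent. `ToyClient` and its methods are hypothetical and only mimic the shape of the feature; the two-argument `(client, exception)` hook signature is an assumption, so check the {PG::EM::Client} docs before relying on it.

```ruby
# Hypothetical toy client; illustrative names, not the PG::EM::Client internals.
class ToyClient
  attr_accessor :on_autoreconnect
  attr_reader :events

  def initialize
    @connected = true
    @events = []
  end

  # Simulate a server restart dropping the connection.
  def drop_connection!
    @connected = false
  end

  def query(sql)
    send_command(sql)
  rescue IOError => e
    reset                                              # re-establish the connection
    on_autoreconnect.call(self, e) if on_autoreconnect # hook runs before the retry
    send_command(sql)                                  # pending command is sent again
  end

  private

  def reset
    @connected = true
    @events << :reset
  end

  def send_command(sql)
    raise IOError, 'connection lost' unless @connected
    @events << [:sent, sql]
    "result of #{sql}"
  end
end

client = ToyClient.new
client.on_autoreconnect = proc { |c, ex| c.events << [:hook, ex.message] }
client.query('select 1')
client.drop_connection!
result = client.query('select 2')
puts client.events.inspect
```

A hook at this point is a natural place to restore per-connection state, such as prepared statements, that a server restart would have discarded.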
+ ### Connection Pool
+
+ Forever alone? Not anymore! There is a dedicated {PG::EM::ConnectionPool}
+ class with dynamic pool for both types of asynchronous commands (deferral
+ and fiber-synchronized).
+
+ It also provides a #transaction method which locks the in-transaction
+ connection to the calling fiber and allows to execute commands
+ on the same connection within a transaction block. The transactions may
+ be nested. See also docs for the {PG::EM::Client#transaction} method.
+
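Note (not part of the diff): a stdlib-only sketch of how a pool might pin one connection to the calling fiber for the duration of a possibly nested transaction. `ToyPool` is hypothetical; it assumes nested calls reuse the outer transaction rather than opening savepoints, and it raises when the pool is empty where a real pool would park the fiber until a connection is free.

```ruby
require 'fiber' # needed for Fiber.current on older Rubies

# Hypothetical toy pool; shows fiber-pinned connections and nesting only.
class ToyPool
  attr_reader :log

  def initialize(size)
    @available = Array.new(size) { |i| "conn#{i}" }
    @pinned = {} # fiber => [connection, nesting depth]
    @log = []
  end

  def transaction
    fiber = Fiber.current
    # Reuse the connection already pinned to this fiber, else take a free one.
    conn, depth = @pinned[fiber] || [@available.pop, 0]
    raise 'pool exhausted' unless conn
    @pinned[fiber] = [conn, depth + 1]
    @log << "#{conn}: BEGIN" if depth.zero?
    begin
      result = yield conn
      @log << "#{conn}: COMMIT" if depth.zero?
      result
    rescue StandardError
      # Only the outermost block rolls back; inner blocks just re-raise.
      @log << "#{conn}: ROLLBACK" if depth.zero?
      raise
    ensure
      if depth.zero?
        @pinned.delete(fiber)
        @available.push(conn)
      else
        @pinned[fiber] = [conn, depth]
      end
    end
  end
end

pool = ToyPool.new(2)
used = []
Fiber.new do
  pool.transaction do |outer|
    used << outer
    pool.transaction { |inner| used << inner }
  end
end.resume
puts pool.log
```

Both blocks see the same connection and only the outermost one emits `BEGIN` and `COMMIT`, so an error raised in the inner block would roll back the whole transaction.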
+ #### Parallel async queries
+
+ ```ruby
+ require 'pg/em/connection_pool'
+ require 'em-synchrony'
+
+ EM.synchrony do
+   pg = PG::EM::ConnectionPool.new(size: 2, dbname: 'test')
+
+   multi = EM::Synchrony::Multi.new
+   multi.add :foo, pg.query_defer('select pg_sleep(1)')
+   multi.add :bar, pg.query_defer('select pg_sleep(1)')
+
+   start = Time.now
+   res = multi.perform
+   # around 1 sec.
+   puts Time.now - start
+
+   EM.stop
+ end
+ ```
+
+ #### Fiber Concurrency
+
+ ```ruby
+ require 'pg/em/connection_pool'
+ require 'em-synchrony'
+ require "em-synchrony/fiber_iterator"
+
+ EM.synchrony do
+   concurrency = 5
+   queries = (1..10).map {|i| "select pg_sleep(1); select #{i}" }
+
+   pg = PG::EM::ConnectionPool.new(size: concurrency, dbname: 'test')
+
+   start = Time.now
+   EM::Synchrony::FiberIterator.new(queries, concurrency).each do |query|
+     pg.query(query) do |result|
+       puts "recv: #{result.getvalue(0,0)}"
+     end
+   end
+   # around 2 secs.
+   puts Time.now - start
+
+   EM.stop
+ end
+ ```
+
+ API Changes
+ -----------
+
+ ### 0.2.x -> 0.3.x
+
+ There is a substantial difference in the API between this and the previous
+ releases. The idea behind it was to make this implementation as much
+ compatible as possible with the threaded `pg` interface.
+ E.g. the `#async_exec` is now an alias to `#exec`.
+
+ The other reason was to get rid of the ugly em / em-synchrony duality.
+
+ * There is no separate em-synchrony client version anymore.
+ * The methods returning Deferrable have now the `*_defer` suffix.
+ * The `#async_exec` and `#async_query` (in <= 0.2 they were deferrable methods)
+   are now aliases to `#exec`.
+ * The command methods `#exec`, `#query`, `#exec_*`, `#describe_*` are now
+   em-synchrony style methods (fiber-synchronized).
+ * The following methods were removed:
+
+   - `#async_prepare`,
+   - `#async_exec_prepared`,
+   - `#async_describe_prepared`,
+   - `#async_describe_portal`
+
+   as their names were confusing due to the unfortunate `#async_exec`.
+
+ * The `async_connect` and `#async_reset` are renamed to `connect_defer` and `#reset_defer`
+   respectively.
+
+ ### 0.1.x -> 0.2.x
+
+ * `on_reconnect` renamed to more accurate `on_autoreconnect`
+   (well, it's not used by PG::EM::Client#reset call).
+ * `async_autoreconnect` is `false` by default if `on_autoreconnect`
+   is __not__ specified as initialization option.
+
+ Bugs/Limitations
+ ----------------
+
+ * no async support for: COPY commands (`get_copy_data`, `put_copy_data`),
+   `wait_for_notify`
+ * actually no ActiveRecord support (you are welcome to contribute).
+
+ TODO:
+ -----
+
+ * implement streaming results (Postgres >= 9.2)
+ * implement EM adapted version of `get_copy_data`, `put_copy_data`,
+   `wait_for_notify` and `transaction`
+ * ORM (ActiveRecord and maybe Datamapper) support as separate projects
+ * present more benchmarks
+
+ More Info
+ ---------
+
+ This implementation makes use of non-blocking:
+ [PGConn#is_busy](http://deveiate.org/code/pg/PG/Connection.html#method-i-is_busy) and
+ [PGConn#consume_input](http://deveiate.org/code/pg/PG/Connection.html#method-i-consume_input) methods.
+ Depending on the size of queried results and the concurrency level, the gain
+ in overall speed and responsiveness of your application might be actually quite huge.
+ See {file:BENCHMARKS.md BENCHMARKING}.
+
+ Thanks
+ ------
+
+ The greetz go to:
+
+ * [Authors](https://bitbucket.org/ged/ruby-pg/wiki/Home#!copying) of __pg__
+   driver (especially for its async-api)
+ * Francis Cianfrocca for great reactor framework
+   [EventMachine](https://github.com/eventmachine/eventmachine)
+ * Ilya Grigorik [igrigorik](https://github.com/igrigorik) for
+   [untangling EM with Fibers](http://www.igvita.com/2010/03/22/untangling-evented-code-with-ruby-fibers/)
+ * Peter Yanovich [fl00r](https://github.com/fl00r) for the
+   [em-pg-sequel](https://github.com/fl00r/em-pg-sequel)
+ * Andrew Rudenko [prepor](https://github.com/prepor) for the implicit idea
+   of the re-usable watcher from his [em-pg](https://github.com/prepor/em-pg).