perfectqueue 0.7.0

@@ -0,0 +1,16 @@
1
+
2
+ == 2011-08-31 version 0.7.0
3
+
4
+ * Backend uses logical delete instead of deleting tasks when a task is finished
5
+ or canceled. It now reliably raises an error when a duplicate job id is submitted.
6
+
7
+
8
+ == 2011-08-25 version 0.6.1
9
+
10
+ * Supports SimpleDB Backend
11
+
12
+
13
+ == 2011-08-23 version 0.6.0
14
+
15
+ * First release
16
+
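The 0.7.0 entry above describes a logical delete: a finished or canceled task keeps its row for a while, so resubmitting the same id is rejected. The following is a minimal in-memory sketch of that behavior. `TombstoneQueue` and its `purge` method are hypothetical names used for illustration only; they are not part of PerfectQueue. The `created_at = nil` marker mirrors what the backends in this diff do.

```ruby
# Hypothetical in-memory model of logical delete; not PerfectQueue's API.
class TombstoneQueue
  Row = Struct.new(:data, :created_at, :timeout)

  def initialize
    @rows = {}
  end

  # Returns true on success, nil if the id already exists
  # (including finished rows that have not been purged yet).
  def submit(id, data, now = Time.now.to_i)
    return nil if @rows.key?(id)
    @rows[id] = Row.new(data, now, now)
    true
  end

  # Marks the row as finished instead of deleting it. The row becomes
  # eligible for purging once delete_timeout seconds have passed.
  def finish(id, delete_timeout = 3600, now = Time.now.to_i)
    row = @rows[id]
    return false if row.nil? || row.created_at.nil?
    row.created_at = nil
    row.timeout = now + delete_timeout
    true
  end

  # Physically removes finished rows whose timeout has passed.
  def purge(now = Time.now.to_i)
    @rows.delete_if { |_id, row| row.created_at.nil? && row.timeout <= now }
  end
end

queue = TombstoneQueue.new
queue.submit('job-1', '{"any":"data"}', 1000)  # accepted
queue.finish('job-1', 3600, 1010)              # logical delete
queue.submit('job-1', '{"any":"data"}', 1020)  # rejected: tombstone still present
queue.purge(1010 + 3600)                       # tombstone expires
queue.submit('job-1', '{"any":"data"}', 5000)  # accepted again
```

Keeping the tombstone for `delete_timeout` seconds is what lets the queue reject a duplicate id that arrives shortly after the original task finished.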
@@ -0,0 +1,223 @@
1
+ = PerfectQueue
2
+
3
+ Highly available distributed queue.
4
+
5
+ It provides exactly-once semantics unless the backend database fails. Pushed tasks are always retried by another worker node if a worker fails, and finished or canceled tasks are never delivered.
6
+
7
+ The backend database is pluggable. You can use any database that supports a CAS (compare-and-swap) operation. PerfectQueue currently supports RDBMS and Amazon SimpleDB.
8
+
9
+
10
+ == Architecture
11
+
12
+ PerfectQueue uses the following database schema:
13
+
14
+ (
15
+ id:string -- unique identifier of the task
16
+ data:blob -- additional attributes of the task
17
+ created_at:int -- unix time when the task is created (or null for finished or canceled tasks)
18
+ timeout:int -- unix time after which the task can be acquired by a worker
19
+ )
20
+
21
+ 1. list: lists tasks whose timeout has already passed.
22
+ 2. lock: updates the timeout column of the first task
23
+ 3. run: executes a command
24
+ * if the task takes a long time, updates the timeout column. This is repeated until the task is finished
25
+ * if the task takes even longer than the kill timeout, kills the process
26
+ 4. remove: if it succeeded, marks the row as finished (logical delete); the row is removed from the backend database later
27
+ 5. or leave: if it failed, leaves the row in place so that the task is retried
28
+
29
+
30
+ == Usage
31
+
32
+ === Submitting a task
33
+
34
+ *Using* *command* *line:*
35
+
36
+ # RDBMS
37
+ $ perfectqueue \
38
+ --database mysql://user:password@localhost/mydb \
39
+ --table perfectqueue \
40
+ --push unique-key-id='{"any":"data"}'
41
+
42
+ # SimpleDB
43
+ $ perfectqueue \
44
+ --simpledb your-simpledb-domain-name \
45
+ -k AWS_KEY_ID \
46
+ -s AWS_SECRET_KEY \
47
+ --push unique-key-id='{"any":"data"}'
48
+
49
+ *Using* *PerfectQueue* *library:*
50
+
51
+ require 'perfectqueue'
52
+
53
+ # RDBMS
54
+ require 'perfectqueue/backend/rdb'
55
+ queue = PerfectQueue::RDBBackend.new(
56
+ 'mysql://user:password@localhost/mydb', 'perfectqueue')
57
+
58
+ # SimpleDB
59
+ require 'perfectqueue/backend/simpledb'
60
+ queue = PerfectQueue::SimpleDBBackend.new(
61
+ 'AWS_KEY_ID', 'AWS_SECRET_KEY', 'your-simpledb-domain-name')
62
+
63
+ queue.submit('unique-key-id', '{"any":"data"}')
64
+
65
+
66
+ Alternatively, you can insert a row into the backend database directly.
67
+
68
+ *RDBMS:*
69
+
70
+ > CREATE TABLE IF NOT EXISTS perfectqueue (
71
+ id VARCHAR(256) NOT NULL,
72
+ timeout INT NOT NULL,
73
+ data BLOB NOT NULL,
74
+ created_at INT,
75
+ PRIMARY KEY (id)
76
+ );
77
+ > SET @now = UNIX_TIMESTAMP();
78
+ > INSERT INTO perfectqueue (id, timeout, data, created_at)
79
+ VALUES ('unique-task-id', @now, '{"any":"data"}', @now);
80
+
81
+ *SimpleDB:*
82
+
83
+ require 'aws' # gem install aws-sdk
84
+ queue = AWS::SimpleDB.new
85
+ domain = queue.domains['your-simpledb-domain-name']
86
+
87
+ now = "%08x" % Time.now.to_i
88
+ domain.items['unique-task-id'].attributes.replace(
89
+ 'timeout'=>now, 'data'=>'{"any":"data"}', 'created_at'=>now,
90
+ :unless=>'timeout')
91
+
92
+
93
+ === Canceling a queued task
94
+
95
+ *Using* *command* *line:*
96
+
97
+ $ perfectqueue ... --cancel unique-key-id
98
+
99
+ *Using* *PerfectQueue* *library:*
100
+
101
+ queue.cancel('unique-key-id')
102
+
103
+
104
+ Alternatively, you can delete a row from the backend database directly.
105
+
106
+ *RDBMS:*
107
+
108
+ > DELETE FROM perfectqueue WHERE id='unique-key-id';
109
+
110
+ *SimpleDB:*
111
+
112
+ domain.items['unique-task-id'].delete
113
+
114
+
115
+ === Running a worker node
116
+
117
+ Use the _perfectqueue_ command to run a worker node.
118
+
119
+ Usage: perfectqueue [options] [-- <ARGV-for-exec-or-run>]
120
+
121
+ --push ID=DATA Push a task to the queue
122
+ --list Show queued tasks
123
+ --cancel ID Cancel a queued task
124
+ --configure PATH.yaml Write configuration file
125
+
126
+ --exec COMMAND Execute command
127
+ --run SCRIPT.rb Run method named 'run' defined in the script
128
+
129
+ -f, --file PATH.yaml Read configuration file
130
+ -C, --run-class Class name for --run (default: ::Run)
131
+ -t, --timeout SEC Time for another worker to take over a task when this worker goes down (default: 600)
132
+ -b, --heartbeat-interval SEC Threshold time to extend the timeout (heartbeat interval) (default: timeout * 3/4)
133
+ -x, --kill-timeout SEC Threshold time to kill a task process (default: timeout * 10)
134
+ -X, --kill-interval SEC Threshold time to retry killing a task process (default: 60)
135
+ -i, --poll-interval SEC Polling interval (default: 1)
136
+ -r, --retry-wait SEC Time to retry a task when it is failed (default: same as timeout)
137
+ -e, --expire SEC Threshold time to expire a task (default: 345600 (4days))
138
+
139
+ --database URI Use RDBMS for the backend database (e.g.: mysql://user:password@localhost/mydb)
140
+ --table NAME backend: name of the table (default: perfectqueue)
141
+ --simpledb DOMAIN Use Amazon SimpleDB for the backend database (e.g.: --simpledb mydomain -k KEY_ID -s SEC_KEY)
142
+ -k, --key-id ID AWS Access Key ID
143
+ -s, --secret-key KEY AWS Secret Access Key
144
+
145
+ -w, --worker NUM Number of worker threads (default: 1)
146
+ -d, --daemon PIDFILE Daemonize (default: foreground)
147
+ -o, --log PATH log file path
148
+ -v, --verbose verbose mode
149
+
150
+
151
+ ==== --exec
152
+
153
+ Execute a command when a task is received. The data column is passed to stdin and the id column is passed as the last argument. The command must exit with status code 0 when it succeeds.
154
+
155
+ *Example:*
156
+
157
+ #!/usr/bin/env ruby
158
+
159
+ require 'json'
160
+ js = JSON.load(STDIN.read)
161
+ puts "received: id=#{ARGV.last} #{js.inspect}"
162
+
163
+ #$ perfectqueue --database sqlite://test.db \
164
+ --exec ./this_file -- my cmd args
165
+
166
+ When the kill timeout (-x, --kill-timeout) has elapsed, a SIGTERM signal is sent to the child process. The signal is repeated at the kill interval (-X, --kill-interval).
167
+
168
+
169
+ ==== --run
170
+
171
+ This is the same as 'exec' except that it creates an instance of a class named 'Run' defined in the file. The class should have 'initialize(task)', 'run' and 'kill' methods. You can get the data column and id column of the task from the argument of the initialize method. The task is assumed to have succeeded if the 'run' method doesn't raise any errors.
172
+
173
+ *Example:*
174
+
175
+ require 'json'
176
+
177
+ class Run
178
+ def initialize(task)
179
+ @task = task
180
+ end
181
+
182
+ def run
183
+ js = JSON.load(@task.data)
184
+ puts "received: id=#{@task.id} #{js.inspect}"
185
+ end
186
+
187
+ def kill
188
+ puts "kill!"
189
+ end
190
+ end
191
+
192
+ #$ perfectqueue --database sqlite://test.db \
193
+ --run ./this_file.rb -- my cmd args
194
+
195
+ When the kill timeout (-x, --kill-timeout) has elapsed, the Run#kill method is called (if it is defined). The call is repeated at the kill interval (-X, --kill-interval).
196
+
197
+
198
+ ==== --configure
199
+
200
+ Write a configuration file and exit. The written configuration file can be used with the -f option:
201
+
202
+ *Example:*
203
+
204
+ ## create myqueue.yaml file
205
+ $ perfectqueue --database mysql://root:my@localhost/mydb \
206
+ --run myrun.rb -- my cmd args \
207
+ --configure myqueue.yaml
208
+
209
+ ## run perfectqueue using the configuration file
210
+ $ perfectqueue -f myqueue.yaml
211
+
212
+
213
+ ==== --list
214
+
215
+ Show queued tasks.
216
+
217
+ *Example:*
218
+
219
+ $ perfectqueue --database sqlite://test.db --list
220
+ id created_at timeout data
221
+ task1 2011-08-23 23:07:45 +0900 2011-08-23 23:07:45 +0900 {"attr1":"val1","attr":"val2"}
222
+ 1 entries.
223
+
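The option summary in the README above states several defaults relative to --timeout. The arithmetic can be sketched as follows. `default_timings` is a hypothetical helper, not part of PerfectQueue; the formulas are taken from the usage text only, and the real option parser may compute them differently.

```ruby
# Hypothetical helper: derives the documented defaults from --timeout.
# The formulas come from the README's option summary, not from the
# PerfectQueue source.
def default_timings(timeout = 600)
  {
    timeout:            timeout,
    heartbeat_interval: timeout * 3 / 4,  # -b default: timeout * 3/4
    kill_timeout:       timeout * 10,     # -x default: timeout * 10
    kill_interval:      60,               # -X default
    retry_wait:         timeout,          # -r default: same as timeout
    expire:             345_600           # -e default: 4 days in seconds
  }
end

timings = default_timings
puts timings[:heartbeat_interval]  # 450
puts timings[:kill_timeout]        # 6000
```

With the default timeout of 600 seconds, a healthy worker extends its lock after 450 seconds, well before another worker could take the task over.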
@@ -0,0 +1,6 @@
1
+ #!/usr/bin/env ruby
2
+ # -*- coding: utf-8 -*-
3
+ require 'rubygems' unless defined?(gem)
4
+ here = File.dirname(__FILE__)
5
+ $LOAD_PATH << File.expand_path(File.join(here, '..', 'lib'))
6
+ require 'perfectqueue/command/perfectqueue'
@@ -0,0 +1,4 @@
1
+ require 'perfectqueue/engine'
2
+ require 'perfectqueue/worker'
3
+ require 'perfectqueue/backend'
4
+ require 'perfectqueue/version'
@@ -0,0 +1,51 @@
1
+
2
+ module PerfectQueue
3
+
4
+
5
+ class Task
6
+ def initialize(id, created_at, data)
7
+ @id = id
8
+ @created_at = created_at
9
+ @data = data
10
+ end
11
+
12
+ attr_reader :id, :created_at, :data
13
+ end
14
+
15
+
16
+ class CanceledError < RuntimeError
17
+ end
18
+
19
+
20
+ class Backend
21
+ # => list {|id,created_at,data,timeout| ... }
22
+ def list(&block)
23
+ end
24
+
25
+ # => token, task
26
+ def acquire(timeout, now=Time.now.to_i)
27
+ end
28
+
29
+ # => true (success) or false (canceled)
30
+ def finish(token, delete_timeout=3600, now=Time.now.to_i)
31
+ end
32
+
33
+ # => nil
34
+ def update(token, timeout)
35
+ end
36
+
37
+ # => true (success) or false (not found, canceled or finished)
38
+ def cancel(id, delete_timeout=3600, now=Time.now.to_i)
39
+ end
40
+
41
+ # => true (success) or nil (already exists)
42
+ def submit(id, data, time=Time.now.to_i)
43
+ end
44
+
45
+ def close
46
+ end
47
+ end
48
+
49
+
50
+ end
51
+
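The Backend interface above leaves every method empty. To illustrate how a worker might drive it through an acquire, update (heartbeat) and finish cycle, here is a minimal in-memory sketch. `MemoryBackend` is a hypothetical stand-in; `Task` and `CanceledError` are redefined at the top level so the example is self-contained, and a plain id is used as the token, as in the RDB backend.

```ruby
# Hypothetical in-memory stand-in for a PerfectQueue backend, used only
# to illustrate the acquire -> update -> finish cycle.
class CanceledError < RuntimeError; end
Task = Struct.new(:id, :created_at, :data)

class MemoryBackend
  def initialize
    @rows = {}  # id => { data:, created_at:, timeout: }
  end

  # Returns true on success, nil if the id already exists.
  def submit(id, data, time = Time.now.to_i)
    return nil if @rows.key?(id)
    @rows[id] = { data: data, created_at: time, timeout: time }
    true
  end

  # Returns [token, task] for the oldest due task, or nil if none is due.
  def acquire(timeout, now = Time.now.to_i)
    id, row = @rows.select { |_, r| r[:created_at] && r[:timeout] <= now }
                   .min_by { |_, r| r[:timeout] }
    return nil unless id
    row[:timeout] = timeout
    [id, Task.new(id, row[:created_at], row[:data])]
  end

  # Heartbeat: pushes the timeout forward, or raises if the task is gone.
  def update(token, timeout)
    row = @rows[token]
    raise CanceledError, "Task id=#{token} is canceled." unless row && row[:created_at]
    row[:timeout] = timeout
    nil
  end

  # Logical delete: returns true on success, false if already finished.
  def finish(token, delete_timeout = 3600, now = Time.now.to_i)
    row = @rows[token]
    return false unless row && row[:created_at]
    row[:created_at] = nil
    row[:timeout] = now + delete_timeout
    true
  end
  alias cancel finish
end

backend = MemoryBackend.new
backend.submit('task1', '{"any":"data"}', 100)
token, task = backend.acquire(100 + 600, 100)  # lock the task for 600 seconds
backend.update(token, 550 + 600)               # heartbeat extends the lock
backend.finish(token, 3600, 560)               # mark as finished on success
```

If the worker dies before calling `update` or `finish`, the timeout eventually passes and the next `acquire` from any worker picks the task up again.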
@@ -0,0 +1,119 @@
1
+
2
+ module PerfectQueue
3
+
4
+
5
+ class RDBBackend < Backend
6
+ def initialize(uri, table)
7
+ require 'sequel'
8
+ @uri = uri
9
+ @table = table
10
+ @db = Sequel.connect(@uri)
11
+ init_db(@uri.split(':',2)[0])
12
+ end
13
+
14
+ private
15
+ def init_db(type)
16
+ sql = ''
17
+ case type
18
+ when /mysql/i
19
+ sql << "CREATE TABLE IF NOT EXISTS `#{@table}` ("
20
+ sql << " id VARCHAR(256) NOT NULL,"
21
+ sql << " timeout INT NOT NULL,"
22
+ sql << " data BLOB NOT NULL,"
23
+ sql << " created_at INT,"
24
+ sql << " PRIMARY KEY (id)"
25
+ sql << ") ENGINE=INNODB;"
26
+ else
27
+ sql << "CREATE TABLE IF NOT EXISTS `#{@table}` ("
28
+ sql << " id VARCHAR(256) NOT NULL,"
29
+ sql << " timeout INT NOT NULL,"
30
+ sql << " data BLOB NOT NULL,"
31
+ sql << " created_at INT,"
32
+ sql << " PRIMARY KEY (id)"
33
+ sql << ");"
34
+ end
35
+ # TODO index
36
+ connect {
37
+ @db.run sql
38
+ }
39
+ end
40
+
41
+ def connect(&block)
42
+ begin
43
+ block.call
44
+ ensure
45
+ @db.disconnect
46
+ end
47
+ end
48
+
49
+ public
50
+ def list(&block)
51
+ @db.fetch("SELECT id, timeout, data, created_at FROM `#{@table}` WHERE created_at IS NOT NULL ORDER BY timeout ASC;") {|row|
52
+ yield row[:id], row[:created_at], row[:data], row[:timeout]
53
+ }
54
+ end
55
+
56
+ MAX_SELECT_ROW = 32
57
+
58
+ def acquire(timeout, now=Time.now.to_i)
59
+ connect {
60
+ while true
61
+ rows = 0
62
+ @db.fetch("SELECT id, timeout, data, created_at FROM `#{@table}` WHERE timeout <= ? ORDER BY timeout ASC LIMIT #{MAX_SELECT_ROW};", now) {|row|
63
+
64
+ unless row[:created_at]
65
+ # finished/canceled task
66
+ @db["DELETE FROM `#{@table}` WHERE id=?;", row[:id]].delete
67
+
68
+ else
69
+ n = @db["UPDATE `#{@table}` SET timeout=? WHERE id=? AND timeout=?;", timeout, row[:id], row[:timeout]].update
70
+ if n > 0
71
+ return row[:id], Task.new(row[:id], row[:created_at], row[:data])
72
+ end
73
+ end
74
+
75
+ rows += 1
76
+ }
77
+ if rows < MAX_SELECT_ROW
78
+ return nil
79
+ end
80
+ end
81
+ }
82
+ end
83
+
84
+ def finish(id, delete_timeout=3600, now=Time.now.to_i)
85
+ connect {
86
+ n = @db["UPDATE `#{@table}` SET timeout=?, created_at=NULL WHERE id=? AND created_at IS NOT NULL;", now+delete_timeout, id].update
87
+ return n > 0
88
+ }
89
+ end
90
+
91
+ def update(id, timeout)
92
+ connect {
93
+ n = @db["UPDATE `#{@table}` SET timeout=? WHERE id=? AND created_at IS NOT NULL;", timeout, id].update
94
+ if n <= 0
95
+ raise CanceledError, "Task id=#{id} is canceled."
96
+ end
97
+ return nil
98
+ }
99
+ end
100
+
101
+ def cancel(id, delete_timeout=3600, now=Time.now.to_i)
102
+ finish(id, delete_timeout, now)
103
+ end
104
+
105
+ def submit(id, data, time=Time.now.to_i)
106
+ connect {
107
+ begin
108
+ n = @db["INSERT INTO `#{@table}` (id, timeout, data, created_at) VALUES (?, ?, ?, ?);", id, time, data, time].insert
109
+ return true
110
+ rescue Sequel::DatabaseError
111
+ return nil
112
+ end
113
+ }
114
+ end
115
+ end
116
+
117
+
118
+ end
119
+
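The `acquire` method above relies on a conditional UPDATE (`WHERE id=? AND timeout=?`) as its compare-and-swap: the lock is taken only if the row still has the timeout value the worker read. The sketch below simulates that race in a single thread. `cas_update_timeout` and the array of hashes are hypothetical; in the real backend the database makes the check and the write atomic.

```ruby
# Hypothetical in-memory table; each row is { id:, timeout: }.
# Mimics: UPDATE t SET timeout=? WHERE id=? AND timeout=?
# and returns the number of affected rows (0 or 1).
def cas_update_timeout(rows, id, expected_timeout, new_timeout)
  row = rows.find { |r| r[:id] == id && r[:timeout] == expected_timeout }
  return 0 unless row
  row[:timeout] = new_timeout
  1
end

rows = [{ id: 'task1', timeout: 100 }]

# Two workers read the same row (timeout=100) before either updates it.
seen_by_a = rows.first[:timeout]
seen_by_b = rows.first[:timeout]

won_a = cas_update_timeout(rows, 'task1', seen_by_a, 700)  # 1: worker A takes the lock
won_b = cas_update_timeout(rows, 'task1', seen_by_b, 705)  # 0: the row no longer matches

puts "A acquired: #{won_a > 0}, B acquired: #{won_b > 0}"
```

Worker B sees zero affected rows and moves on to the next candidate row, which is what the `if n > 0` check in `acquire` does.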
@@ -0,0 +1,136 @@
1
+
2
+ module PerfectQueue
3
+
4
+
5
+ class SimpleDBBackend < Backend
6
+ def initialize(key_id, secret_key, domain)
7
+ gem "aws-sdk"
8
+ require 'aws'
9
+ @consistent_read = false
10
+
11
+ @db = AWS::SimpleDB.new(
12
+ :access_key_id => key_id,
13
+ :secret_access_key => secret_key)
14
+
15
+ @domain_name = domain
16
+ @domain = @db.domains[@domain_name]
17
+ unless @domain.exists?
18
+ @domain = @db.domains.create(@domain_name)
19
+ end
20
+ end
21
+
22
+ attr_accessor :consistent_read
23
+
24
+ def use_consistent_read(b=true)
25
+ @consistent_read = b
26
+ self
27
+ end
28
+
29
+ def list(&block)
30
+ @domain.items.each {|item|
31
+ id = item.name
32
+ attrs = item.data.attributes
33
+ salt = attrs['created_at'].first
34
+ if salt && !salt.empty?
35
+ created_at = int_decode(salt)
36
+ data = attrs['data'].first
37
+ timeout = int_decode(attrs['timeout'].first)
38
+ yield id, created_at, data, timeout
39
+ end
40
+ }
41
+ end
42
+
43
+ MAX_SELECT_ROW = 32
44
+
45
+ def acquire(timeout, now=Time.now.to_i)
46
+ while true
47
+ rows = 0
48
+ @domain.items.select('timeout', 'data', 'created_at',
49
+ :where => "timeout <= '#{int_encode(now)}'",
50
+ :order => [:timeout, :asc],
51
+ :consistent_read => @consistent_read,
52
+ :limit => MAX_SELECT_ROW) {|itemdata|
53
+ begin
54
+ id = itemdata.name
55
+ attrs = itemdata.attributes
56
+ salt = attrs['created_at'].first
57
+
58
+ if !salt || salt.empty?
59
+ # finished/canceled task
60
+ @domain.items[id].delete(:if=>{'created_at'=>''})
61
+
62
+ else
63
+ created_at = int_decode(salt)
64
+ @domain.items[id].attributes.replace('timeout'=>int_encode(timeout),
65
+ :if=>{'timeout'=>attrs['timeout'].first})
66
+
67
+ data = attrs['data'].first
68
+
69
+ return [id,salt], Task.new(id, created_at, data)
70
+ end
71
+
72
+ rescue AWS::SimpleDB::Errors::ConditionalCheckFailed, AWS::SimpleDB::Errors::AttributeDoesNotExist
73
+ end
74
+
75
+ rows += 1
76
+ }
77
+ if rows < MAX_SELECT_ROW
78
+ return nil
79
+ end
80
+ end
81
+ end
82
+
83
+ def finish(token, delete_timeout=3600, now=Time.now.to_i)
84
+ begin
85
+ id, salt = *token
86
+ @domain.items[id].attributes.replace('timeout'=>int_encode(now+delete_timeout), 'created_at'=>'',
87
+ :if=>{'created_at'=>salt})
88
+ return true
89
+ rescue AWS::SimpleDB::Errors::ConditionalCheckFailed, AWS::SimpleDB::Errors::AttributeDoesNotExist
90
+ return false
91
+ end
92
+ end
93
+
94
+ def update(token, timeout)
95
+ begin
96
+ id, salt = *token
97
+ @domain.items[id].attributes.replace('timeout'=>int_encode(timeout),
98
+ :if=>{'created_at'=>salt})
99
+ rescue AWS::SimpleDB::Errors::ConditionalCheckFailed, AWS::SimpleDB::Errors::AttributeDoesNotExist
100
+ raise CanceledError, "Task id=#{id} is canceled."
101
+ end
102
+ nil
103
+ end
104
+
105
+ def cancel(id, delete_timeout=3600, now=Time.now.to_i)
106
+ salt = @domain.items[id].attributes['created_at'].first
107
+ if !salt || salt.empty?
108
+ return false
109
+ end
110
+ token = [id,salt]
111
+ finish(token, delete_timeout, now)
112
+ end
113
+
114
+ def submit(id, data, time=Time.now.to_i)
115
+ begin
116
+ @domain.items[id].attributes.replace('timeout'=>int_encode(time), 'created_at'=>int_encode(time), 'data'=>data,
117
+ :unless=>'timeout')
118
+ return true
119
+ rescue AWS::SimpleDB::Errors::ConditionalCheckFailed, AWS::SimpleDB::Errors::ExistsAndExpectedValue
120
+ return nil
121
+ end
122
+ end
123
+
124
+ private
125
+ def int_encode(num)
126
+ "%08x" % num
127
+ end
128
+
129
+ def int_decode(str)
130
+ str.to_i(16)
131
+ end
132
+ end
133
+
134
+
135
+ end
136
+
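The SimpleDB backend above stores timestamps through `int_encode`, which writes zero-padded hexadecimal. SimpleDB compares attribute values as strings, so a fixed-width encoding is needed for `timeout <= '...'` and the ordering by timeout to behave numerically. A small sketch of the effect, reusing the same format string (the sample values are arbitrary):

```ruby
# Same format string as int_encode in the SimpleDB backend above.
def int_encode(num)
  "%08x" % num
end

def int_decode(str)
  str.to_i(16)
end

times = [9, 10, 255, 4096, 1_314_057_600]

# Plain decimal strings sort lexicographically, not numerically.
plain = times.map(&:to_s).sort

# Fixed-width hex strings sort in the same order as the numbers.
encoded = times.map { |t| int_encode(t) }.sort

puts plain.inspect
puts encoded.map { |s| int_decode(s) }.inspect
```

Eight hex digits cover values up to 0xffffffff, so the string ordering holds for any 32-bit unix timestamp.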