exel 1.0.0 → 1.0.1
- checksums.yaml +4 -4
- data/.codeclimate.yml +16 -0
- data/.rubocop.yml +4 -0
- data/.rubocop_todo.yml +0 -10
- data/README.md +65 -2
- data/lib/exel/context.rb +11 -4
- data/lib/exel/processors/split_processor.rb +35 -26
- data/lib/exel/providers/threaded_async_provider.rb +1 -1
- data/lib/exel/version.rb +1 -1
- data/spec/exel/context_spec.rb +13 -0
- data/spec/exel/processors/split_processor_spec.rb +4 -6
- data/spec/exel/providers/threaded_async_provider_spec.rb +53 -0
- metadata +5 -4
- data/spec/exel/providers/local_async_provider_spec.rb +0 -19
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 85ab889713fbe1368ad2e1e3186f16fb1fe39ac5
|
4
|
+
data.tar.gz: ed633de60e0cdbf125bfe5a1cfd21d110341c43e
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 3e739773310941eac8d09ac6793bb5d9b296c1cb22b7015f1c467ed4c14cbe12a3d92ca30cdb7fc39d7e676a12af30f2502bdd3a1dc395c2cce67673abbaed20
|
7
|
+
data.tar.gz: 316c744a8f814dec1191bbfd7964a28bb7c2628ae7bb2726ce0884a4d8392b71338364aa14be51a7c6524a9489cd7c1089a29e86801544dfa08e65576f6d8986
|
data/.codeclimate.yml
ADDED
data/.rubocop.yml
CHANGED
data/.rubocop_todo.yml
CHANGED
@@ -6,16 +6,6 @@
|
|
6
6
|
# Note that changes in the inspected code, or installation of new
|
7
7
|
# versions of RuboCop, may require this file to be generated again.
|
8
8
|
|
9
|
-
# Offense count: 1
|
10
|
-
# Configuration parameters: CountComments.
|
11
|
-
Metrics/MethodLength:
|
12
|
-
Max: 11
|
13
|
-
|
14
|
-
# Offense count: 2
|
15
|
-
# Configuration parameters: CountComments.
|
16
|
-
Metrics/ModuleLength:
|
17
|
-
Max: 166
|
18
|
-
|
19
9
|
# Offense count: 65
|
20
10
|
# Configuration parameters: CustomTransform, IgnoredWords.
|
21
11
|
RSpec/ExampleWording:
|
data/README.md
CHANGED
@@ -3,7 +3,15 @@
|
|
3
3
|
[](https://codeclimate.com/github/47colborne/exel)
|
4
4
|
[](https://snap-ci.com/47colborne/exel/branch/master)
|
5
5
|
|
6
|
-
|
6
|
+
EXEL is the Elastic eXEcution Language, a simple Ruby DSL for creating processing jobs that can be run on a single machine, or scaled up to run on dozens of machines with no changes to the job itself. To run a job on more than one machine, simply install EXEL async and remote provider gems to integrate with your preferred platforms. The providers implemented so far are:
|
7
|
+
|
8
|
+
**Async Providers**
|
9
|
+
|
10
|
+
* [exel-sidekiq](https://github.com/47colborne/exel-sidekiq)
|
11
|
+
|
12
|
+
**Remote Providers**
|
13
|
+
|
14
|
+
* [exel-s3](https://github.com/47colborne/exel-s3)
|
7
15
|
|
8
16
|
## Installation
|
9
17
|
|
@@ -21,7 +29,62 @@ Or install it yourself as:
|
|
21
29
|
|
22
30
|
## Usage
|
23
31
|
|
24
|
-
|
32
|
+
### Processors
|
33
|
+
|
34
|
+
A processor can be any class that provides the following interface:
|
35
|
+
|
36
|
+
class MyProcessor
|
37
|
+
def initialize(context)
|
38
|
+
# typically context is assigned to @context here
|
39
|
+
end
|
40
|
+
|
41
|
+
def process(block)
|
42
|
+
# do your work here
|
43
|
+
end
|
44
|
+
end
|
45
|
+
|
46
|
+
Processors are initialized immediately before ```#process``` is called, allowing them to set up any state that they need from the context. The ```#process``` method is where your processing logic will be implemented. Processors should be focused on performing one particular aspect of the processing that you want to accomplish, allowing your job to be composed of a sequence of small processing steps. If a block was given in the call to ```process``` in the job DSL, it will be passed as the argument to ```#process``` and can be run with: ```block.run(@context)```
|
47
|
+
|
48
|
+
### The Context
|
49
|
+
|
50
|
+
The ```Context``` class has a Hash-like interface and acts as shared storage for the various processors that make up a job. Processors take their expected inputs from the context, and place any resulting outputs there for subsequent processors to access. Values are typically placed in the context through the following means:
|
51
|
+
|
52
|
+
* Initial context set up before the job is run
|
53
|
+
* Arguments passed to processors in the job DSL
|
54
|
+
* Outputs assigned by processors during processing
|
55
|
+
|
56
|
+
If you use EXEL with an async provider, such as [exel-sidekiq](https://github.com/47colborne/exel-sidekiq), and a remote provider, such as [exel-s3](https://github.com/47colborne/exel-s3), a context switch will occur when the ```async``` command is executed. A context switch involves serializing the context and uploading it via the remote provider, then downloading and deserializing it when the async block is eventually run. This allows processors to pass their results along the sequence of processors in the job, without having to be concerned with when, where, or how those processors will be run.
|
57
|
+
|
58
|
+
### Supported Commands
|
59
|
+
|
60
|
+
* ```process``` Execute the given processor class (specified by the ```:with``` option), given the current context and any additional arguments provided
|
61
|
+
* ```split``` Split the input data into 1000-line chunks and run the given block for each chunk. Assumes that the input data is a CSV-formatted file referenced by ```context[:resource]```. When each block is run, ```context[:resource]``` will reference the chunk file.
|
62
|
+
* ```async``` Asynchronously run the given block. Uses the configured async provider to execute the block.
|
63
|
+
|
64
|
+
### Example job
|
65
|
+
|
66
|
+
EXEL::Job.define :example_job do
|
67
|
+
# Download a large CSV data file
|
68
|
+
process with: FTPDownloader, host: 'ftp.example.com', path: context[:file_path]
|
69
|
+
|
70
|
+
# split it into smaller 1000 line files
|
71
|
+
split do
|
72
|
+
# for each file asynchronously run the following sequence of processors
|
73
|
+
async do
|
74
|
+
process with: RecordLoader # convert each row of data into your domain model
|
75
|
+
process with: SomeProcessor # apply some additional processing to each record
|
76
|
+
process with: RecordSaver # write this batch of records to your database
|
77
|
+
process with: ExternalServiceProcessor # interact with some service, ex: updating a search index
|
78
|
+
end
|
79
|
+
end
|
80
|
+
end
|
81
|
+
|
82
|
+
Elsewhere in your application, you could run this job as follows:
|
83
|
+
|
84
|
+
def run_example_job(file_path)
|
85
|
+
context = EXEL::Context.new(file_path: file_path, user: 'username')
|
86
|
+
EXEL::Job.run(:example_job, context)
|
87
|
+
end
|
25
88
|
|
26
89
|
## Contributing
|
27
90
|
|
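The processor interface and the shared-context idea described in the README hunk above can be sketched without the gem. This is a minimal illustration, not EXEL's implementation: a plain Hash stands in for `EXEL::Context`, and `LineCounter` and `Reporter` are hypothetical processors invented for the example.

```ruby
# Hypothetical processors following the interface from the README:
# initialize(context) and process(block).
# ASSUMPTION: a plain Hash stands in for EXEL::Context here.

class LineCounter
  def initialize(context)
    @context = context
  end

  def process(_block)
    # Take an input from the context and store an output for later steps.
    @context[:line_count] = @context[:text].lines.size
  end
end

class Reporter
  def initialize(context)
    @context = context
  end

  def process(_block)
    # Consume the output left behind by the previous processor.
    @context[:report] = "#{@context[:line_count]} lines"
  end
end

context = { text: "a\nb\nc\n" }

# Each processor is instantiated immediately before #process is called,
# which is the ordering the README describes.
[LineCounter, Reporter].each { |klass| klass.new(context).process(nil) }

puts context[:report]
```

Neither processor knows about the other; they communicate only through keys in the context, which is what lets a job be composed of small, independent steps.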
data/lib/exel/context.rb
CHANGED
@@ -8,9 +8,12 @@ module EXEL
|
|
8
8
|
@table = initial_context
|
9
9
|
end
|
10
10
|
|
11
|
+
def deep_dup
|
12
|
+
Context.deserialize(serialize)
|
13
|
+
end
|
14
|
+
|
11
15
|
def serialize
|
12
|
-
|
13
|
-
EXEL::Value.remotize(serialized_context(remotized_table))
|
16
|
+
EXEL::Value.remotize(serialized_context)
|
14
17
|
end
|
15
18
|
|
16
19
|
def self.deserialize(uri)
|
@@ -50,13 +53,17 @@ module EXEL
|
|
50
53
|
|
51
54
|
private
|
52
55
|
|
53
|
-
def serialized_context
|
56
|
+
def serialized_context
|
54
57
|
file = Tempfile.new(SecureRandom.uuid, encoding: 'ascii-8bit')
|
55
|
-
file.write(Marshal.dump(Context.new(
|
58
|
+
file.write(Marshal.dump(Context.new(remotized_table)))
|
56
59
|
file.rewind
|
57
60
|
file
|
58
61
|
end
|
59
62
|
|
63
|
+
def remotized_table
|
64
|
+
@table.each_with_object({}) { |(key, value), acc| acc[key] = EXEL::Value.remotize(value) }
|
65
|
+
end
|
66
|
+
|
60
67
|
def get_deferred(value)
|
61
68
|
if deferred?(value)
|
62
69
|
value = value.get(self)
|
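The new `deep_dup` in the hunk above copies a context by serializing it and deserializing the result. The general technique, a Marshal round trip through a binary tempfile, can be sketched as follows. The `round_trip` helper is hypothetical, a plain Hash stands in for the context, and the remote upload step performed by `EXEL::Value.remotize` is left out entirely.

```ruby
require 'tempfile'

# Hypothetical helper (not part of EXEL): deep-copies an object by writing a
# Marshal dump to a binary tempfile and loading it back.
def round_trip(object)
  file = Tempfile.new('context')
  file.binmode
  file.write(Marshal.dump(object))
  file.rewind
  Marshal.load(file.read)
ensure
  file&.close! # close and delete the tempfile
end

original = { a: { nested: [] } }

shallow = original.dup         # copies only the top-level Hash
deep    = round_trip(original) # copies every nested object as well

shallow_shares_nested = shallow[:a].equal?(original[:a])
deep_shares_nested    = deep[:a].equal?(original[:a])

# Mutating the deep copy leaves the original untouched.
deep[:a][:nested] << 1

puts "shallow copy shares nested objects: #{shallow_shares_nested}"
puts "deep copy shares nested objects: #{deep_shares_nested}"
puts "original nested array: #{original[:a][:nested].inspect}"
```

A round trip like this only works for values that Marshal can dump, so procs, IO objects, and similar values would need separate handling.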
data/lib/exel/processors/split_processor.rb
CHANGED
@@ -7,44 +7,33 @@ module EXEL
|
|
7
7
|
class SplitProcessor
|
8
8
|
include EXEL::ProcessorHelper
|
9
9
|
|
10
|
-
attr_accessor :
|
10
|
+
attr_accessor :file_name, :block
|
11
11
|
|
12
12
|
DEFAULT_CHUNK_SIZE = 1000
|
13
13
|
|
14
14
|
def initialize(context)
|
15
|
-
@chunk_size = DEFAULT_CHUNK_SIZE
|
16
15
|
@buffer = []
|
17
16
|
@tempfile_count = 0
|
18
17
|
@context = context
|
19
|
-
|
20
18
|
@file = context[:resource]
|
21
|
-
@file_name = filename(@file)
|
22
|
-
@csv_options = context[:csv_options] || {col_sep: ','}
|
23
19
|
|
24
20
|
log_prefix_with '[SplitProcessor]'
|
25
21
|
end
|
26
22
|
|
27
23
|
def process(callback)
|
28
24
|
log_process do
|
29
|
-
|
30
|
-
|
31
|
-
process_line(line, callback)
|
32
|
-
end
|
33
|
-
rescue CSV::MalformedCSVError => e
|
34
|
-
log_error "CSV::MalformedCSVError => #{e.message}"
|
35
|
-
end
|
36
|
-
process_line(:eof, callback)
|
37
|
-
File.delete(@file.path)
|
25
|
+
process_file(callback)
|
26
|
+
finish(callback)
|
38
27
|
end
|
39
28
|
end
|
40
29
|
|
41
30
|
def process_line(line, callback)
|
42
31
|
if line == :eof
|
43
|
-
flush_buffer
|
32
|
+
flush_buffer(callback)
|
44
33
|
else
|
45
34
|
@buffer << CSV.generate_line(line)
|
46
35
|
|
47
|
-
flush_buffer
|
36
|
+
flush_buffer(callback) if buffer_full?
|
48
37
|
end
|
49
38
|
end
|
50
39
|
|
@@ -54,31 +43,51 @@ module EXEL
|
|
54
43
|
chunk.write(content)
|
55
44
|
chunk.rewind
|
56
45
|
|
57
|
-
log_info "Generated chunk # #{@tempfile_count} for file #{@
|
46
|
+
log_info "Generated chunk # #{@tempfile_count} for file #{filename(@file)} in #{chunk.path}"
|
58
47
|
chunk
|
59
48
|
end
|
60
49
|
|
61
|
-
|
62
|
-
"#{@file_name}_#{@tempfile_count}_"
|
63
|
-
end
|
50
|
+
private
|
64
51
|
|
65
|
-
def
|
66
|
-
|
67
|
-
file_name_with_extension.split('.').first
|
68
|
-
end
|
52
|
+
def process_file(callback)
|
53
|
+
csv_options = @context[:csv_options] || {col_sep: ','}
|
69
54
|
|
70
|
-
|
55
|
+
CSV.foreach(@file.path, csv_options) do |line|
|
56
|
+
process_line(line, callback)
|
57
|
+
end
|
58
|
+
rescue CSV::MalformedCSVError => e
|
59
|
+
log_error "CSV::MalformedCSVError => #{e.message}"
|
60
|
+
end
|
71
61
|
|
72
62
|
def flush_buffer(callback)
|
73
63
|
unless @buffer.empty?
|
74
64
|
file = generate_chunk(@buffer.join(''))
|
75
65
|
callback.run(@context.merge!(resource: file))
|
76
66
|
end
|
67
|
+
|
77
68
|
@buffer = []
|
78
69
|
end
|
79
70
|
|
80
71
|
def buffer_full?
|
81
|
-
@buffer.size ==
|
72
|
+
@buffer.size == chunk_size
|
73
|
+
end
|
74
|
+
|
75
|
+
def chunk_size
|
76
|
+
DEFAULT_CHUNK_SIZE
|
77
|
+
end
|
78
|
+
|
79
|
+
def chunk_filename
|
80
|
+
"#{filename(@file)}_#{@tempfile_count}_"
|
81
|
+
end
|
82
|
+
|
83
|
+
def filename(file)
|
84
|
+
file_name_with_extension = file.path.split('/').last
|
85
|
+
file_name_with_extension.split('.').first
|
86
|
+
end
|
87
|
+
|
88
|
+
def finish(callback)
|
89
|
+
process_line(:eof, callback)
|
90
|
+
File.delete(@file.path)
|
82
91
|
end
|
83
92
|
end
|
84
93
|
end
|
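The buffering logic in the refactored `SplitProcessor` (append each line, flush when the buffer is full, flush once more at `:eof`) can be sketched in isolation. `ChunkBuffer` below is a hypothetical stand-in: it collects chunks in an array instead of writing tempfiles and running a callback, and it joins fields with commas rather than calling `CSV.generate_line`.

```ruby
# Hypothetical stand-in for the buffering in SplitProcessor.
# ASSUMPTIONS: chunks are collected in memory rather than written to
# tempfiles, and rows are joined with ',' instead of CSV.generate_line.
class ChunkBuffer
  attr_reader :chunks

  def initialize(chunk_size)
    @chunk_size = chunk_size
    @buffer = []
    @chunks = []
  end

  def process_line(line)
    if line == :eof
      flush # emit whatever is left, even if the buffer is not full
    else
      @buffer << "#{line.join(',')}\n"
      flush if buffer_full?
    end
  end

  private

  def buffer_full?
    @buffer.size == @chunk_size
  end

  def flush
    @chunks << @buffer.join unless @buffer.empty?
    @buffer = []
  end
end

splitter = ChunkBuffer.new(2)
[%w[0], %w[1], %w[2]].each { |row| splitter.process_line(row) }
splitter.process_line(:eof)

p splitter.chunks
```

With a chunk size of 2 and three input rows, the first two rows form one chunk and the trailing row is emitted only by the final `:eof` flush, which is why `finish` in the diff calls `process_line(:eof, callback)`.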
data/lib/exel/version.rb
CHANGED
data/spec/exel/context_spec.rb
CHANGED
@@ -9,6 +9,19 @@ module EXEL
|
|
9
9
|
end
|
10
10
|
end
|
11
11
|
|
12
|
+
describe '#deep_dup' do
|
13
|
+
it 'returns a deep copy of itself' do
|
14
|
+
context[:a] = {nested: []}
|
15
|
+
|
16
|
+
dup = context.deep_dup
|
17
|
+
expect(context).to eq(dup)
|
18
|
+
expect(context).to_not be_equal(dup)
|
19
|
+
|
20
|
+
dup[:a][:nested] << 1
|
21
|
+
expect(context[:a][:nested]).to be_empty
|
22
|
+
end
|
23
|
+
end
|
24
|
+
|
12
25
|
describe '#serialize' do
|
13
26
|
before { allow(Value).to receive(:upload) }
|
14
27
|
|
data/spec/exel/processors/split_processor_spec.rb
CHANGED
@@ -43,7 +43,7 @@ module EXEL
|
|
43
43
|
{input: 4, chunks: %W(0\n1\n 2\n3\n)}
|
44
44
|
].each do |data|
|
45
45
|
it "should produce #{data[:chunks].size} chunks with #{data[:input]} input lines" do
|
46
|
-
splitter.chunk_size
|
46
|
+
allow(splitter).to receive(:chunk_size).and_return(2)
|
47
47
|
|
48
48
|
data[:chunks].each do |chunk|
|
49
49
|
expect(splitter).to receive(:generate_chunk).with(chunk).and_return(chunk_file)
|
@@ -67,10 +67,8 @@ module EXEL
|
|
67
67
|
|
68
68
|
it 'should create a file with a unique name' do
|
69
69
|
3.times do |i|
|
70
|
-
|
71
|
-
file
|
72
|
-
file_name = splitter.filename(file)
|
73
|
-
expect(file_name).to include("text_#{index}_")
|
70
|
+
file = splitter.generate_chunk('content')
|
71
|
+
expect(file.path).to include("text_#{i + 1}_")
|
74
72
|
end
|
75
73
|
end
|
76
74
|
end
|
@@ -83,7 +81,7 @@ module EXEL
|
|
83
81
|
content << line
|
84
82
|
end
|
85
83
|
|
86
|
-
StringIO.new
|
84
|
+
StringIO.new(content)
|
87
85
|
end
|
88
86
|
end
|
89
87
|
end
|
data/spec/exel/providers/threaded_async_provider_spec.rb
ADDED
@@ -0,0 +1,53 @@
|
|
1
|
+
module EXEL
|
2
|
+
module Providers
|
3
|
+
class ContextMutatingProcessor
|
4
|
+
def initialize(context)
|
5
|
+
@context = context
|
6
|
+
end
|
7
|
+
|
8
|
+
def process(_block)
|
9
|
+
@context[:array] << @context[:arg]
|
10
|
+
end
|
11
|
+
end
|
12
|
+
|
13
|
+
describe ThreadedAsyncProvider do
|
14
|
+
subject { described_class.new(context) }
|
15
|
+
let(:context) { EXEL::Context.new }
|
16
|
+
|
17
|
+
describe '#do_async' do
|
18
|
+
let(:dsl_block) { instance_double(ASTNode) }
|
19
|
+
|
20
|
+
it 'runs the block in a new thread' do
|
21
|
+
expect(dsl_block).to receive(:start).with(context)
|
22
|
+
expect(Thread).to receive(:new).and_yield
|
23
|
+
|
24
|
+
subject.do_async(dsl_block)
|
25
|
+
end
|
26
|
+
|
27
|
+
it 'passes a copy of the context to each thread' do
|
28
|
+
context[:array] = []
|
29
|
+
complete = 0
|
30
|
+
|
31
|
+
EXEL::Job.define :thread_test do
|
32
|
+
async do
|
33
|
+
process with: ContextMutatingProcessor, arg: 1
|
34
|
+
complete += 1
|
35
|
+
end
|
36
|
+
|
37
|
+
async do
|
38
|
+
process with: ContextMutatingProcessor, arg: 2
|
39
|
+
complete += 1
|
40
|
+
end
|
41
|
+
end
|
42
|
+
|
43
|
+
EXEL::Job.run(:thread_test, context)
|
44
|
+
|
45
|
+
start_time = Time.now
|
46
|
+
sleep 0.1 while complete < 2 && Time.now - start_time < 2
|
47
|
+
|
48
|
+
expect(context[:array]).to be_empty
|
49
|
+
end
|
50
|
+
end
|
51
|
+
end
|
52
|
+
end
|
53
|
+
end
|
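The spec above expects each async block to receive its own copy of the context, so mutations made in one thread are not visible to the caller. The one-line change to `threaded_async_provider.rb` is not shown in this diff, so the following is only a sketch of the idea under stated assumptions: a plain Hash stands in for the context, and a Marshal round trip stands in for `Context#deep_dup`.

```ruby
# ASSUMPTION: a Hash stands in for EXEL::Context, and Marshal.load/dump
# stands in for Context#deep_dup. This is not the provider's actual code.
shared = { array: [] }

threads = [1, 2].map do |arg|
  # Copy the context before handing it to the thread.
  copy = Marshal.load(Marshal.dump(shared))

  Thread.new(copy) do |ctx|
    ctx[:array] << arg # mutates only this thread's copy
    ctx
  end
end

# Thread#value waits for the thread and returns the block's result.
results = threads.map(&:value)

puts "shared:  #{shared[:array].inspect}"
puts "copies:  #{results.map { |ctx| ctx[:array] }.inspect}"
```

If the threads were given `shared` directly, both would append to the same array, which is the situation the `passes a copy of the context to each thread` example guards against.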
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: exel
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.0.
|
4
|
+
version: 1.0.1
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- yroo
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date:
|
11
|
+
date: 2016-01-08 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bundler
|
@@ -171,6 +171,7 @@ executables: []
|
|
171
171
|
extensions: []
|
172
172
|
extra_rdoc_files: []
|
173
173
|
files:
|
174
|
+
- ".codeclimate.yml"
|
174
175
|
- ".gitignore"
|
175
176
|
- ".rspec"
|
176
177
|
- ".rubocop.yml"
|
@@ -209,8 +210,8 @@ files:
|
|
209
210
|
- spec/exel/null_instruction_spec.rb
|
210
211
|
- spec/exel/processors/async_processor_spec.rb
|
211
212
|
- spec/exel/processors/split_processor_spec.rb
|
212
|
-
- spec/exel/providers/local_async_provider_spec.rb
|
213
213
|
- spec/exel/providers/local_file_provider_spec.rb
|
214
|
+
- spec/exel/providers/threaded_async_provider_spec.rb
|
214
215
|
- spec/exel/sequence_node_spec.rb
|
215
216
|
- spec/exel/value_spec.rb
|
216
217
|
- spec/exel_spec.rb
|
@@ -250,8 +251,8 @@ test_files:
|
|
250
251
|
- spec/exel/null_instruction_spec.rb
|
251
252
|
- spec/exel/processors/async_processor_spec.rb
|
252
253
|
- spec/exel/processors/split_processor_spec.rb
|
253
|
-
- spec/exel/providers/local_async_provider_spec.rb
|
254
254
|
- spec/exel/providers/local_file_provider_spec.rb
|
255
|
+
- spec/exel/providers/threaded_async_provider_spec.rb
|
255
256
|
- spec/exel/sequence_node_spec.rb
|
256
257
|
- spec/exel/value_spec.rb
|
257
258
|
- spec/exel_spec.rb
|
@@ -1,19 +0,0 @@
|
|
1
|
-
module EXEL
|
2
|
-
module Providers
|
3
|
-
describe ThreadedAsyncProvider do
|
4
|
-
subject { described_class.new(context) }
|
5
|
-
let(:context) { EXEL::Context.new }
|
6
|
-
|
7
|
-
describe '#do_async' do
|
8
|
-
let(:dsl_block) { instance_double(ASTNode) }
|
9
|
-
|
10
|
-
it 'runs the block in a new thread' do
|
11
|
-
expect(dsl_block).to receive(:start).with(context)
|
12
|
-
expect(Thread).to receive(:new).and_yield
|
13
|
-
|
14
|
-
subject.do_async(dsl_block)
|
15
|
-
end
|
16
|
-
end
|
17
|
-
end
|
18
|
-
end
|
19
|
-
end
|