exel 1.0.0 → 1.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 07a61d1ecdfd5d0d957caa0dd064631be8c3e546
-  data.tar.gz: 364c194ce4cb48938f1cc863b6bdf0a5b7e74e47
+  metadata.gz: 85ab889713fbe1368ad2e1e3186f16fb1fe39ac5
+  data.tar.gz: ed633de60e0cdbf125bfe5a1cfd21d110341c43e
 SHA512:
-  metadata.gz: d004472bd8b4b3c6e496377c89cb261abcaa59fad5284009ecca65bee86dff428545af4ab27cd47ca74becbcdc9bb0a3d81962a67cf81f6437b5195a990cdf5f
-  data.tar.gz: debffc94d7f7fd53064f2b8f5317a68f3cbcaa4e961d19c6c2af630a31c39d79b4b42c111947e361a3f5a838853ff29d4a9814fb0892609da994f4c433517b85
+  metadata.gz: 3e739773310941eac8d09ac6793bb5d9b296c1cb22b7015f1c467ed4c14cbe12a3d92ca30cdb7fc39d7e676a12af30f2502bdd3a1dc395c2cce67673abbaed20
+  data.tar.gz: 316c744a8f814dec1191bbfd7964a28bb7c2628ae7bb2726ce0884a4d8392b71338364aa14be51a7c6524a9489cd7c1089a29e86801544dfa08e65576f6d8986
data/.codeclimate.yml ADDED
@@ -0,0 +1,16 @@
+---
+engines:
+  duplication:
+    enabled: true
+    config:
+      languages:
+      - ruby
+  fixme:
+    enabled: true
+  rubocop:
+    enabled: true
+ratings:
+  paths:
+  - "**.rb"
+exclude_paths:
+- spec/**/*
data/.rubocop.yml CHANGED
@@ -15,3 +15,7 @@ Style/SpaceInsideHashLiteralBraces:
 
 RSpec/DescribedClass:
   Enabled: false
+
+Metrics/ModuleLength:
+  Exclude:
+    - 'spec/**/*'
data/.rubocop_todo.yml CHANGED
@@ -6,16 +6,6 @@
 # Note that changes in the inspected code, or installation of new
 # versions of RuboCop, may require this file to be generated again.
 
-# Offense count: 1
-# Configuration parameters: CountComments.
-Metrics/MethodLength:
-  Max: 11
-
-# Offense count: 2
-# Configuration parameters: CountComments.
-Metrics/ModuleLength:
-  Max: 166
-
 # Offense count: 65
 # Configuration parameters: CustomTransform, IgnoredWords.
 RSpec/ExampleWording:
data/README.md CHANGED
@@ -3,7 +3,15 @@
 [![Code Climate](https://codeclimate.com/github/47colborne/exel/badges/gpa.svg)](https://codeclimate.com/github/47colborne/exel)
 [![Build Status](https://snap-ci.com/47colborne/exel/branch/master/build_image)](https://snap-ci.com/47colborne/exel/branch/master)
 
-TODO: Write a gem description
+EXEL is the Elastic eXEcution Language, a simple Ruby DSL for creating processing jobs that can be run on a single machine, or scaled up to run on dozens of machines with no changes to the job itself. To run a job on more than one machine, simply install EXEL async and remote provider gems to integrate with your preferred platforms. The currently implemented providers so far are:
+
+**Async Providers**
+
+* [exel-sidekiq](https://github.com/47colborne/exel-sidekiq)
+
+**Remote Providers**
+
+* [exel-s3](https://github.com/47colborne/exel-s3)
 
 ## Installation
 
@@ -21,7 +29,62 @@ Or install it yourself as:
 
 ## Usage
 
-TODO: Write usage instructions here
+### Processors
+
+A processor can be any class that provides the following interface:
+
+    class MyProcessor
+      def initialize(context)
+        # typically context is assigned to @context here
+      end
+
+      def process(block)
+        # do your work here
+      end
+    end
+
+Processors are initialized immediately before ```#process``` is called, allowing them to set up any state that they need from the context. The ```#process``` method is where your processing logic will be implemented. Processors should be focused on performing one particular aspect of the processing that you want to accomplish, allowing your job to be composed of a sequence of small processing steps. If a block was given in the call to ```process``` in the job DSL, it will be passed as the argument to ```#process``` and can be run with: ```block.run(@context)```
+
+### The Context
+
+The ```Context``` class has a Hash-like interface and acts as shared storage for the various processors that make up a job. Processors take their expected inputs from the context, and place any resulting outputs there for subsequent processors to access. Values are typically placed in the context through the following means:
+
+* Initial context set up before the job is run
+* Arguments passed to processors in the job DSL
+* Outputs assigned by processors during processing
+
+If you use EXEL with an async provider, such as [exel-sidekiq](https://github.com/47colborne/exel-sidekiq), and a remote provider, such as [exel-s3](https://github.com/47colborne/exel-s3), a context switch will occur when the ```async``` command is executed. Context shifts involve serializing the context and uploading it via the remote provider, then downloading and deserializing it when the async block is eventually run. This allows the processors to pass the results of their process through the sequence of processors in the job, without having to be concerned with when, where, or how those processors will be run.
+
+### Supported Commands
+
+* ```process``` Execute the given processor class (specified by the ```:with``` option), given the current context and any additional arguments provided
+* ```split``` Split the input data into 1000 line chunks and run the given block for each chunk. Assumes that the input data is a CSV formatted file referenced by ```context[:resource]```. When each block is run, ```context[:resource]``` will reference to the chunk file.
+* ```async``` Asynchronously run the given block. Uses the configured async provider to execute the block.
+
+### Example job
+
+    EXEL::Job.define :example_job do
+      # Download a large CSV data file
+      process with: FTPDownloader, host: ftp.example.com, path: context[:file_path]
+
+      # split it into smaller 1000 line files
+      split do
+        # for each file asynchronously run the following sequence of processors
+        async do
+          process with: RecordLoader # convert each row of data into your domain model
+          process with: SomeProcessor # apply some additional processing to each record
+          process with: RecordSaver # write this batch of records to your database
+          process with: ExternalServiceProcessor # interact with some service, ex: updating a search index
+        end
+      end
+    end
+
+Elsewhere in your application, you could run this job as follows:
+
+    def run_example_job(file_path)
+      context = EXEL::Context.new(file_path: file_path, user: 'username')
+      EXEL::Job.run(:example_job, context)
+    end
 
 ## Contributing
 
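Note on the README hunk above: the processor contract it documents (built from the context, then `#process` is called) can be tried without the gem. The following is a minimal sketch under stated assumptions: a plain Hash stands in for `EXEL::Context`, and `UpcaseProcessor`, `ExclaimProcessor`, and `run_sequence` are hypothetical names for illustration, not part of EXEL.

```ruby
# Stand-in sketch: a plain Hash plays the role of EXEL::Context so the
# example runs without the exel gem installed.
class UpcaseProcessor
  def initialize(context)
    @context = context # keep the shared storage for use in #process
  end

  def process(_block)
    # Read an input from the context and store an output for later steps.
    @context[:output] = @context[:input].upcase
  end
end

class ExclaimProcessor
  def initialize(context)
    @context = context
  end

  def process(_block)
    # Build on the output left behind by the previous processor.
    @context[:output] = "#{@context[:output]}!"
  end
end

# Hypothetical runner: each processor is instantiated immediately before
# its #process call, mirroring the lifecycle the README describes.
def run_sequence(processor_classes, context)
  processor_classes.each { |klass| klass.new(context).process(nil) }
  context
end

result = run_sequence([UpcaseProcessor, ExclaimProcessor], {input: 'hello'})
puts result[:output]
```

Each processor only reads from and writes to the shared context, which is what lets a job be composed from small, independent steps.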
data/lib/exel/context.rb CHANGED
@@ -8,9 +8,12 @@ module EXEL
       @table = initial_context
     end
 
+    def deep_dup
+      Context.deserialize(serialize)
+    end
+
     def serialize
-      remotized_table = @table.each_with_object({}) { |(key, value), acc| acc[key] = EXEL::Value.remotize(value) }
-      EXEL::Value.remotize(serialized_context(remotized_table))
+      EXEL::Value.remotize(serialized_context)
     end
 
     def self.deserialize(uri)
@@ -50,13 +53,17 @@ module EXEL
 
     private
 
-    def serialized_context(table)
+    def serialized_context
       file = Tempfile.new(SecureRandom.uuid, encoding: 'ascii-8bit')
-      file.write(Marshal.dump(Context.new(table)))
+      file.write(Marshal.dump(Context.new(remotized_table)))
       file.rewind
       file
     end
 
+    def remotized_table
+      @table.each_with_object({}) { |(key, value), acc| acc[key] = EXEL::Value.remotize(value) }
+    end
+
     def get_deferred(value)
       if deferred?(value)
         value = value.get(self)
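Note on `deep_dup`: the new method round-trips the context through `serialize` and `Context.deserialize`, which rely on `Marshal` in this file. The sketch below shows only the Marshal part of that idea on a plain Hash (an assumption made so it runs without the gem; the real method also passes values through `EXEL::Value.remotize`). It contrasts a shallow `dup` with a Marshal round trip.

```ruby
original = {a: {nested: []}}

# Shallow copy: the outer Hash is new, but nested objects are shared.
shallow = original.dup

# Deep copy: dumping and reloading rebuilds every nested object.
deep = Marshal.load(Marshal.dump(original))

deep[:a][:nested] << 1    # only the deep copy changes
shallow[:a][:nested] << 2 # also changes original, the array is shared

puts "original nested size: #{original[:a][:nested].size}"
puts "deep nested size: #{deep[:a][:nested].size}"
```

This is the property the new `#deep_dup` spec checks: appending to the nested array of the copy leaves the original untouched.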
@@ -7,44 +7,33 @@ module EXEL
     class SplitProcessor
       include EXEL::ProcessorHelper
 
-      attr_accessor :chunk_size, :file_name, :block
+      attr_accessor :file_name, :block
 
       DEFAULT_CHUNK_SIZE = 1000
 
       def initialize(context)
-        @chunk_size = DEFAULT_CHUNK_SIZE
         @buffer = []
         @tempfile_count = 0
         @context = context
-
         @file = context[:resource]
-        @file_name = filename(@file)
-        @csv_options = context[:csv_options] || {col_sep: ','}
 
         log_prefix_with '[SplitProcessor]'
       end
 
       def process(callback)
         log_process do
-          begin
-            CSV.foreach(@file.path, @csv_options) do |line|
-              process_line(line, callback)
-            end
-          rescue CSV::MalformedCSVError => e
-            log_error "CSV::MalformedCSVError => #{e.message}"
-          end
-          process_line(:eof, callback)
-          File.delete(@file.path)
+          process_file(callback)
+          finish(callback)
         end
       end
 
       def process_line(line, callback)
         if line == :eof
-          flush_buffer callback
+          flush_buffer(callback)
         else
           @buffer << CSV.generate_line(line)
 
-          flush_buffer callback if buffer_full?
+          flush_buffer(callback) if buffer_full?
         end
       end
 
@@ -54,31 +43,51 @@ module EXEL
         chunk.write(content)
         chunk.rewind
 
-        log_info "Generated chunk # #{@tempfile_count} for file #{@file_name} in #{chunk.path}"
+        log_info "Generated chunk # #{@tempfile_count} for file #{filename(@file)} in #{chunk.path}"
         chunk
       end
 
-      def chunk_filename
-        "#{@file_name}_#{@tempfile_count}_"
-      end
+      private
 
-      def filename(file)
-        file_name_with_extension = file.path.split('/').last
-        file_name_with_extension.split('.').first
-      end
+      def process_file(callback)
+        csv_options = @context[:csv_options] || {col_sep: ','}
 
-      private
+        CSV.foreach(@file.path, csv_options) do |line|
+          process_line(line, callback)
+        end
+      rescue CSV::MalformedCSVError => e
+        log_error "CSV::MalformedCSVError => #{e.message}"
+      end
 
       def flush_buffer(callback)
         unless @buffer.empty?
           file = generate_chunk(@buffer.join(''))
           callback.run(@context.merge!(resource: file))
         end
+
         @buffer = []
       end
 
       def buffer_full?
-        @buffer.size == @chunk_size
+        @buffer.size == chunk_size
+      end
+
+      def chunk_size
+        DEFAULT_CHUNK_SIZE
+      end
+
+      def chunk_filename
+        "#{filename(@file)}_#{@tempfile_count}_"
+      end
+
+      def filename(file)
+        file_name_with_extension = file.path.split('/').last
+        file_name_with_extension.split('.').first
+      end
+
+      def finish(callback)
+        process_line(:eof, callback)
+        File.delete(@file.path)
       end
     end
   end
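Note on the buffering logic in the hunk above: `process_line` appends to a buffer, `flush_buffer` emits a chunk when `buffer_full?` is true, and the `:eof` marker flushes whatever remains. A minimal sketch of that pattern follows. It is simplified on purpose: `each_chunk` is a hypothetical helper, it takes already-formatted lines instead of reading CSV rows, and it yields strings instead of writing tempfiles.

```ruby
# Hypothetical helper illustrating buffer-and-flush chunking.
# Yields the joined contents of every full buffer, then any remainder.
def each_chunk(lines, chunk_size)
  buffer = []

  lines.each do |line|
    buffer << line
    next unless buffer.size == chunk_size

    yield buffer.join # buffer is full: emit one chunk
    buffer = []
  end

  # Equivalent of the :eof flush: emit the final partial chunk, if any.
  yield buffer.join unless buffer.empty?
end

chunks = []
each_chunk(["0\n", "1\n", "2\n"], 2) { |chunk| chunks << chunk }
puts chunks.size
```

With three input lines and a chunk size of 2, the first chunk holds two lines and the end-of-input flush emits the third on its own, which is the case the final `unless buffer.empty?` guard exists for.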
@@ -6,7 +6,7 @@ module EXEL
       end
 
       def do_async(block)
-        Thread.new { block.start(@context) }
+        Thread.new { block.start(@context.deep_dup) }
       end
     end
   end
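Note on the change above: handing each thread `@context.deep_dup` instead of `@context` keeps concurrent blocks from mutating shared state. The sketch below illustrates the effect with plain Ruby, assuming a Hash in place of `EXEL::Context` and a Marshal round trip in place of `deep_dup`. Unlike the gem, which calls `deep_dup` inside the thread block, the copy here is made before the thread starts.

```ruby
shared = {array: []}

threads = [1, 2].map do |arg|
  # Give every thread its own deep copy of the shared state.
  copy = Marshal.load(Marshal.dump(shared))

  Thread.new(copy) do |ctx|
    ctx[:array] << arg # mutates only this thread's copy
    ctx                # becomes the thread's value
  end
end

# Thread#value joins the thread and returns the block's result.
results = threads.map(&:value)

puts "shared is empty: #{shared[:array].empty?}"
puts "per-thread sizes: #{results.map { |ctx| ctx[:array].size }.join(', ')}"
```

The original Hash stays empty after both threads finish, matching the expectation in the new spec that `context[:array]` is still empty once the async blocks have run.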
data/lib/exel/version.rb CHANGED
@@ -1,3 +1,3 @@
 module EXEL
-  VERSION = '1.0.0'
+  VERSION = '1.0.1'
 end
@@ -9,6 +9,19 @@ module EXEL
       end
     end
 
+    describe '#deep_dup' do
+      it 'returns a deep copy of itself' do
+        context[:a] = {nested: []}
+
+        dup = context.deep_dup
+        expect(context).to eq(dup)
+        expect(context).to_not be_equal(dup)
+
+        dup[:a][:nested] << 1
+        expect(context[:a][:nested]).to be_empty
+      end
+    end
+
     describe '#serialize' do
       before { allow(Value).to receive(:upload) }
 
@@ -43,7 +43,7 @@ module EXEL
           {input: 4, chunks: %W(0\n1\n 2\n3\n)}
         ].each do |data|
           it "should produce #{data[:chunks].size} chunks with #{data[:input]} input lines" do
-            splitter.chunk_size = 2
+            allow(splitter).to receive(:chunk_size).and_return(2)
 
             data[:chunks].each do |chunk|
               expect(splitter).to receive(:generate_chunk).with(chunk).and_return(chunk_file)
@@ -67,10 +67,8 @@ module EXEL
 
         it 'should create a file with a unique name' do
           3.times do |i|
-            index = i + 1
-            file = splitter.generate_chunk("#{index}")
-            file_name = splitter.filename(file)
-            expect(file_name).to include("text_#{index}_")
+            file = splitter.generate_chunk('content')
+            expect(file.path).to include("text_#{i + 1}_")
           end
         end
       end
@@ -83,7 +81,7 @@ module EXEL
           content << line
         end
 
-        StringIO.new content
+        StringIO.new(content)
       end
     end
   end
@@ -0,0 +1,53 @@
+module EXEL
+  module Providers
+    class ContextMutatingProcessor
+      def initialize(context)
+        @context = context
+      end
+
+      def process(_block)
+        @context[:array] << @context[:arg]
+      end
+    end
+
+    describe ThreadedAsyncProvider do
+      subject { described_class.new(context) }
+      let(:context) { EXEL::Context.new }
+
+      describe '#do_async' do
+        let(:dsl_block) { instance_double(ASTNode) }
+
+        it 'runs the block in a new thread' do
+          expect(dsl_block).to receive(:start).with(context)
+          expect(Thread).to receive(:new).and_yield
+
+          subject.do_async(dsl_block)
+        end
+
+        it 'passes a copy of the context to each thread' do
+          context[:array] = []
+          complete = 0
+
+          EXEL::Job.define :thread_test do
+            async do
+              process with: ContextMutatingProcessor, arg: 1
+              complete += 1
+            end
+
+            async do
+              process with: ContextMutatingProcessor, arg: 2
+              complete += 1
+            end
+          end
+
+          EXEL::Job.run(:thread_test, context)
+
+          start_time = Time.now
+          sleep 0.1 while complete < 2 && Time.now - start_time < 2
+
+          expect(context[:array]).to be_empty
+        end
+      end
+    end
+  end
+end
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: exel
 version: !ruby/object:Gem::Version
-  version: 1.0.0
+  version: 1.0.1
 platform: ruby
 authors:
 - yroo
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-12-16 00:00:00.000000000 Z
+date: 2016-01-08 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -171,6 +171,7 @@ executables: []
 extensions: []
 extra_rdoc_files: []
 files:
+- ".codeclimate.yml"
 - ".gitignore"
 - ".rspec"
 - ".rubocop.yml"
@@ -209,8 +210,8 @@ files:
 - spec/exel/null_instruction_spec.rb
 - spec/exel/processors/async_processor_spec.rb
 - spec/exel/processors/split_processor_spec.rb
-- spec/exel/providers/local_async_provider_spec.rb
 - spec/exel/providers/local_file_provider_spec.rb
+- spec/exel/providers/threaded_async_provider_spec.rb
 - spec/exel/sequence_node_spec.rb
 - spec/exel/value_spec.rb
 - spec/exel_spec.rb
@@ -250,8 +251,8 @@ test_files:
 - spec/exel/null_instruction_spec.rb
 - spec/exel/processors/async_processor_spec.rb
 - spec/exel/processors/split_processor_spec.rb
-- spec/exel/providers/local_async_provider_spec.rb
 - spec/exel/providers/local_file_provider_spec.rb
+- spec/exel/providers/threaded_async_provider_spec.rb
 - spec/exel/sequence_node_spec.rb
 - spec/exel/value_spec.rb
 - spec/exel_spec.rb
@@ -1,19 +0,0 @@
-module EXEL
-  module Providers
-    describe ThreadedAsyncProvider do
-      subject { described_class.new(context) }
-      let(:context) { EXEL::Context.new }
-
-      describe '#do_async' do
-        let(:dsl_block) { instance_double(ASTNode) }
-
-        it 'runs the block in a new thread' do
-          expect(dsl_block).to receive(:start).with(context)
-          expect(Thread).to receive(:new).and_yield
-
-          subject.do_async(dsl_block)
-        end
-      end
-    end
-  end
-end