burner 1.1.0 → 1.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: f1dd679013fe833b143b12340e68d5598ce4a3d53a6d5fcc6dc8c9df02bc7348
- data.tar.gz: e347add004c9846a9d7e083d4d7f35b6aa664f0eb340b3bc0eceffcd3c0875e5
+ metadata.gz: 708bae8cd70ee165a49a692ede631830ba30e378d002c6dd1b986262def154f9
+ data.tar.gz: bb11da5222c638d97cf33e39633c4f7fca7c6e0a6ccf2cf75c24844d3b3d6159
  SHA512:
- metadata.gz: a92860ad5611e298eb1f2d1acaca9271324402ee375a2d842ac32a41f89c23a12abd8913596ee2fbe7fd59e071fd1b7ab77bbfc5317c57d2ca342dc75f6091a2
- data.tar.gz: cb529e0a81f7a3bef34e9966463ca8f71b2025101ca4f6a6bd0cef03396fcb2a6ceb6bad7681079e4edc0630f21e5d6c19f7b6489fb2d198b716a93dfd5c3938
+ metadata.gz: 4b15fff4b6c137d9d2d004181da58d8f4a2864605b3b286685449f8b8dd63bde6b89335cd4a619dc15bc57b261772f90c0eacd7679e13a16ed5897da75d66594
+ data.tar.gz: 2fe52efc1b645028b97332b0d91f919408fccab68a1474e6406ee0d1ac324d1b12d3cd518ed1817203307677611bcb76e13c00423c417faa86b2c3e3754f68a7
@@ -1,3 +1,28 @@
+ # 1.4.0 (December 17th, 2020)
+
+ Additions:
+
+ * byte_order_mark option for b/serialize/csv job
+
+ Added Jobs:
+
+ * b/compress/row_reader
+ * b/io/row_reader
+ # 1.3.0 (December 11th, 2020)
+
+ Additions:
+
+ * Decoupled storage: `Burner::Disks` factory, `Burner::Disks::Local` reference implementation, and `b/io/*` `disk` option for configuring IO jobs to use custom disks.
+ # 1.2.0 (November 25th, 2020)
+
+ #### Enhancements:
+
+ * Allow for a pipeline to be configured with null steps. When null, just execute all jobs in positional order.
+ * Allow Collection::Transform job attributes to implicitly start from a resolve transformer. `explicit: true` can be passed in as an option in case the desire is to begin from the record and not a specific value.
+
+ #### Added Jobs:
+
+ * b/collection/nested_aggregate
  # 1.1.0 (November 16, 2020)

  Added Jobs:
data/README.md CHANGED
@@ -42,7 +42,7 @@ pipeline = {
  {
  name: :output_value,
  type: 'b/echo',
- message: 'The current value is: {__value}'
+ message: 'The current value is: {__default_register}'
  },
  {
  name: :parse,
@@ -89,40 +89,22 @@ Some notes:

  * Some values are able to be string-interpolated using the provided Payload#params. This allows for passing runtime configuration/data into pipelines/jobs.
  * The job's ID can be accessed using the `__id` key.
- * The current job's payload value can be accessed using the `__value` key.
+ * The current payload registers' values can be accessed using the `__<register_name>_register` key.
  * Jobs can be re-used (just like the output_id and output_value jobs).
+ * If steps is nil then all jobs will execute in their declared order.

  ### Capturing Feedback / Output

  By default, output will be emitted to `$stdout`. You can add or change listeners by passing in optional values into Pipeline#execute. For example, say we wanted to capture the output from our json-to-yaml example:

  ````ruby
- class StringOut
- def initialize
- @io = StringIO.new
- end
-
- def puts(msg)
- tap { io.write("#{msg}\n") }
- end
-
- def read
- io.rewind
- io.read
- end
-
- private
-
- attr_reader :io
- end
-
- string_out = StringOut.new
- output = Burner::Output.new(outs: string_out)
- payload = Burner::Payload.new(params: params)
+ io = StringIO.new
+ output = Burner::Output.new(outs: io)
+ payload = Burner::Payload.new(params: params)

  Burner::Pipeline.make(pipeline).execute(output: output, payload: payload)

- log = string_out.read
+ log = io.string
  ````

  The value of `log` should now look similar to:
@@ -181,7 +163,7 @@ jobs:

  - name: output_value
  type: b/echo
- message: 'The current value is: {__value}'
+ message: 'The current value is: {__default_register}'

  - name: parse
  type: b/deserialize/json
@@ -238,13 +220,18 @@ This library only ships with very basic, rudimentary jobs that are meant to just
  * **b/collection/concatenate** [from_registers, to_register]: Concatenate each from_register's value and place the newly concatenated array into the to_register. Note: this does not do any deep copying and should be assumed it is shallow copying all objects.
  * **b/collection/graph** [config, key, register]: Use [Hashematics](https://github.com/bluemarblepayroll/hashematics) to turn a flat array of objects into a deeply nested object tree.
  * **b/collection/group** [keys, register, separator]: Take a register's value (an array of objects) and group the objects by the specified keys.
+ * **b/collection/nested_aggregate** [register, key_mappings, key, separator]: Traverse a set of objects, resolving key's value for each object, optionally copying down key_mappings to the child records, then merging all the inner records together.
  * **b/collection/objects_to_arrays** [mappings, register]: Convert an array of objects to an array of arrays.
  * **b/collection/shift** [amount, register]: Remove the first N number of elements from an array.
- * **b/collection/transform** [attributes, exclusive, separator, register]: Iterate over all objects and transform each key per the attribute transformers' specifications. If exclusive is set to false then the current object will be overridden/merged. Separator can also be set for key path support. This job uses [Realize](https://github.com/bluemarblepayroll/realize), which provides its own extendable value-transformation pipeline.
+ * **b/collection/transform** [attributes, exclusive, separator, register]: Iterate over all objects and transform each key per the attribute transformers' specifications. If exclusive is set to false then the current object will be overridden/merged. Separator can also be set for key path support. This job uses [Realize](https://github.com/bluemarblepayroll/realize), which provides its own extendable value-transformation pipeline. If an attribute is not set with `explicit: true` then it will automatically start from the key's value from the record. If `explicit: true` is set, then it will start from the record itself.
  * **b/collection/unpivot** [pivot_set, register]: Take an array of objects and unpivot specific sets of keys into rows. Under the hood it uses [HashMath's Unpivot class](https://github.com/bluemarblepayroll/hash_math#unpivot-hash-key-coalescence-and-row-extrapolation).
  * **b/collection/validate** [invalid_register, join_char, message_key, register, separator, validations]: Take an array of objects, run it through each declared validator, and split the objects into two registers. The valid objects will be split into the current register while the invalid ones will go into the invalid_register as declared. Optional arguments, join_char and message_key, help determine the compiled error messages. The separator option can be utilized to use dot-notation for validating keys. See each validation's options by viewing their classes within the `lib/modeling/validations` directory.
  * **b/collection/values** [include_keys, register]: Take an array of objects and call `#values` on each object. If include_keys is true (it is false by default), then call `#keys` on the first object and inject that as a "header" object.

+ #### Compression
+
+ * **b/compress/row_reader** [data_key, ignore_blank_path, ignore_blank_data, path_key, register, separator]: Iterates over an array of objects, extracts a path and data in each object, and creates a zip file.
+
  #### De-serialization

  * **b/deserialize/csv** [register]: Take a CSV string and de-serialize into object(s). Currently it will return an array of arrays, with each nested array representing one row.
@@ -253,13 +240,16 @@ This library only ships with very basic, rudimentary jobs that are meant to just

  #### IO

- * **b/io/exist** [path, short_circuit]: Check to see if a file exists. The path parameter can be interpolated using `Payload#params`. If short_circuit was set to true (defaults to false) and the file does not exist then the pipeline will be short-circuited.
- * **b/io/read** [binary, path, register]: Read in a local file. The path parameter can be interpolated using `Payload#params`. If the contents are binary, pass in `binary: true` to open it up in binary+read mode.
- * **b/io/write** [binary, path, register]: Write to a local file. The path parameter can be interpolated using `Payload#params`. If the contents are binary, pass in `binary: true` to open it up in binary+write mode.
+ By default all jobs will use the `Burner::Disks::Local` disk for their persistence. But this is configurable by implementing and registering custom disk-based classes in the `Burner::Disks` factory. For example: a consumer application may also want to interact with cloud-based storage providers and could leverage this as its job library instead of implementing custom jobs.
+
+ * **b/io/exist** [disk, path, short_circuit]: Check to see if a file exists. The path parameter can be interpolated using `Payload#params`. If short_circuit was set to true (defaults to false) and the file does not exist then the pipeline will be short-circuited.
+ * **b/io/read** [binary, disk, path, register]: Read in a local file. The path parameter can be interpolated using `Payload#params`. If the contents are binary, pass in `binary: true` to open it up in binary+read mode.
+ * **b/io/row_reader** [data_key, disk, ignore_blank_path, ignore_file_not_found, path_key, register, separator]: Iterates over an array of objects, extracts a filepath from a key in each object, and attempts to load the file's content for each record. The file's content will be stored at the specified data_key. By default missing paths or files will be treated as hard errors. If you wish to ignore these then pass in true for ignore_blank_path and/or ignore_file_not_found.
+ * **b/io/write** [binary, disk, path, register]: Write to a local file. The path parameter can be interpolated using `Payload#params`. If the contents are binary, pass in `binary: true` to open it up in binary+write mode.

  #### Serialization

- * **b/serialize/csv** [register]: Take an array of arrays and create a CSV.
+ * **b/serialize/csv** [byte_order_mark, register]: Take an array of arrays and create a CSV. You can optionally pre-pend a byte order mark, see Burner::Modeling::ByteOrderMark for acceptable options.
  * **b/serialize/json** [register]: Convert value to JSON.
  * **b/serialize/yaml** [register]: Convert value to YAML.

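
To make the new `disk` option concrete, here is a hedged sketch of an IO job configuration. It assumes the `Burner::Disks` factory resolves the disk hash by a `type` key (with `local` being the registration shown later in this diff); the job name, path, parameter name, and register are made up for illustration.

````ruby
# Sketch only: 'b/io/read' with an explicit disk configuration.
# Omitting the disk option (or passing {}) falls back to Burner::Disks::Local.
{
  name: :read_source_file,
  type: 'b/io/read',
  path: 'input/{input_file}', # interpolated via Payload#params (hypothetical param)
  disk: { type: 'local' },
  register: :source_data
}
````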
@@ -316,7 +306,7 @@ pipeline = {
  {
  name: :output_value,
  type: 'b/echo',
- message: 'The current value is: {__value}'
+ message: 'The current value is: {__default_register}'
  },
  {
  name: :parse,
@@ -33,6 +33,7 @@ Gem::Specification.new do |s|
  s.add_dependency('hash_math', '~>1.2')
  s.add_dependency('objectable', '~>1.0')
  s.add_dependency('realize', '~>1.3')
+ s.add_dependency('rubyzip', '~>1.2')
  s.add_dependency('stringento', '~>2.1')

  s.add_development_dependency('guard-rspec', '~>4.7')
@@ -10,6 +10,7 @@
  require 'acts_as_hashable'
  require 'benchmark'
  require 'csv'
+ require 'fileutils'
  require 'forwardable'
  require 'hash_math'
  require 'hashematics'
@@ -21,8 +22,10 @@ require 'singleton'
  require 'stringento'
  require 'time'
  require 'yaml'
+ require 'zip'

  # Common/Shared
+ require_relative 'burner/disks'
  require_relative 'burner/modeling'
  require_relative 'burner/side_effects'
  require_relative 'burner/util'
@@ -0,0 +1,26 @@
+ # frozen_string_literal: true
+
+ #
+ # Copyright (c) 2020-present, Blue Marble Payroll, LLC
+ #
+ # This source code is licensed under the MIT license found in the
+ # LICENSE file in the root directory of this source tree.
+ #
+
+ require_relative 'disks/local'
+
+ module Burner
+   # A factory to register and emit instances that conform to the Disk interface with requests
+   # the instance responds to: #exist?, #read, and #write. See an example implementation within
+   # the lib/burner/disks directory.
+   #
+   # The benefit to this pluggable disk model is a consumer application can decide which file
+   # backend to use and how to store files. For example: an application may choose to use
+   # some cloud provider with their own file store implementation. This can be wrapped up
+   # in a Disk class and registered here and then referenced in the Pipeline's IO jobs.
+   class Disks
+     acts_as_hashable_factory
+
+     register 'local', '', Disks::Local
+   end
+ end
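
As an illustration of the pluggable model described in the factory comment above, a consumer could implement the three-method interface and register it. The `MemoryDisk` class, its behavior, and the external `Burner::Disks.register` call are assumptions for this sketch (acts_as_hashable_factory exposes `register` at the class level), not code shipped in this release.

````ruby
# Hypothetical in-memory disk conforming to the #exist?, #read, and #write interface.
class MemoryDisk
  acts_as_hashable

  def initialize
    @files = {}
  end

  def exist?(path)
    @files.key?(path.to_s)
  end

  def read(path, binary: false)
    @files.fetch(path.to_s)
  end

  def write(path, data, binary: false)
    @files[path.to_s] = data
    path
  end
end

# Assumed registration: IO jobs could then reference this disk type by name.
Burner::Disks.register('memory', MemoryDisk)
````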
@@ -0,0 +1,61 @@
+ # frozen_string_literal: true
+
+ #
+ # Copyright (c) 2020-present, Blue Marble Payroll, LLC
+ #
+ # This source code is licensed under the MIT license found in the
+ # LICENSE file in the root directory of this source tree.
+ #
+
+ module Burner
+   class Disks
+     # Operations against the local file system.
+     class Local
+       acts_as_hashable
+
+       # Check to see if the passed in path exists within the local file system.
+       # It will not make assumptions on what the 'file' is, only that it is recognized
+       # by Ruby's File class.
+       def exist?(path)
+         File.exist?(path)
+       end
+
+       # Open and read the contents of a local file. If binary is passed in as true then the file
+       # will be opened in binary mode.
+       def read(path, binary: false)
+         File.open(path, read_mode(binary), &:read)
+       end
+
+       # Open and write the specified data to a local file. If binary is passed in as true then
+       # the file will be opened in binary mode. It is important to note that the file's
+       # directory structure will be automatically created if it does not exist.
+       def write(path, data, binary: false)
+         ensure_directory_exists(path)
+
+         File.open(path, write_mode(binary)) { |io| io.write(data) }
+
+         path
+       end
+
+       private
+
+       def ensure_directory_exists(path)
+         dirname = File.dirname(path)
+
+         return if File.exist?(dirname)
+
+         FileUtils.mkdir_p(dirname)
+
+         nil
+       end
+
+       def write_mode(binary)
+         binary ? 'wb' : 'w'
+       end
+
+       def read_mode(binary)
+         binary ? 'rb' : 'r'
+       end
+     end
+   end
+ end
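
For illustration, the reference disk above can be exercised directly; the `tmp/example.txt` path is a made-up placeholder.

````ruby
disk = Burner::Disks::Local.new

disk.write('tmp/example.txt', 'hello') # creates tmp/ if needed and returns the path
disk.exist?('tmp/example.txt')         # => true
disk.read('tmp/example.txt')           # => "hello"
````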
@@ -27,6 +27,7 @@ module Burner
  register 'b/collection/concatenate', Library::Collection::Concatenate
  register 'b/collection/graph', Library::Collection::Graph
  register 'b/collection/group', Library::Collection::Group
+ register 'b/collection/nested_aggregate', Library::Collection::NestedAggregate
  register 'b/collection/objects_to_arrays', Library::Collection::ObjectsToArrays
  register 'b/collection/shift', Library::Collection::Shift
  register 'b/collection/transform', Library::Collection::Transform
@@ -34,12 +35,15 @@ module Burner
  register 'b/collection/values', Library::Collection::Values
  register 'b/collection/validate', Library::Collection::Validate

+ register 'b/compress/row_reader', Library::Compress::RowReader
+
  register 'b/deserialize/csv', Library::Deserialize::Csv
  register 'b/deserialize/json', Library::Deserialize::Json
  register 'b/deserialize/yaml', Library::Deserialize::Yaml

  register 'b/io/exist', Library::IO::Exist
  register 'b/io/read', Library::IO::Read
+ register 'b/io/row_reader', Library::IO::RowReader
  register 'b/io/write', Library::IO::Write

  register 'b/serialize/csv', Library::Serialize::Csv
@@ -18,6 +18,7 @@ require_relative 'library/collection/coalesce'
  require_relative 'library/collection/concatenate'
  require_relative 'library/collection/graph'
  require_relative 'library/collection/group'
+ require_relative 'library/collection/nested_aggregate'
  require_relative 'library/collection/objects_to_arrays'
  require_relative 'library/collection/shift'
  require_relative 'library/collection/transform'
@@ -25,12 +26,15 @@ require_relative 'library/collection/unpivot'
  require_relative 'library/collection/validate'
  require_relative 'library/collection/values'

+ require_relative 'library/compress/row_reader'
+
  require_relative 'library/deserialize/csv'
  require_relative 'library/deserialize/json'
  require_relative 'library/deserialize/yaml'

  require_relative 'library/io/exist'
  require_relative 'library/io/read'
+ require_relative 'library/io/row_reader'
  require_relative 'library/io/write'

  require_relative 'library/serialize/csv'
@@ -0,0 +1,67 @@
+ # frozen_string_literal: true
+
+ #
+ # Copyright (c) 2020-present, Blue Marble Payroll, LLC
+ #
+ # This source code is licensed under the MIT license found in the
+ # LICENSE file in the root directory of this source tree.
+ #
+
+ module Burner
+   module Library
+     module Collection
+       # Iterate over a collection of objects, calling key on each object, then aggregating the
+       # returns of key together into one array. This new derived array will be set as the value
+       # for the payload's register. Leverage the key_mappings option to optionally copy down
+       # keys and values from outer to inner records. This job is particularly useful
+       # if you have nested arrays but wish to deal with each level/depth in the aggregate.
+       #
+       # Expected Payload[register] input: array of objects.
+       # Payload[register] output: array of objects.
+       class NestedAggregate < JobWithRegister
+         attr_reader :key, :key_mappings, :resolver
+
+         def initialize(name:, key:, key_mappings: [], register: DEFAULT_REGISTER, separator: '')
+           super(name: name, register: register)
+
+           raise ArgumentError, 'key is required' if key.to_s.empty?
+
+           @key = key.to_s
+           @key_mappings = Modeling::KeyMapping.array(key_mappings)
+           @resolver = Objectable.resolver(separator: separator.to_s)
+
+           freeze
+         end
+
+         def perform(output, payload)
+           records = array(payload[register])
+           count = records.length
+
+           output.detail("Aggregating on key: #{key} for #{count} records(s)")
+
+           # Outer loop on parent records
+           payload[register] = records.each_with_object([]) do |record, memo|
+             inner_records = resolver.get(record, key)
+
+             # Inner loop on child records
+             array(inner_records).each do |inner_record|
+               memo << copy_key_mappings(record, inner_record)
+             end
+           end
+         end
+
+         private
+
+         def copy_key_mappings(source_record, destination_record)
+           key_mappings.each do |key_mapping|
+             value = resolver.get(source_record, key_mapping.from)
+
+             resolver.set(destination_record, key_mapping.to, value)
+           end
+
+           destination_record
+         end
+       end
+     end
+   end
+ end
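
As an illustration of the job above, a hypothetical `b/collection/nested_aggregate` configuration; the register, keys, and record shapes are made up:

````ruby
# Suppose the :orders register holds:
#   [{ 'id' => 1, 'line_items' => [{ 'sku' => 'a' }, { 'sku' => 'b' }] }, ...]
# After this job, the register holds the flattened line items, each with the
# parent's id copied down to order_id via key_mappings.
{
  name: :explode_line_items,
  type: 'b/collection/nested_aggregate',
  register: :orders,
  key: :line_items,
  key_mappings: [{ from: :id, to: :order_id }]
}
````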
@@ -0,0 +1,102 @@
+ # frozen_string_literal: true
+
+ #
+ # Copyright (c) 2020-present, Blue Marble Payroll, LLC
+ #
+ # This source code is licensed under the MIT license found in the
+ # LICENSE file in the root directory of this source tree.
+ #
+
+ module Burner
+   module Library
+     module Compress
+       # Iterates over an array of objects, extracts a path and data in each object, and
+       # creates a zip file. By default, if a path is blank then an ArgumentError will be raised.
+       # If this is undesirable then you can set ignore_blank_path to true and the record will be
+       # skipped. You also have the option to suppress blank files being added by configuring
+       # ignore_blank_data as true.
+       #
+       # Expected Payload[register] input: array of objects.
+       # Payload[register] output: compressed binary zip file contents.
+       class RowReader < JobWithRegister
+         Content = Struct.new(:path, :data)
+
+         private_constant :Content
+
+         DEFAULT_DATA_KEY = 'data'
+         DEFAULT_PATH_KEY = 'path'
+
+         attr_reader :data_key,
+                     :ignore_blank_data,
+                     :ignore_blank_path,
+                     :path_key,
+                     :resolver
+
+         def initialize(
+           name:,
+           data_key: DEFAULT_DATA_KEY,
+           ignore_blank_data: false,
+           ignore_blank_path: false,
+           path_key: DEFAULT_PATH_KEY,
+           register: DEFAULT_REGISTER,
+           separator: ''
+         )
+           super(name: name, register: register)
+
+           @data_key = data_key.to_s
+           @ignore_blank_data = ignore_blank_data || false
+           @ignore_blank_path = ignore_blank_path || false
+           @path_key = path_key.to_s
+           @resolver = Objectable.resolver(separator: separator)
+
+           freeze
+         end
+
+         def perform(output, payload)
+           payload[register] = Zip::OutputStream.write_buffer do |zip|
+             array(payload[register]).each.with_index(1) do |record, index|
+               content = extract_path_and_data(record, index, output)

+               next unless content
+
+               zip.put_next_entry(content.path)
+               zip.write(content.data)
+             end
+           end.string
+         end
+
+         private
+
+         def extract_path_and_data(record, index, output)
+           path = strip_leading_separator(resolver.get(record, path_key))
+           data = resolver.get(record, data_key)
+
+           return if assert_and_skip_missing_path?(path, index, output)
+           return if skip_missing_data?(data, index, output)
+
+           Content.new(path, data)
+         end
+
+         def strip_leading_separator(path)
+           path.to_s.start_with?(File::SEPARATOR) ? path.to_s[1..-1] : path.to_s
+         end
+
+         def assert_and_skip_missing_path?(path, index, output)
+           if ignore_blank_path && path.to_s.empty?
+             output.detail("Skipping record #{index} because of blank path")
+             true
+           elsif path.to_s.empty?
+             raise ArgumentError, "Record #{index} is missing a path at key: #{path_key}"
+           end
+         end
+
+         def skip_missing_data?(data, index, output)
+           return false unless ignore_blank_data && data.to_s.empty?
+
+           output.detail("Skipping record #{index} because of blank data")
+           true
+         end
+       end
+     end
+   end
+ end
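
To make the expected input concrete, a hypothetical register feeding `b/compress/row_reader`; the paths and CSV contents are made up:

````ruby
# Hypothetical register contents: each record carries a zip entry path and its data.
rows = [
  { 'path' => 'invoices/1001.csv', 'data' => "id,total\n1001,9.99\n" },
  { 'path' => 'invoices/1002.csv', 'data' => "id,total\n1002,4.50\n" }
]

# After the job runs against a register holding these rows, the register holds the
# binary contents of a zip archive containing invoices/1001.csv and invoices/1002.csv,
# which could then be persisted with b/io/write (binary: true).
````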
@@ -7,8 +7,6 @@
  # LICENSE file in the root directory of this source tree.
  #

- require_relative 'base'
-
  module Burner
  module Library
  module IO
@@ -17,13 +15,14 @@ module Burner
  #
  # Note: this does not use Payload#registers.
  class Exist < Job
- attr_reader :path, :short_circuit
+ attr_reader :disk, :path, :short_circuit

- def initialize(name:, path:, short_circuit: false)
+ def initialize(name:, path:, disk: {}, short_circuit: false)
  super(name: name)

  raise ArgumentError, 'path is required' if path.to_s.empty?

+ @disk = Disks.make(disk)
  @path = path.to_s
  @short_circuit = short_circuit || false
  end
@@ -31,7 +30,7 @@ module Burner
  def perform(output, payload)
  compiled_path = job_string_template(path, output, payload)

- exists = File.exist?(compiled_path)
+ exists = disk.exist?(compiled_path)
  verb = exists ? 'does' : 'does not'

  output.detail("The path: #{compiled_path} #{verb} exist")
@@ -10,16 +10,20 @@
  module Burner
  module Library
  module IO
- # Common configuration/code for all IO Job subclasses.
- class Base < JobWithRegister
- attr_reader :path
+ # Common configuration/code for all IO Job subclasses that open a file.
+ class OpenFileBase < JobWithRegister
+ attr_reader :binary, :disk, :path

- def initialize(name:, path:, register: DEFAULT_REGISTER)
+ def initialize(name:, path:, binary: false, disk: {}, register: DEFAULT_REGISTER)
  super(name: name, register: register)

  raise ArgumentError, 'path is required' if path.to_s.empty?

- @path = path.to_s
+ @binary = binary || false
+ @disk = Disks.make(disk)
+ @path = path.to_s
+
+ freeze
  end
  end
  end
@@ -7,7 +7,7 @@
  # LICENSE file in the root directory of this source tree.
  #

- require_relative 'base'
+ require_relative 'open_file_base'

  module Burner
  module Library
@@ -16,29 +16,13 @@ module Burner
  #
  # Expected Payload[register] input: nothing.
  # Payload[register] output: contents of the specified file.
- class Read < Base
- attr_reader :binary
-
- def initialize(name:, path:, binary: false, register: DEFAULT_REGISTER)
- super(name: name, path: path, register: register)
-
- @binary = binary || false
-
- freeze
- end
-
+ class Read < OpenFileBase
  def perform(output, payload)
  compiled_path = job_string_template(path, output, payload)

  output.detail("Reading: #{compiled_path}")

- payload[register] = File.open(compiled_path, mode, &:read)
- end
-
- private
-
- def mode
- binary ? 'rb' : 'r'
+ payload[register] = disk.read(compiled_path, binary: binary)
  end
  end
  end
@@ -0,0 +1,119 @@
+ # frozen_string_literal: true
+
+ #
+ # Copyright (c) 2020-present, Blue Marble Payroll, LLC
+ #
+ # This source code is licensed under the MIT license found in the
+ # LICENSE file in the root directory of this source tree.
+ #
+
+ require_relative 'open_file_base'
+
+ module Burner
+   module Library
+     module IO
+       # Iterates over an array of objects, extracts a filepath from a key in each object,
+       # and attempts to load the file's content for each record. The file's content will be
+       # stored at the specified data_key. By default missing paths or files will be
+       # treated as hard errors. If you wish to ignore these then pass in true for
+       # ignore_blank_path and/or ignore_file_not_found.
+       #
+       # Expected Payload[register] input: array of objects.
+       # Payload[register] output: array of objects.
+       class RowReader < JobWithRegister
+         class FileNotFoundError < StandardError; end
+
+         DEFAULT_DATA_KEY = 'data'
+         DEFAULT_PATH_KEY = 'path'
+
+         attr_reader :binary,
+                     :data_key,
+                     :disk,
+                     :ignore_blank_path,
+                     :ignore_file_not_found,
+                     :path_key,
+                     :resolver
+
+         def initialize(
+           name:,
+           binary: false,
+           data_key: DEFAULT_DATA_KEY,
+           disk: {},
+           ignore_blank_path: false,
+           ignore_file_not_found: false,
+           path_key: DEFAULT_PATH_KEY,
+           register: DEFAULT_REGISTER,
+           separator: ''
+         )
+           super(name: name, register: register)
+
+           @binary = binary || false
+           @data_key = data_key.to_s
+           @disk = Disks.make(disk)
+           @ignore_blank_path = ignore_blank_path || false
+           @ignore_file_not_found = ignore_file_not_found || false
+           @path_key = path_key.to_s
+           @resolver = Objectable.resolver(separator: separator)
+
+           freeze
+         end
+
+         def perform(output, payload)
+           records = array(payload[register])
+
+           output.detail("Reading path_key: #{path_key} for #{payload[register].length} records(s)")
+           output.detail("Storing read data in: #{path_key}")
+
+           payload[register] = records.map.with_index(1) do |object, index|
+             load_data(object, index, output)
+           end
+         end
+
+         private
+
+         def assert_and_skip_missing_path?(path, index, output)
+           missing_path = path.to_s.empty?
+           blank_path_raises_error = !ignore_blank_path
+
+           if missing_path && blank_path_raises_error
+             output.detail("Record #{index} is missing a path, raising error")
+
+             raise ArgumentError, "Record #{index} is missing a path"
+           elsif missing_path
+             output.detail("Record #{index} is missing a path")
+
+             true
+           end
+         end
+
+         def assert_and_skip_file_not_found?(path, index, output)
+           does_not_exist = !disk.exist?(path)
+           file_not_found_raises_error = !ignore_file_not_found
+
+           if file_not_found_raises_error && does_not_exist
+             output.detail("Record #{index} path: '#{path}' does not exist, raising error")
+
+             raise FileNotFoundError, "#{path} does not exist"
+           elsif does_not_exist
+             output.detail("Record #{index} path: '#{path}' does not exist, skipping")
+
+             true
+           end
+         end
+
+         def load_data(object, index, output)
+           path = resolver.get(object, path_key)
+
+           return object if assert_and_skip_missing_path?(path, index, output)
+           return object if assert_and_skip_file_not_found?(path, index, output)
+
+           data = disk.read(path, binary: binary)
+
+           resolver.set(object, data_key, data)
+
+           object
+         end
+       end
+     end
+   end
+ end
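
A hedged sketch of how this job might be configured in a pipeline; the job name, register, and keys are hypothetical:

````ruby
# Read each record's file from disk and store its contents under 'data'.
{
  name: :load_attachments,
  type: 'b/io/row_reader',
  register: :attachments,
  path_key: :path,             # where to find the filepath on each record
  data_key: :data,             # where to store the loaded contents
  ignore_file_not_found: true  # skip records whose files are missing instead of raising
}
````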
@@ -7,7 +7,7 @@
  # LICENSE file in the root directory of this source tree.
  #

- require_relative 'base'
+ require_relative 'open_file_base'

  module Burner
  module Library
@@ -16,54 +16,27 @@ module Burner
  #
  # Expected Payload[register] input: anything.
  # Payload[register] output: whatever was passed in.
- class Write < Base
- attr_reader :binary
-
- def initialize(name:, path:, binary: false, register: DEFAULT_REGISTER)
- super(name: name, path: path, register: register)
-
- @binary = binary || false
-
- freeze
- end
-
+ class Write < OpenFileBase
  def perform(output, payload)
- compiled_path = job_string_template(path, output, payload)
-
- ensure_directory_exists(output, compiled_path)
+ logical_filename = job_string_template(path, output, payload)
+ physical_filename = nil

- output.detail("Writing: #{compiled_path}")
+ output.detail("Writing: #{logical_filename}")

  time_in_seconds = Benchmark.measure do
- File.open(compiled_path, mode) { |io| io.write(payload[register]) }
+ physical_filename = disk.write(logical_filename, payload[register], binary: binary)
  end.real

+ output.detail("Wrote to: #{physical_filename}")
+
  side_effect = SideEffects::WrittenFile.new(
- logical_filename: compiled_path,
- physical_filename: compiled_path,
+ logical_filename: logical_filename,
+ physical_filename: physical_filename,
  time_in_seconds: time_in_seconds
  )

  payload.add_side_effect(side_effect)
  end
-
- private
-
- def ensure_directory_exists(output, compiled_path)
- dirname = File.dirname(compiled_path)
-
- return if File.exist?(dirname)
-
- output.detail("Outer directory does not exist, creating: #{dirname}")
-
- FileUtils.mkdir_p(dirname)
-
- nil
- end
-
- def mode
- binary ? 'wb' : 'w'
- end
  end
  end
  end
@@ -10,17 +10,30 @@
  module Burner
  module Library
  module Serialize
- # Take an array of arrays and create a CSV.
+ # Take an array of arrays and create a CSV. You can optionally pre-pend a byte order mark,
+ # see Burner::Modeling::ByteOrderMark for acceptable options.
  #
  # Expected Payload[register] input: array of arrays.
  # Payload[register] output: a serialized CSV string.
  class Csv < JobWithRegister
+ attr_reader :byte_order_mark
+
+ def initialize(name:, byte_order_mark: nil, register: DEFAULT_REGISTER)
+ super(name: name, register: register)
+
+ @byte_order_mark = Modeling::ByteOrderMark.resolve(byte_order_mark)
+
+ freeze
+ end
+
  def perform(_output, payload)
- payload[register] = CSV.generate(options) do |csv|
+ serialized_rows = CSV.generate(options) do |csv|
  array(payload[register]).each do |row|
  csv << row
  end
  end
+
+ payload[register] = "#{byte_order_mark}#{serialized_rows}"
  end

  private
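
A hedged sketch of the new option in a job configuration; the job name is hypothetical:

````ruby
# Emit a UTF-8 byte order mark ahead of the serialized CSV (useful for Excel).
{
  name: :to_csv,
  type: 'b/serialize/csv',
  byte_order_mark: :utf_8
}
````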
@@ -9,6 +9,7 @@

  require_relative 'modeling/attribute'
  require_relative 'modeling/attribute_renderer'
+ require_relative 'modeling/byte_order_mark'
  require_relative 'modeling/key_index_mapping'
  require_relative 'modeling/key_mapping'
  require_relative 'modeling/validations'
@@ -13,19 +13,37 @@ module Burner
  # to set the key to. The transformers that can be passed in can be any Realize::Transformers
  # subclasses. For more information, see the Realize library at:
  # https://github.com/bluemarblepayroll/realize
+ #
+ # Note that if explicit: true is set then no transformers will be automatically injected.
+ # If explicit is not true (default) then it will have a resolve job automatically injected
+ # in the beginning of the pipeline. This is the observed default behavior, with the
+ # exception having to be initially cross-mapped using a custom resolve transformation.
  class Attribute
  acts_as_hashable

+ RESOLVE_TYPE = 'r/value/resolve'
+
  attr_reader :key, :transformers

- def initialize(key:, transformers: [])
+ def initialize(key:, explicit: false, transformers: [])
  raise ArgumentError, 'key is required' if key.to_s.empty?

  @key = key.to_s
- @transformers = Realize::Transformers.array(transformers)
+ @transformers = base_transformers(explicit) + Realize::Transformers.array(transformers)

  freeze
  end
+
+ private
+
+ # When explicit, this will return an empty array.
+ # When not explicit, this will return an array with a basic transformer that simply
+ # gets the key's value. This establishes a good majority base case.
+ def base_transformers(explicit)
+ return [] if explicit
+
+ [Realize::Transformers.make(type: RESOLVE_TYPE, key: key)]
+ end
  end
  end
  end
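
To make the injected-resolve behavior concrete, a sketch of two attribute declarations as they might appear in a `b/collection/transform` job's attributes array; the keys are hypothetical:

````ruby
# Default: a 'r/value/resolve' transformer for :first_name is injected automatically,
# so any declared transformers start from that key's current value.
{ key: :first_name }

# Explicit: nothing is injected; declared transformers receive the whole record
# and must resolve values themselves.
{ key: :full_name, explicit: true, transformers: [] }
````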
@@ -0,0 +1,27 @@
+ # frozen_string_literal: true
+
+ #
+ # Copyright (c) 2020-present, Blue Marble Payroll, LLC
+ #
+ # This source code is licensed under the MIT license found in the
+ # LICENSE file in the root directory of this source tree.
+ #
+
+ module Burner
+   module Modeling
+     # Define all acceptable byte order mark values.
+     module ByteOrderMark
+       UTF_8    = "\xEF\xBB\xBF"
+       UTF_16BE = "\xFE\xFF"
+       UTF_16LE = "\xFF\xFE"
+       UTF_32BE = "\x00\x00\xFE\xFF"
+       UTF_32LE = "\xFE\xFF\x00\x00"
+
+       class << self
+         def resolve(value)
+           value ? const_get(value.to_s.upcase.to_sym) : nil
+         end
+       end
+     end
+   end
+ end
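
Based on the resolve method above, the lookup is case-insensitive on the constant name; for example:

````ruby
Burner::Modeling::ByteOrderMark.resolve(:utf_8)     # => "\xEF\xBB\xBF"
Burner::Modeling::ByteOrderMark.resolve('UTF_16LE') # => "\xFF\xFE"
Burner::Modeling::ByteOrderMark.resolve(nil)        # => nil
````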
@@ -9,18 +9,18 @@

  module Burner
  module Modeling
- # Generic mapping from a key to another key.
+ # Generic mapping from a key to another key. The argument 'to' is optional
+ # and if it is blank then the 'from' value will be used for the 'to' as well.
  class KeyMapping
  acts_as_hashable

  attr_reader :from, :to

- def initialize(from:, to:)
+ def initialize(from:, to: '')
  raise ArgumentError, 'from is required' if from.to_s.empty?
- raise ArgumentError, 'to is required' if to.to_s.empty?

  @from = from.to_s
- @to = to.to_s
+ @to = to.to_s.empty? ? @from : to.to_s

  freeze
  end
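
With the relaxed constructor above, `to` can now be omitted and falls back to `from`; for example:

````ruby
Burner::Modeling::KeyMapping.new(from: :id).to                # => "id"
Burner::Modeling::KeyMapping.new(from: :id, to: :order_id).to # => "order_id"
````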
@@ -14,7 +14,8 @@ require_relative 'step'

  module Burner
  # The root package. A Pipeline contains the job configurations along with the steps. The steps
- # referens jobs and tell you the order of the jobs to run.
+ # reference jobs and tell you the order of the jobs to run. If steps is nil then all jobs
+ # will execute in their declared order.
  class Pipeline
  acts_as_hashable

@@ -23,14 +24,16 @@ module Burner

  attr_reader :steps

- def initialize(jobs: [], steps: [])
+ def initialize(jobs: [], steps: nil)
  jobs = Jobs.array(jobs)

  assert_unique_job_names(jobs)

  jobs_by_name = jobs.map { |job| [job.name, job] }.to_h

- @steps = Array(steps).map do |step_name|
+ step_names = steps ? Array(steps) : jobs_by_name.keys
+
+ @steps = step_names.map do |step_name|
  job = jobs_by_name[step_name.to_s]

  raise JobNotFoundError, "#{step_name} was not declared as a job" unless job
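
A minimal sketch of the new nil-steps behavior, loosely echoing the README's json-to-yaml example (job names and the path are placeholders):

````ruby
# With steps omitted (nil), every declared job runs in declaration order.
pipeline = Burner::Pipeline.make(
  jobs: [
    { name: :read, type: 'b/io/read', path: 'input.json' },
    { name: :parse, type: 'b/deserialize/json' }
  ]
  # equivalent to adding: steps: %i[read parse]
)
````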
@@ -8,5 +8,5 @@
  #

  module Burner
- VERSION = '1.1.0'
+ VERSION = '1.4.0'
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: burner
  version: !ruby/object:Gem::Version
- version: 1.1.0
+ version: 1.4.0
  platform: ruby
  authors:
  - Matthew Ruggio
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2020-11-16 00:00:00.000000000 Z
+ date: 2020-12-17 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: acts_as_hashable
@@ -80,6 +80,20 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: '1.3'
+ - !ruby/object:Gem::Dependency
+ name: rubyzip
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.2'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.2'
  - !ruby/object:Gem::Dependency
  name: stringento
  requirement: !ruby/object:Gem::Requirement
@@ -220,6 +234,8 @@ files:
  - exe/burner
  - lib/burner.rb
  - lib/burner/cli.rb
+ - lib/burner/disks.rb
+ - lib/burner/disks/local.rb
  - lib/burner/job.rb
  - lib/burner/job_with_register.rb
  - lib/burner/jobs.rb
@@ -229,19 +245,22 @@ files:
  - lib/burner/library/collection/concatenate.rb
  - lib/burner/library/collection/graph.rb
  - lib/burner/library/collection/group.rb
+ - lib/burner/library/collection/nested_aggregate.rb
  - lib/burner/library/collection/objects_to_arrays.rb
  - lib/burner/library/collection/shift.rb
  - lib/burner/library/collection/transform.rb
  - lib/burner/library/collection/unpivot.rb
  - lib/burner/library/collection/validate.rb
  - lib/burner/library/collection/values.rb
+ - lib/burner/library/compress/row_reader.rb
  - lib/burner/library/deserialize/csv.rb
  - lib/burner/library/deserialize/json.rb
  - lib/burner/library/deserialize/yaml.rb
  - lib/burner/library/echo.rb
- - lib/burner/library/io/base.rb
  - lib/burner/library/io/exist.rb
+ - lib/burner/library/io/open_file_base.rb
  - lib/burner/library/io/read.rb
+ - lib/burner/library/io/row_reader.rb
  - lib/burner/library/io/write.rb
  - lib/burner/library/nothing.rb
  - lib/burner/library/serialize/csv.rb
@@ -253,6 +272,7 @@ files:
  - lib/burner/modeling.rb
  - lib/burner/modeling/attribute.rb
  - lib/burner/modeling/attribute_renderer.rb
+ - lib/burner/modeling/byte_order_mark.rb
  - lib/burner/modeling/key_index_mapping.rb
  - lib/burner/modeling/key_mapping.rb
  - lib/burner/modeling/validations.rb