metacrunch 4.0.3 → 4.1.0
- checksums.yaml +4 -4
- data/Readme.md +29 -8
- data/lib/metacrunch/job.rb +15 -6
- data/lib/metacrunch/job/buffer.rb +17 -12
- data/lib/metacrunch/job/dsl.rb +2 -2
- data/lib/metacrunch/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a192155704d21d4792eab70bb1122bcd92a14d84
+  data.tar.gz: 07cf95291da074427756f363621947baaa248d95
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1f360eebf44dc1b2a3025194c592e4bf01787c6637feec58aba758826c8d73eb4ec119730f539321911237cd395172b1c4a16cbdcc5d2728ed3830cc5fe64957
+  data.tar.gz: fa06ae98c1a2d4d8d7b4c7eaad845526df3ee5c12bbd1c6902a5ac90e9c32c1bab43b4886d13a40b691f7bff9c09d745ef60e61ac038a8163534b6fd307c6471
data/Readme.md
CHANGED
@@ -102,19 +102,27 @@ transformation MyTransformation.new
 
 Sometimes it is useful to buffer data between transformation steps to allow a transformation to work on larger bulks of data. metacrunch uses a simple transformation buffer to achieve this.
 
-To use a transformation buffer
+To use a transformation buffer add the `:buffer` option to your transformation. You can pass a positive integer value as a buffer size, or as an advanced option you can pass a `Proc` object. The buffer flushes every time the buffer reaches the given size or if the `Proc` returns `true`.
 
 ```ruby
 # File: my_etl_job.metacrunch
 
 source 1..95 # A range responds to #each and is a valid source
 
+# A buffer with a fixed size
 transformation ->(bulk) {
   # this transformation is called when the buffer
   # is filled with 10 objects or if the source has
   # yielded the last data object.
   # bulk would be: [1,...,10], [11,...,20], ..., [91,...,95]
-},
+}, buffer: 10
+
+# A buffer that uses a Proc
+transformation ->(bulk) {
+  # ...
+}, buffer: -> {
+  true if some_condition
+}
 ```
 
 #### Defining a destination
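The bulks listed in the README comment above can be reproduced without metacrunch. The sketch below is plain Ruby and is not the library's implementation: `Enumerable#each_slice` stands in for the fixed-size buffer, and the variable name `bulks` is illustrative only.

```ruby
# Plain-Ruby stand-in for `buffer: 10` applied to the source 1..95.
# each_slice groups consecutive items and yields a shorter final slice,
# which mirrors the flush that happens once the source is exhausted.
bulks = (1..95).each_slice(10).to_a

puts bulks.size          # number of bulks the transformation would receive
puts bulks.first.inspect # first bulk
puts bulks.last.inspect  # final, partially filled bulk
```

The last bulk holds only five items because 95 is not a multiple of 10; a transformation working on bulks should therefore not assume a fixed bulk length.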
@@ -163,7 +171,7 @@ In this example we declare two options `log_level` and `database_url`. `log_leve
 To set/override these options use the command line.
 
 ```
-$
+$ metacrunch my_etl_job.metacrunch --log-level debug
 ```
 
 This will set the `options[:log_level]` to `debug`.
@@ -171,9 +179,8 @@ This will set the `options[:log_level]` to `debug`.
 To get a list of available options for a job, use `--help` on the command line.
 
 ```
-$
+$ metacrunch my_etl_job.metacrunch --help
 
-Usage: metacrunch [options] JOB_FILE [job-options] [ARGS]
 Job options:
 -l, --log-level LEVEL Log level (debug,info,warn,error)
 DEFAULT: info
@@ -215,12 +222,17 @@ If you use [Bundler](http://bundler.io) to manage dependencies for your jobs mak
 $ bundle exec metacrunch my_etl_job.metacrunch
 ```
 
-
+In your job file use `Bundler.require` to require the dependencies from your `Gemfile`.
+
+```ruby
+# File: my_etl_job.metacrunch
+Bundler.require
+```
 
 Use the following syntax to run a metacrunch job
 
 ```
-$ [bundle exec] metacrunch [
+$ [bundle exec] metacrunch [options] JOB_FILE [job-options] [ARGS...]
 ```
 
 
@@ -345,6 +357,15 @@ destination MyDestination.new
 
 ```
 
+Official extension packages
+---------------------------
+
+* [metacrunch-db](https://github.com/ubpb/metacrunch-db): SQL Database package
+* [metacrunch-file](https://github.com/ubpb/metacrunch-file): File package
+* [metacrunch-elasticsearch](https://github.com/ubpb/metacrunch-elasticsearch): [Elasticsearch](https://www.elastic.co) package
+* [metacrunch-redis](https://github.com/ubpb/metacrunch-redis): [Redis](https://redis.io) package
+* [metacrunch-marcxml](https://github.com/ubpb/metacrunch-marcxml): [MARCXML](http://www.loc.gov/standards/marcxml/) package
+
 Upgrading
 ---------
 
@@ -353,7 +374,7 @@ Upgrading
 When upgrading from metacrunch 3.x, there are some breaking changes you need to address.
 
 * There is now only one `source` and `destination`. If you have more than one in your job file the last definition will used.
-* There is no `transformation_buffer` anymore. Instead set `
+* There is no `transformation_buffer` anymore. Instead set `buffer` as an option to `transformation`.
 * `transformation`, `pre_process` and `post_process` can't be implemented using a block anymore. Always use a `callable` (E.g. Lambda, Proc or any object responding to `#call`).
 * When running jobs via the CLI you do not need to separate the arguments passed to metacrunch from the arguments passed to the job with `@@`.
 * The `args` function used to get the non-option arguments passed to a job has been removed. Use `ARGV` instead.
data/lib/metacrunch/job.rb
CHANGED
@@ -14,6 +14,8 @@ module Metacrunch
     def initialize(file_content = nil, &block)
       @dsl = Dsl.new(self)
 
+      @deprecator = ActiveSupport::Deprecation.new("5.0.0", "metacrunch")
+
       if file_content
         @dsl.instance_eval(file_content, "Check your metacrunch Job at Line")
       elsif block_given?
@@ -61,11 +63,16 @@ module Metacrunch
       @transformations ||= []
     end
 
-    def add_transformation(callable, buffer_size: nil)
+    def add_transformation(callable, buffer_size: nil, buffer: nil)
       ensure_callable!(callable)
 
-      if buffer_size && buffer_size.
-
+      if buffer_size && buffer_size.is_a?(Numeric)
+        @deprecator.deprecation_warning(:buffer_size, :buffer)
+        buffer = buffer_size
+      end
+
+      if buffer
+        transformations << Metacrunch::Job::Buffer.new(buffer)
       end
 
       transformations << callable
@@ -120,11 +127,13 @@ module Metacrunch
     def run_transformations(data, flush_buffers: false)
       transformations.each do |transformation|
         if transformation.is_a?(Buffer)
+          buffer = transformation
+
           if data
-            data =
-            data =
+            data = buffer.buffer(data)
+            data = buffer.flush if flush_buffers
           else
-            data =
+            data = buffer.flush
           end
         else
           data = transformation.call(data) if data
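The option handling added to `add_transformation` above can be read as a small precedence rule between the deprecated `buffer_size:` and the new `buffer:`. The following is a minimal sketch of that rule only, under stated assumptions: `resolve_buffer` is a hypothetical helper name, and Ruby's built-in `warn` replaces the ActiveSupport deprecator so the snippet has no dependencies.

```ruby
# Hypothetical stand-in for the option handling in Job#add_transformation.
# `warn` is used here instead of ActiveSupport::Deprecation (an assumption
# made to keep the sketch dependency-free).
def resolve_buffer(buffer_size: nil, buffer: nil)
  if buffer_size && buffer_size.is_a?(Numeric)
    warn "buffer_size is deprecated, use buffer instead"
    buffer = buffer_size # a numeric buffer_size overrides any `buffer:` value
  end

  buffer # nil means no Buffer would be placed before the transformation
end

old_style = resolve_buffer(buffer_size: 10)             # deprecated form
new_style = resolve_buffer(buffer: 25)                  # preferred form
both      = resolve_buffer(buffer_size: 10, buffer: 25) # buffer_size wins
neither   = resolve_buffer                              # no buffering
```

Going by the hunk, a `buffer_size` that is not `Numeric` appears to be ignored without a warning, so passing a `Proc` only has an effect through `buffer:`.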
data/lib/metacrunch/job/buffer.rb
CHANGED
@@ -1,25 +1,30 @@
 module Metacrunch
   class Job::Buffer
 
-    def initialize(
-      @
+    def initialize(size_or_proc)
+      @size_or_proc = size_or_proc
+      @buffer = []
+
+      if @size_or_proc.is_a?(Numeric) && @size_or_proc <= 0
+        raise ArgumentError, "Buffer size must be a posive number greater that 0."
+      end
     end
 
     def buffer(data)
-
-
+      @buffer << data
+
+      case @size_or_proc
+      when Numeric
+        flush if @buffer.count >= @size_or_proc.to_i
+      when Proc
+        flush if @size_or_proc.call == true
+      end
     end
 
     def flush
-
+      @buffer
     ensure
-      @buffer =
-    end
-
-    private
-
-    def storage
-      @buffer ||= []
+      @buffer = []
     end
 
   end
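To see how the new buffer behaves from the caller's side, the logic of the added lines can be exercised in isolation. This is a sketch, not the gem itself: `SketchBuffer` is a hypothetical name chosen to avoid clashing with `Metacrunch::Job::Buffer`, the error message wording is tidied, and the sample inputs are made up for illustration.

```ruby
# Standalone copy of the buffering logic from the hunk above, renamed
# SketchBuffer (a hypothetical name) so it can run without metacrunch.
class SketchBuffer
  def initialize(size_or_proc)
    @size_or_proc = size_or_proc
    @buffer = []

    if @size_or_proc.is_a?(Numeric) && @size_or_proc <= 0
      raise ArgumentError, "Buffer size must be a positive number greater than 0."
    end
  end

  # Returns the collected items when a flush is due, otherwise nil.
  def buffer(data)
    @buffer << data

    case @size_or_proc
    when Numeric
      flush if @buffer.count >= @size_or_proc.to_i
    when Proc
      flush if @size_or_proc.call == true
    end
  end

  # Returns the current contents; the ensure clause then resets the buffer.
  def flush
    @buffer
  ensure
    @buffer = []
  end
end

# Fixed size: nil until three items have been collected.
fixed = SketchBuffer.new(3)
fixed_results = [1, 2, 3, 4].map { |item| fixed.buffer(item) }
leftover = fixed.flush # item 4 is only released by an explicit flush

# Proc: flushes whenever the callable returns exactly `true`.
calls = 0
by_proc = SketchBuffer.new(-> { (calls += 1) % 2 == 0 })
proc_results = %w[a b c].map { |item| by_proc.buffer(item) }

# A truthy value other than `true` does not trigger a flush.
truthy = SketchBuffer.new(-> { 1 })
truthy_result = truthy.buffer(:x)
```

Two details follow from the code as shown: the comparison `== true` means a `Proc` must return the literal `true` rather than any truthy value, and items left in the buffer are only released by a final `flush`, which `run_transformations` triggers when `data` is `nil` or `flush_buffers` is set.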
data/lib/metacrunch/job/dsl.rb
CHANGED
@@ -22,8 +22,8 @@ module Metacrunch
       @_job.post_process = callable
     end
 
-    def transformation(callable, buffer_size: nil)
-      @_job.add_transformation(callable, buffer_size: buffer_size)
+    def transformation(callable, buffer_size: nil, buffer: nil)
+      @_job.add_transformation(callable, buffer_size: buffer_size, buffer: buffer)
     end
 
     def options(require_args: false, &block)
data/lib/metacrunch/version.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: metacrunch
 version: !ruby/object:Gem::Version
-  version: 4.0
+  version: 4.1.0
 platform: ruby
 authors:
 - René Sprotte
@@ -10,7 +10,7 @@ authors:
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2017-09
+date: 2017-10-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activesupport