metacrunch 4.0.3 → 4.1.0
- checksums.yaml +4 -4
- data/Readme.md +29 -8
- data/lib/metacrunch/job.rb +15 -6
- data/lib/metacrunch/job/buffer.rb +17 -12
- data/lib/metacrunch/job/dsl.rb +2 -2
- data/lib/metacrunch/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: a192155704d21d4792eab70bb1122bcd92a14d84
+data.tar.gz: 07cf95291da074427756f363621947baaa248d95
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 1f360eebf44dc1b2a3025194c592e4bf01787c6637feec58aba758826c8d73eb4ec119730f539321911237cd395172b1c4a16cbdcc5d2728ed3830cc5fe64957
+data.tar.gz: fa06ae98c1a2d4d8d7b4c7eaad845526df3ee5c12bbd1c6902a5ac90e9c32c1bab43b4886d13a40b691f7bff9c09d745ef60e61ac038a8163534b6fd307c6471
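Each entry in `checksums.yaml` is a hex digest of one member of the `.gem` archive (`metadata.gz` and `data.tar.gz`). If you unpack the gem with `tar -xf metacrunch-4.1.0.gem`, you can recompute the values with Ruby's `digest` standard library. `checksums_for` below is a hypothetical helper for that check, not part of RubyGems or metacrunch.

```ruby
require "digest"

# Hypothetical helper: recompute the digests that checksums.yaml records
# for one member of an unpacked .gem archive.
def checksums_for(path)
  {
    "SHA1"   => Digest::SHA1.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest
  }
end

# Usage, after `tar -xf metacrunch-4.1.0.gem` in the current directory:
#   checksums_for("data.tar.gz")["SHA1"]
#   # should equal 07cf95291da074427756f363621947baaa248d95 if the archive is intact
```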
data/Readme.md
CHANGED
@@ -102,19 +102,27 @@ transformation MyTransformation.new

 Sometimes it is useful to buffer data between transformation steps to allow a transformation to work on larger bulks of data. metacrunch uses a simple transformation buffer to achieve this.

-To use a transformation buffer
+To use a transformation buffer add the `:buffer` option to your transformation. You can pass a positive integer value as a buffer size, or as an advanced option you can pass a `Proc` object. The buffer flushes every time the buffer reaches the given size or if the `Proc` returns `true`.

 ```ruby
 # File: my_etl_job.metacrunch

 source 1..95 # A range responds to #each and is a valid source

+# A buffer with a fixed size
 transformation ->(bulk) {
 # this transformation is called when the buffer
 # is filled with 10 objects or if the source has
 # yielded the last data object.
 # bulk would be: [1,...,10], [11,...,20], ..., [91,...,95]
-},
+}, buffer: 10
+
+# A buffer that uses a Proc
+transformation ->(bulk) {
+# ...
+}, buffer: -> {
+true if some_condition
+}
 ```

 #### Defining a destination
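The Readme comment lists the bulks that a size-10 buffer passes to the transformation for the source `1..95`. The plain-Ruby sketch below reproduces that grouping without using metacrunch. `bulks` and `pending` are illustrative names only. The result is nine full bulks, followed by a final flush of the five remaining items.

```ruby
# Plain-Ruby sketch (not metacrunch code): simulate a fixed-size buffer of 10
# over the source 1..95, plus the final flush when the source is exhausted.
bulks = []
pending = []

(1..95).each do |n|
  pending << n
  next if pending.size < 10 # buffer not full yet
  bulks << pending          # buffer full: hand the bulk to the transformation
  pending = []
end
bulks << pending unless pending.empty? # source exhausted: flush the rest

puts bulks.size # => 10
p bulks.first   # => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p bulks.last    # => [91, 92, 93, 94, 95]
```

For a fixed size, this grouping matches `(1..95).each_slice(10).to_a`.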
@@ -163,7 +171,7 @@ In this example we declare two options `log_level` and `database_url`. `log_leve
 To set/override these options use the command line.

 ```
-$
+$ metacrunch my_etl_job.metacrunch --log-level debug
 ```

 This will set the `options[:log_level]` to `debug`.
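You can approximate how `--log-level debug` ends up in `options[:log_level]` with Ruby's standard `OptionParser`. This sketch does not use metacrunch's `options` DSL, and `parse_job_options` is a hypothetical helper. The `info` default and the allowed levels are taken from the job's `--help` output in the Readme.

```ruby
require "optparse"

# Hypothetical stand-in for a job's option declarations, built on Ruby's
# standard OptionParser instead of metacrunch's own `options` DSL.
def parse_job_options(argv)
  options = { log_level: "info" } # default, as listed under --help

  parser = OptionParser.new do |opts|
    opts.banner = "Job options:"
    # The Array lists the accepted LEVEL values.
    opts.on("-l", "--log-level LEVEL", %w[debug info warn error],
            "Log level (debug,info,warn,error)") do |level|
      options[:log_level] = level
    end
  end

  rest = parser.parse(argv) # non-destructive; returns the non-option arguments
  [options, rest, parser]
end

options, _rest, parser = parse_job_options(%w[--log-level debug])
p options[:log_level] # => "debug"
puts parser.help      # prints the banner and the -l/--log-level line
```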
@@ -171,9 +179,8 @@ This will set the `options[:log_level]` to `debug`.
 To get a list of available options for a job, use `--help` on the command line.

 ```
-$
+$ metacrunch my_etl_job.metacrunch --help

-Usage: metacrunch [options] JOB_FILE [job-options] [ARGS]
 Job options:
 -l, --log-level LEVEL Log level (debug,info,warn,error)
 DEFAULT: info
@@ -215,12 +222,17 @@ If you use [Bundler](http://bundler.io) to manage dependencies for your jobs mak
 $ bundle exec metacrunch my_etl_job.metacrunch
 ```

-
+In your job file use `Bundler.require` to require the dependencies from your `Gemfile`.
+
+```ruby
+# File: my_etl_job.metacrunch
+Bundler.require
+```

 Use the following syntax to run a metacrunch job

 ```
-$ [bundle exec] metacrunch [
+$ [bundle exec] metacrunch [options] JOB_FILE [job-options] [ARGS...]
 ```


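The usage line `[bundle exec] metacrunch [options] JOB_FILE [job-options] [ARGS...]` implies that everything after `JOB_FILE` belongs to the job. One way to split such a command line without a `@@` separator is `OptionParser#order!`, which stops at the first non-option argument. This sketch is an assumption about the approach and does not reproduce metacrunch's CLI code. `split_cli_args` and the `--quiet` flag are hypothetical.

```ruby
require "optparse"

# Hypothetical sketch of splitting
#   metacrunch [options] JOB_FILE [job-options] [ARGS...]
# without a "@@" separator. Not metacrunch's actual CLI code; the
# --quiet flag is made up for illustration.
def split_cli_args(argv)
  argv = argv.dup
  global = {}

  parser = OptionParser.new do |opts|
    opts.on("-q", "--quiet", "Hypothetical global flag") { global[:quiet] = true }
  end
  # order! consumes leading options and stops at the first non-option
  # argument, leaving it (JOB_FILE) and everything after it in argv.
  parser.order!(argv)

  job_file = argv.shift
  [global, job_file, argv] # argv now holds the job's own options and args
end

global, job_file, job_argv =
  split_cli_args(%w[-q my_etl_job.metacrunch --log-level debug input.xml])
p job_file # => "my_etl_job.metacrunch"
p job_argv # => ["--log-level", "debug", "input.xml"]
```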
@@ -345,6 +357,15 @@ destination MyDestination.new

 ```

+Official extension packages
+---------------------------
+
+* [metacrunch-db](https://github.com/ubpb/metacrunch-db): SQL Database package
+* [metacrunch-file](https://github.com/ubpb/metacrunch-file): File package
+* [metacrunch-elasticsearch](https://github.com/ubpb/metacrunch-elasticsearch): [Elasticsearch](https://www.elastic.co) package
+* [metacrunch-redis](https://github.com/ubpb/metacrunch-redis): [Redis](https://redis.io) package
+* [metacrunch-marcxml](https://github.com/ubpb/metacrunch-marcxml): [MARCXML](http://www.loc.gov/standards/marcxml/) package
+
 Upgrading
 ---------

@@ -353,7 +374,7 @@ Upgrading
 When upgrading from metacrunch 3.x, there are some breaking changes you need to address.

 * There is now only one `source` and `destination`. If you have more than one in your job file the last definition will used.
-* There is no `transformation_buffer` anymore. Instead set `
+* There is no `transformation_buffer` anymore. Instead set `buffer` as an option to `transformation`.
 * `transformation`, `pre_process` and `post_process` can't be implemented using a block anymore. Always use a `callable` (E.g. Lambda, Proc or any object responding to `#call`).
 * When running jobs via the CLI you do not need to separate the arguments passed to metacrunch from the arguments passed to the job with `@@`.
 * The `args` function used to get the non-option arguments passed to a job has been removed. Use `ARGV` instead.
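When you upgrade from 3.x, `transformation`, `pre_process` and `post_process` need a callable instead of a block. Each of the forms below responds to `#call`, so any of them should satisfy that requirement. `Upcase` is a hypothetical example class. The `respond_to?(:call)` check only approximates what the gem's `ensure_callable!` does.

```ruby
# Three forms a transformation (or pre_process/post_process) can take now that
# blocks are no longer accepted. `Upcase` is a hypothetical example class.
class Upcase
  def call(data)
    data.to_s.upcase
  end
end

callables = [
  ->(data) { data.to_s.upcase },    # lambda
  proc { |data| data.to_s.upcase }, # Proc
  Upcase.new                        # any object responding to #call
]

results = callables.map do |callable|
  # Rough approximation of a "must be callable" check.
  raise ArgumentError, "#{callable.inspect} is not callable" unless callable.respond_to?(:call)
  callable.call("abc")
end

p results # => ["ABC", "ABC", "ABC"]
```

In a job file, the object form would be passed directly, e.g. `transformation Upcase.new`.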
data/lib/metacrunch/job.rb
CHANGED
@@ -14,6 +14,8 @@ module Metacrunch
 def initialize(file_content = nil, &block)
 @dsl = Dsl.new(self)

+@deprecator = ActiveSupport::Deprecation.new("5.0.0", "metacrunch")
+
 if file_content
 @dsl.instance_eval(file_content, "Check your metacrunch Job at Line")
 elsif block_given?
@@ -61,11 +63,16 @@ module Metacrunch
 @transformations ||= []
 end

-def add_transformation(callable, buffer_size: nil)
+def add_transformation(callable, buffer_size: nil, buffer: nil)
 ensure_callable!(callable)

-if buffer_size && buffer_size.
-
+if buffer_size && buffer_size.is_a?(Numeric)
+@deprecator.deprecation_warning(:buffer_size, :buffer)
+buffer = buffer_size
+end
+
+if buffer
+transformations << Metacrunch::Job::Buffer.new(buffer)
 end

 transformations << callable
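The new `add_transformation` keeps `buffer_size:` working by mapping it onto `buffer:` and emitting a deprecation warning through an `ActiveSupport::Deprecation` instance with a 5.0.0 horizon. The sketch below reproduces that shim without dependencies. `TinyJob` is hypothetical, `Kernel#warn` replaces `ActiveSupport::Deprecation`, and a `[:buffer, value]` pair stands in for `Metacrunch::Job::Buffer.new(buffer)`.

```ruby
# Hypothetical, dependency-free sketch of the buffer_size -> buffer shim.
# Kernel#warn stands in for ActiveSupport::Deprecation, and the
# [:buffer, value] pair stands in for Metacrunch::Job::Buffer.new(buffer).
class TinyJob
  attr_reader :transformations

  def initialize
    @transformations = []
  end

  def add_transformation(callable, buffer_size: nil, buffer: nil)
    raise ArgumentError, "callable expected" unless callable.respond_to?(:call)

    if buffer_size.is_a?(Numeric) # nil is not Numeric, so no extra nil check
      warn "DEPRECATION: buffer_size is deprecated, use buffer instead"
      buffer = buffer_size
    end

    # A buffer, if any, is registered right before the callable it feeds.
    transformations << [:buffer, buffer] if buffer
    transformations << callable
  end
end

job = TinyJob.new
step = ->(bulk) { bulk }
job.add_transformation(step, buffer_size: 10) # old keyword, prints a warning
job.add_transformation(step, buffer: 5)       # new keyword
p job.transformations.map { |t| t.is_a?(Array) ? t : :callable }
# => [[:buffer, 10], :callable, [:buffer, 5], :callable]
```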
@@ -120,11 +127,13 @@ module Metacrunch
 def run_transformations(data, flush_buffers: false)
 transformations.each do |transformation|
 if transformation.is_a?(Buffer)
+buffer = transformation
+
 if data
-data =
-data =
+data = buffer.buffer(data)
+data = buffer.flush if flush_buffers
 else
-data =
+data = buffer.flush
 end
 else
 data = transformation.call(data) if data
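`run_transformations` passes one data item through the steps. A buffer returns `nil` until it is full, and that skips the remaining callables (`if data`). A `nil` input makes each buffer flush, which is how the remaining items can be drained after the source's last object. The miniature below restates that loop under some simplifying assumptions, and it is not the gem's code:

* `SizeBuffer` only supports a fixed size.
* Its `flush` returns `nil` when the buffer is empty.
* The `flush_buffers` pass drains a buffer only if it did not just flush itself.

```ruby
# Simplified, hypothetical miniature of the loop in Job#run_transformations.
# SizeBuffer stands in for Metacrunch::Job::Buffer (fixed size only).
class SizeBuffer
  def initialize(size)
    @size = size
    @items = []
  end

  # Collect one item; return the bulk when the buffer is full, else nil.
  def buffer(data)
    @items << data
    flush if @items.size >= @size
  end

  # Return the buffered bulk (nil if empty) and reset the buffer.
  def flush
    @items.empty? ? nil : @items
  ensure
    @items = []
  end
end

def run_transformations(steps, data, flush_buffers: false)
  steps.each do |step|
    if step.is_a?(SizeBuffer)
      if data
        bulk = step.buffer(data)
        # Final pass: drain this buffer unless it has just flushed itself.
        bulk = step.flush if flush_buffers && bulk.nil?
        data = bulk
      else
        data = step.flush # no more input: emit whatever is left
      end
    else
      data = step.call(data) if data # nil means "buffer not full yet"
    end
  end
  data
end

steps = [SizeBuffer.new(10), ->(bulk) { bulk.sum }]
results = (1..95).map { |n| run_transformations(steps, n) }.compact
results << run_transformations(steps, nil, flush_buffers: true)
p results # => [55, 155, 255, 355, 455, 555, 655, 755, 855, 465]
```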
data/lib/metacrunch/job/buffer.rb
CHANGED
@@ -1,25 +1,30 @@
 module Metacrunch
 class Job::Buffer

-def initialize(
-@
+def initialize(size_or_proc)
+@size_or_proc = size_or_proc
+@buffer = []
+
+if @size_or_proc.is_a?(Numeric) && @size_or_proc <= 0
+raise ArgumentError, "Buffer size must be a posive number greater that 0."
+end
 end

 def buffer(data)
-
-
+@buffer << data
+
+case @size_or_proc
+when Numeric
+flush if @buffer.count >= @size_or_proc.to_i
+when Proc
+flush if @size_or_proc.call == true
+end
 end

 def flush
-
+@buffer
 ensure
-@buffer =
-end
-
-private
-
-def storage
-@buffer ||= []
+@buffer = []
 end

 end
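The rewritten `Job::Buffer` accepts either a positive size or a `Proc`. The `Proc` is called with no arguments, and the buffer flushes only when the `Proc` returns exactly `true` (`== true`). The flush condition therefore has to come from state that the `Proc` closes over. The sketch below restates the class logic as a standalone `DemoBuffer`, renamed so it runs without the gem and with the error message spelling corrected. It drives the buffer with a hypothetical end-of-record marker `:eor`.

```ruby
# Standalone restatement of the 4.1.0 buffer logic (renamed so it runs
# without the gem). The Proc receives no arguments.
class DemoBuffer
  def initialize(size_or_proc)
    @size_or_proc = size_or_proc
    @buffer = []

    if @size_or_proc.is_a?(Numeric) && @size_or_proc <= 0
      raise ArgumentError, "Buffer size must be a positive number greater than 0."
    end
  end

  # Returns the bulk when a flush is triggered, otherwise nil.
  def buffer(data)
    @buffer << data

    case @size_or_proc
    when Numeric
      flush if @buffer.count >= @size_or_proc.to_i
    when Proc
      flush if @size_or_proc.call == true # only a literal true flushes
    end
  end

  def flush
    @buffer
  ensure
    @buffer = []
  end
end

# Hypothetical flush condition: emit a bulk whenever the last item seen is the
# marker :eor. `last_item` is illustrative state, not a metacrunch API.
last_item = nil
buf = DemoBuffer.new(-> { last_item == :eor })

bulks = []
[1, 2, :eor, 3, :eor, 4].each do |item|
  last_item = item
  bulk = buf.buffer(item)
  bulks << bulk if bulk
end
bulks << buf.flush # end of source: flush the remainder
p bulks # => [[1, 2, :eor], [3, :eor], [4]]
```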
data/lib/metacrunch/job/dsl.rb
CHANGED
@@ -22,8 +22,8 @@ module Metacrunch
 @_job.post_process = callable
 end

-def transformation(callable, buffer_size: nil)
-@_job.add_transformation(callable, buffer_size: buffer_size)
+def transformation(callable, buffer_size: nil, buffer: nil)
+@_job.add_transformation(callable, buffer_size: buffer_size, buffer: buffer)
 end

 def options(require_args: false, &block)
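Job files are evaluated with `instance_eval` against a `Dsl` object (see `Job#initialize`). A line such as `transformation ..., buffer: 10` therefore calls `Dsl#transformation`, which now forwards both `buffer_size:` and `buffer:` to the job. The sketch below is a hypothetical miniature of that path. `RecordingJob` and `MiniDsl` are illustrative stand-ins, not gem classes.

```ruby
# Hypothetical miniature of how a job file reaches Dsl#transformation:
# the file's text is instance_eval'ed against a DSL object.
class RecordingJob
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Records what the DSL forwarded instead of building a pipeline.
  def add_transformation(callable, buffer_size: nil, buffer: nil)
    @calls << { callable: callable, buffer_size: buffer_size, buffer: buffer }
  end
end

class MiniDsl
  def initialize(job)
    @_job = job
  end

  def transformation(callable, buffer_size: nil, buffer: nil)
    @_job.add_transformation(callable, buffer_size: buffer_size, buffer: buffer)
  end
end

job_file = <<~RUBY
  transformation ->(bulk) { bulk }, buffer: 10
RUBY

job = RecordingJob.new
MiniDsl.new(job).instance_eval(job_file, "my_etl_job.metacrunch")
p job.calls.first[:buffer] # => 10
```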
data/lib/metacrunch/version.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: metacrunch
 version: !ruby/object:Gem::Version
-version: 4.0
+version: 4.1.0
 platform: ruby
 authors:
 - René Sprotte
@@ -10,7 +10,7 @@ authors:
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2017-09
+date: 2017-10-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: activesupport