metacrunch 4.0.3 → 4.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 6cd1ffde611aeb9af20078f87408261f3dfa1f85
4
- data.tar.gz: 279a857dbb4907237a86deeac815feeebe9b20dc
3
+ metadata.gz: a192155704d21d4792eab70bb1122bcd92a14d84
4
+ data.tar.gz: 07cf95291da074427756f363621947baaa248d95
5
5
  SHA512:
6
- metadata.gz: a3a0257a6bf2fc1cdac8ce1414f7b1c1185c218983412dce4ba0f98f2f8a4cd9719acc105c12e96e12ca32ebcbbaa1ac403774354b451c9044697833d087596f
7
- data.tar.gz: f16c860f681612118e43213406e5b034cf648a21574d568d655c9e98d8ac2b29a70eb66333dbedde0bfb9a55a1566dbdbaee2000731e9eefd377a50586047369
6
+ metadata.gz: 1f360eebf44dc1b2a3025194c592e4bf01787c6637feec58aba758826c8d73eb4ec119730f539321911237cd395172b1c4a16cbdcc5d2728ed3830cc5fe64957
7
+ data.tar.gz: fa06ae98c1a2d4d8d7b4c7eaad845526df3ee5c12bbd1c6902a5ac90e9c32c1bab43b4886d13a40b691f7bff9c09d745ef60e61ac038a8163534b6fd307c6471
data/Readme.md CHANGED
@@ -102,19 +102,27 @@ transformation MyTransformation.new
102
102
 
103
103
  Sometimes it is useful to buffer data between transformation steps to allow a transformation to work on larger bulks of data. metacrunch uses a simple transformation buffer to achieve this.
104
104
 
105
- To use a transformation buffer pass the buffer size as an option to the transformation.
105
+ To use a transformation buffer add the `:buffer` option to your transformation. You can pass a positive integer value as a buffer size, or as an advanced option you can pass a `Proc` object. The buffer flushes every time the buffer reaches the given size or if the `Proc` returns `true`.
106
106
 
107
107
  ```ruby
108
108
  # File: my_etl_job.metacrunch
109
109
 
110
110
  source 1..95 # A range responds to #each and is a valid source
111
111
 
112
+ # A buffer with a fixed size
112
113
  transformation ->(bulk) {
113
114
  # this transformation is called when the buffer
114
115
  # is filled with 10 objects or if the source has
115
116
  # yielded the last data object.
116
117
  # bulk would be: [1,...,10], [11,...,20], ..., [91,...,95]
117
- }, buffer_size: 10
118
+ }, buffer: 10
119
+
120
+ # A buffer that uses a Proc
121
+ transformation ->(bulk) {
122
+ # ...
123
+ }, buffer: -> {
124
+ true if some_condition
125
+ }
118
126
  ```
119
127
 
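The two flush rules described above can be tried outside of metacrunch. The following is a minimal sketch, not the library's implementation: `SketchBuffer` and `bulks_for` are hypothetical names introduced only for illustration, and the sketch assumes that a bulk is handed on as soon as the size is reached or the `Proc` returns `true`, with one final flush once the source is exhausted.

```ruby
# Hypothetical stand-in for the transformation buffer, reduced to the
# behaviour described in the Readme. This is not the metacrunch class.
class SketchBuffer
  def initialize(size_or_proc)
    if size_or_proc.is_a?(Numeric) && size_or_proc <= 0
      raise ArgumentError, "Buffer size must be greater than 0."
    end

    @size_or_proc = size_or_proc
    @items = []
  end

  # Returns the collected bulk when the flush condition holds, otherwise nil.
  def buffer(data)
    @items << data

    case @size_or_proc
    when Numeric
      flush if @items.count >= @size_or_proc.to_i
    when Proc
      flush if @size_or_proc.call == true
    end
  end

  # Hands out everything collected so far and starts a new, empty bulk.
  def flush
    bulk = @items
    @items = []
    bulk
  end
end

# Hypothetical driver: feeds each source item through the buffer, keeps the
# bulks that were flushed, and flushes once more after the last item.
def bulks_for(source, buffer)
  bulks = source.map { |item| buffer.buffer(item) }.compact
  rest = buffer.flush
  bulks << rest unless rest.empty?
  bulks
end

# Fixed size: bulks of 10 items, the last one holds the remainder.
fixed_bulks = bulks_for(1..95, SketchBuffer.new(10))

# Proc: flush whenever the (assumed) condition is met, here on every 4th item.
seen = 0
every_fourth = -> { (seen += 1) % 4 == 0 }
proc_bulks = bulks_for(1..10, SketchBuffer.new(every_fourth))
```

A `Proc` is useful when the flush condition is not a plain item count, for example a time window or a memory limit. The counter used here is only a placeholder for such a condition.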
120
128
  #### Defining a destination
@@ -163,7 +171,7 @@ In this example we declare two options `log_level` and `database_url`. `log_leve
163
171
  To set/override these options use the command line.
164
172
 
165
173
  ```
166
- $ bundle exec metacrunch my_etl_job.metacrunch --log-level debug
174
+ $ metacrunch my_etl_job.metacrunch --log-level debug
167
175
  ```
168
176
 
169
177
  This will set the `options[:log_level]` to `debug`.
@@ -171,9 +179,8 @@ This will set the `options[:log_level]` to `debug`.
171
179
  To get a list of available options for a job, use `--help` on the command line.
172
180
 
173
181
  ```
174
- $ bundle exec metacrunch my_etl_job.metacrunch --help
182
+ $ metacrunch my_etl_job.metacrunch --help
175
183
 
176
- Usage: metacrunch [options] JOB_FILE [job-options] [ARGS]
177
184
  Job options:
178
185
  -l, --log-level LEVEL Log level (debug,info,warn,error)
179
186
  DEFAULT: info
@@ -215,12 +222,17 @@ If you use [Bundler](http://bundler.io) to manage dependencies for your jobs mak
215
222
  $ bundle exec metacrunch my_etl_job.metacrunch
216
223
  ```
217
224
 
218
- Depending on your environment `bundle exec` may not be required (e.g. if you have rubygems-bundler installed) but we recommend using it whenever you have a Gemfile you like to use. When using Bundler make sure to add `gem "metacrunch"` to the Gemfile.
225
+ In your job file use `Bundler.require` to require the dependencies from your `Gemfile`.
226
+
227
+ ```ruby
228
+ # File: my_etl_job.metacrunch
229
+ Bundler.require
230
+ ```
219
231
 
220
232
  Use the following syntax to run a metacrunch job
221
233
 
222
234
  ```
223
- $ [bundle exec] metacrunch [COMMAND_OPTIONS] JOB_FILE [JOB_OPTIONS] [JOB_ARGS...]
235
+ $ [bundle exec] metacrunch [options] JOB_FILE [job-options] [ARGS...]
224
236
  ```
225
237
 
226
238
 
@@ -345,6 +357,15 @@ destination MyDestination.new
345
357
 
346
358
  ```
347
359
 
360
+ Official extension packages
361
+ ---------------------------
362
+
363
+ * [metacrunch-db](https://github.com/ubpb/metacrunch-db): SQL Database package
364
+ * [metacrunch-file](https://github.com/ubpb/metacrunch-file): File package
365
+ * [metacrunch-elasticsearch](https://github.com/ubpb/metacrunch-elasticsearch): [Elasticsearch](https://www.elastic.co) package
366
+ * [metacrunch-redis](https://github.com/ubpb/metacrunch-redis): [Redis](https://redis.io) package
367
+ * [metacrunch-marcxml](https://github.com/ubpb/metacrunch-marcxml): [MARCXML](http://www.loc.gov/standards/marcxml/) package
368
+
348
369
  Upgrading
349
370
  ---------
350
371
 
@@ -353,7 +374,7 @@ Upgrading
353
374
  When upgrading from metacrunch 3.x, there are some breaking changes you need to address.
354
375
 
355
376
  * There is now only one `source` and `destination`. If you have more than one in your job file the last definition will be used.
356
- * There is no `transformation_buffer` anymore. Instead set `buffer_size` as an option to `transformation`.
377
+ * There is no `transformation_buffer` anymore. Instead set `buffer` as an option to `transformation`.
357
378
  * `transformation`, `pre_process` and `post_process` can't be implemented using a block anymore. Always use a `callable` (E.g. Lambda, Proc or any object responding to `#call`).
358
379
  * When running jobs via the CLI you do not need to separate the arguments passed to metacrunch from the arguments passed to the job with `@@`.
359
380
  * The `args` function used to get the non-option arguments passed to a job has been removed. Use `ARGV` instead.
@@ -14,6 +14,8 @@ module Metacrunch
14
14
  def initialize(file_content = nil, &block)
15
15
  @dsl = Dsl.new(self)
16
16
 
17
+ @deprecator = ActiveSupport::Deprecation.new("5.0.0", "metacrunch")
18
+
17
19
  if file_content
18
20
  @dsl.instance_eval(file_content, "Check your metacrunch Job at Line")
19
21
  elsif block_given?
@@ -61,11 +63,16 @@ module Metacrunch
61
63
  @transformations ||= []
62
64
  end
63
65
 
64
- def add_transformation(callable, buffer_size: nil)
66
+ def add_transformation(callable, buffer_size: nil, buffer: nil)
65
67
  ensure_callable!(callable)
66
68
 
67
- if buffer_size && buffer_size.to_i > 0
68
- transformations << Metacrunch::Job::Buffer.new(buffer_size)
69
+ if buffer_size && buffer_size.is_a?(Numeric)
70
+ @deprecator.deprecation_warning(:buffer_size, :buffer)
71
+ buffer = buffer_size
72
+ end
73
+
74
+ if buffer
75
+ transformations << Metacrunch::Job::Buffer.new(buffer)
69
76
  end
70
77
 
71
78
  transformations << callable
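The keyword handling in this hunk can be illustrated in isolation. The sketch below is an assumption-laden simplification: `resolve_buffer_option` is a hypothetical helper, not a metacrunch method, and it records a plain warning string instead of calling `ActiveSupport::Deprecation`, so that it runs without any gems. It only shows that a numeric `buffer_size` is still accepted and mapped onto `buffer`.

```ruby
# Hypothetical helper mirroring the option handling shown in the diff:
# a numeric `buffer_size` still works, but is mapped onto `buffer`
# after a deprecation notice. The notice is collected in `warnings`
# here; the real code uses ActiveSupport::Deprecation instead.
def resolve_buffer_option(buffer_size: nil, buffer: nil, warnings: [])
  if buffer_size.is_a?(Numeric)
    warnings << "buffer_size is deprecated, use buffer instead"
    buffer = buffer_size
  end

  buffer
end

warnings = []

# Old keyword: still resolves to a buffer of 10, but leaves a warning.
old_style = resolve_buffer_option(buffer_size: 10, warnings: warnings)

# New keyword: resolves to the same value without a warning.
new_style = resolve_buffer_option(buffer: 10, warnings: warnings)
```

Note that in this sketch, as in the hunk above, a numeric `buffer_size` takes precedence when both keywords are given.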
@@ -120,11 +127,13 @@ module Metacrunch
120
127
  def run_transformations(data, flush_buffers: false)
121
128
  transformations.each do |transformation|
122
129
  if transformation.is_a?(Buffer)
130
+ buffer = transformation
131
+
123
132
  if data
124
- data = transformation.buffer(data)
125
- data = transformation.flush if flush_buffers
133
+ data = buffer.buffer(data)
134
+ data = buffer.flush if flush_buffers
126
135
  else
127
- data = transformation.flush
136
+ data = buffer.flush
128
137
  end
129
138
  else
130
139
  data = transformation.call(data) if data
@@ -1,25 +1,30 @@
1
1
  module Metacrunch
2
2
  class Job::Buffer
3
3
 
4
- def initialize(size)
5
- @size = size
4
+ def initialize(size_or_proc)
5
+ @size_or_proc = size_or_proc
6
+ @buffer = []
7
+
8
+ if @size_or_proc.is_a?(Numeric) && @size_or_proc <= 0
9
+ raise ArgumentError, "Buffer size must be a positive number greater than 0."
10
+ end
6
11
  end
7
12
 
8
13
  def buffer(data)
9
- storage << data
10
- flush if storage.count >= @size
14
+ @buffer << data
15
+
16
+ case @size_or_proc
17
+ when Numeric
18
+ flush if @buffer.count >= @size_or_proc.to_i
19
+ when Proc
20
+ flush if @size_or_proc.call == true
21
+ end
11
22
  end
12
23
 
13
24
  def flush
14
- storage
25
+ @buffer
15
26
  ensure
16
- @buffer = nil
17
- end
18
-
19
- private
20
-
21
- def storage
22
- @buffer ||= []
27
+ @buffer = []
23
28
  end
24
29
 
25
30
  end
@@ -22,8 +22,8 @@ module Metacrunch
22
22
  @_job.post_process = callable
23
23
  end
24
24
 
25
- def transformation(callable, buffer_size: nil)
26
- @_job.add_transformation(callable, buffer_size: buffer_size)
25
+ def transformation(callable, buffer_size: nil, buffer: nil)
26
+ @_job.add_transformation(callable, buffer_size: buffer_size, buffer: buffer)
27
27
  end
28
28
 
29
29
  def options(require_args: false, &block)
@@ -1,3 +1,3 @@
1
1
  module Metacrunch
2
- VERSION = "4.0.3"
2
+ VERSION = "4.1.0"
3
3
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: metacrunch
3
3
  version: !ruby/object:Gem::Version
4
- version: 4.0.3
4
+ version: 4.1.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - René Sprotte
@@ -10,7 +10,7 @@ authors:
10
10
  autorequire:
11
11
  bindir: exe
12
12
  cert_chain: []
13
- date: 2017-09-26 00:00:00.000000000 Z
13
+ date: 2017-10-09 00:00:00.000000000 Z
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
16
16
  name: activesupport