metacrunch 4.0.3 → 4.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 6cd1ffde611aeb9af20078f87408261f3dfa1f85
- data.tar.gz: 279a857dbb4907237a86deeac815feeebe9b20dc
+ metadata.gz: a192155704d21d4792eab70bb1122bcd92a14d84
+ data.tar.gz: 07cf95291da074427756f363621947baaa248d95
  SHA512:
- metadata.gz: a3a0257a6bf2fc1cdac8ce1414f7b1c1185c218983412dce4ba0f98f2f8a4cd9719acc105c12e96e12ca32ebcbbaa1ac403774354b451c9044697833d087596f
- data.tar.gz: f16c860f681612118e43213406e5b034cf648a21574d568d655c9e98d8ac2b29a70eb66333dbedde0bfb9a55a1566dbdbaee2000731e9eefd377a50586047369
+ metadata.gz: 1f360eebf44dc1b2a3025194c592e4bf01787c6637feec58aba758826c8d73eb4ec119730f539321911237cd395172b1c4a16cbdcc5d2728ed3830cc5fe64957
+ data.tar.gz: fa06ae98c1a2d4d8d7b4c7eaad845526df3ee5c12bbd1c6902a5ac90e9c32c1bab43b4886d13a40b691f7bff9c09d745ef60e61ac038a8163534b6fd307c6471
data/Readme.md CHANGED
@@ -102,19 +102,27 @@ transformation MyTransformation.new
 
  Sometimes it is useful to buffer data between transformation steps to allow a transformation to work on larger bulks of data. metacrunch uses a simple transformation buffer to achieve this.
 
- To use a transformation buffer pass the buffer size as an option to the transformation.
+ To use a transformation buffer add the `:buffer` option to your transformation. You can pass a positive integer value as a buffer size, or as an advanced option you can pass a `Proc` object. The buffer flushes every time the buffer reaches the given size or if the `Proc` returns `true`.
 
  ```ruby
  # File: my_etl_job.metacrunch
 
  source 1..95 # A range responds to #each and is a valid source
 
+ # A buffer with a fixed size
  transformation ->(bulk) {
  # this transformation is called when the buffer
  # is filled with 10 objects or if the source has
  # yielded the last data object.
  # bulk would be: [1,...,10], [11,...,20], ..., [91,...,95]
- }, buffer_size: 10
+ }, buffer: 10
+
+ # A buffer that uses a Proc
+ transformation ->(bulk) {
+ # ...
+ }, buffer: -> {
+ true if some_condition
+ }
  ```
 
  #### Defining a destination
@@ -163,7 +171,7 @@ In this example we declare two options `log_level` and `database_url`. `log_leve
  To set/override these options use the command line.
 
  ```
- $ bundle exec metacrunch my_etl_job.metacrunch --log-level debug
+ $ metacrunch my_etl_job.metacrunch --log-level debug
  ```
 
  This will set the `options[:log_level]` to `debug`.
@@ -171,9 +179,8 @@ This will set the `options[:log_level]` to `debug`.
  To get a list of available options for a job, use `--help` on the command line.
 
  ```
- $ bundle exec metacrunch my_etl_job.metacrunch --help
+ $ metacrunch my_etl_job.metacrunch --help
 
- Usage: metacrunch [options] JOB_FILE [job-options] [ARGS]
  Job options:
  -l, --log-level LEVEL Log level (debug,info,warn,error)
  DEFAULT: info
@@ -215,12 +222,17 @@ If you use [Bundler](http://bundler.io) to manage dependencies for your jobs mak
  $ bundle exec metacrunch my_etl_job.metacrunch
  ```
 
- Depending on your environment `bundle exec` may not be required (e.g. if you have rubygems-bundler installed) but we recommend using it whenever you have a Gemfile you like to use. When using Bundler make sure to add `gem "metacrunch"` to the Gemfile.
+ In your job file use `Bundler.require` to require the dependencies from your `Gemfile`.
+
+ ```ruby
+ # File: my_etl_job.metacrunch
+ Bundler.require
+ ```
 
  Use the following syntax to run a metacrunch job
 
  ```
- $ [bundle exec] metacrunch [COMMAND_OPTIONS] JOB_FILE [JOB_OPTIONS] [JOB_ARGS...]
+ $ [bundle exec] metacrunch [options] JOB_FILE [job-options] [ARGS...]
  ```
 
 
@@ -345,6 +357,15 @@ destination MyDestination.new
 
  ```
 
+ Official extension packages
+ ---------------------------
+
+ * [metacrunch-db](https://github.com/ubpb/metacrunch-db): SQL Database package
+ * [metacrunch-file](https://github.com/ubpb/metacrunch-file): File package
+ * [metacrunch-elasticsearch](https://github.com/ubpb/metacrunch-elasticsearch): [Elasticsearch](https://www.elastic.co) package
+ * [metacrunch-redis](https://github.com/ubpb/metacrunch-redis): [Redis](https://redis.io) package
+ * [metacrunch-marcxml](https://github.com/ubpb/metacrunch-marcxml): [MARCXML](http://www.loc.gov/standards/marcxml/) package
+
  Upgrading
  ---------
 
@@ -353,7 +374,7 @@ Upgrading
  When upgrading from metacrunch 3.x, there are some breaking changes you need to address.
 
  * There is now only one `source` and `destination`. If you have more than one in your job file the last definition will used.
- * There is no `transformation_buffer` anymore. Instead set `buffer_size` as an option to `transformation`.
+ * There is no `transformation_buffer` anymore. Instead set `buffer` as an option to `transformation`.
  * `transformation`, `pre_process` and `post_process` can't be implemented using a block anymore. Always use a `callable` (E.g. Lambda, Proc or any object responding to `#call`).
  * When running jobs via the CLI you do not need to separate the arguments passed to metacrunch from the arguments passed to the job with `@@`.
  * The `args` function used to get the non-option arguments passed to a job has been removed. Use `ARGV` instead.
@@ -14,6 +14,8 @@ module Metacrunch
  def initialize(file_content = nil, &block)
  @dsl = Dsl.new(self)
 
+ @deprecator = ActiveSupport::Deprecation.new("5.0.0", "metacrunch")
+
  if file_content
  @dsl.instance_eval(file_content, "Check your metacrunch Job at Line")
  elsif block_given?
@@ -61,11 +63,16 @@ module Metacrunch
  @transformations ||= []
  end
 
- def add_transformation(callable, buffer_size: nil)
+ def add_transformation(callable, buffer_size: nil, buffer: nil)
  ensure_callable!(callable)
 
- if buffer_size && buffer_size.to_i > 0
- transformations << Metacrunch::Job::Buffer.new(buffer_size)
+ if buffer_size && buffer_size.is_a?(Numeric)
+ @deprecator.deprecation_warning(:buffer_size, :buffer)
+ buffer = buffer_size
+ end
+
+ if buffer
+ transformations << Metacrunch::Job::Buffer.new(buffer)
  end
 
  transformations << callable
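The `add_transformation` change above keeps `buffer_size:` working but routes it through `buffer:`. The plain-Ruby sketch below isolates that keyword handling so the precedence is easy to see. `resolve_buffer_option` is a hypothetical helper, not part of metacrunch, and `Kernel#warn` stands in for the `ActiveSupport::Deprecation` call the gem uses; the warning text is the sketch's own, not ActiveSupport's.

```ruby
# Hypothetical helper that mirrors the keyword handling added to
# Job#add_transformation in 4.1.0. It is NOT part of metacrunch;
# Kernel#warn stands in for ActiveSupport::Deprecation.
def resolve_buffer_option(buffer_size: nil, buffer: nil)
  # A numeric buffer_size is still accepted, emits a deprecation
  # warning, and overrides any buffer: value passed alongside it.
  if buffer_size && buffer_size.is_a?(Numeric)
    warn "[sketch] buffer_size is deprecated; use buffer instead"
    buffer = buffer_size
  end

  # Non-numeric buffer_size values fall through and are ignored.
  buffer
end

resolve_buffer_option(buffer: 10)                 # => 10
resolve_buffer_option(buffer_size: 10)            # => 10, plus a warning on stderr
resolve_buffer_option(buffer_size: 5, buffer: 10) # => 5, buffer_size wins
resolve_buffer_option(buffer_size: "10")          # => nil, strings are ignored
```

Because `0` is truthy in Ruby, `buffer_size: 0` now reaches `Metacrunch::Job::Buffer.new(0)`, which raises `ArgumentError` under the new guard in `Job::Buffer#initialize`; 4.0.3 skipped the buffer silently because of its `buffer_size.to_i > 0` check.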
@@ -120,11 +127,13 @@ module Metacrunch
  def run_transformations(data, flush_buffers: false)
  transformations.each do |transformation|
  if transformation.is_a?(Buffer)
+ buffer = transformation
+
  if data
- data = transformation.buffer(data)
- data = transformation.flush if flush_buffers
+ data = buffer.buffer(data)
+ data = buffer.flush if flush_buffers
  else
- data = transformation.flush
+ data = buffer.flush
  end
  else
  data = transformation.call(data) if data
@@ -1,25 +1,30 @@
  module Metacrunch
  class Job::Buffer
 
- def initialize(size)
- @size = size
+ def initialize(size_or_proc)
+ @size_or_proc = size_or_proc
+ @buffer = []
+
+ if @size_or_proc.is_a?(Numeric) && @size_or_proc <= 0
+ raise ArgumentError, "Buffer size must be a posive number greater that 0."
+ end
  end
 
  def buffer(data)
- storage << data
- flush if storage.count >= @size
+ @buffer << data
+
+ case @size_or_proc
+ when Numeric
+ flush if @buffer.count >= @size_or_proc.to_i
+ when Proc
+ flush if @size_or_proc.call == true
+ end
  end
 
  def flush
- storage
+ @buffer
  ensure
- @buffer = nil
- end
-
- private
-
- def storage
- @buffer ||= []
+ @buffer = []
  end
 
  end
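To see how the reworked `Job::Buffer` behaves with a numeric size and with a `Proc`, here is a minimal, self-contained Ruby sketch. `SketchBuffer` is a hypothetical stand-in that copies the 4.1.0 logic from the hunk above; it does not load metacrunch. The driver loop assumes a source of `1..95` (as in the Readme example) and ends with one extra `flush`, mirroring the `else` branch of `run_transformations`, which flushes a buffer when `data` is `nil`. The every-third-call condition is likewise a made-up example.

```ruby
# Hypothetical stand-in for Metacrunch::Job::Buffer as of 4.1.0.
# It copies the logic from the diff; it is not the gem's class.
class SketchBuffer
  def initialize(size_or_proc)
    @size_or_proc = size_or_proc
    @buffer = []

    # Same guard as 4.1.0: a numeric size must be greater than 0.
    if @size_or_proc.is_a?(Numeric) && @size_or_proc <= 0
      raise ArgumentError, "Buffer size must be a positive number greater than 0."
    end
  end

  # Appends one item. Returns the flushed bulk (an Array) when the
  # flush condition holds, otherwise nil.
  def buffer(data)
    @buffer << data

    case @size_or_proc
    when Numeric
      flush if @buffer.count >= @size_or_proc.to_i
    when Proc
      # Only a literal `true` flushes; other truthy values do not.
      flush if @size_or_proc.call == true
    end
  end

  # Returns the buffered items and starts over with an empty buffer.
  def flush
    @buffer
  ensure
    @buffer = []
  end
end

# Fixed size: mimics `source 1..95` with `buffer: 10`.
sized = SketchBuffer.new(10)
bulks = (1..95).map { |i| sized.buffer(i) }.compact
# Source exhausted: flush the remainder (this sketch drops an empty one).
rest = sized.flush
bulks << rest unless rest.empty?
p bulks.size # prints 10
p bulks.last # prints [91, 92, 93, 94, 95]

# Proc: a hypothetical condition that is true on every third call.
calls = 0
every_third = -> { calls += 1; calls % 3 == 0 }
by_proc = SketchBuffer.new(every_third)
proc_bulks = (1..7).map { |i| by_proc.buffer(i) }.compact
p proc_bulks # prints [[1, 2, 3], [4, 5, 6]]
```

Because `flush` always returns an `Array` and `[]` is truthy in Ruby, a `transformation.call(data) if data` guard like the one in `run_transformations` does not skip an empty final bulk; the sketch filters that case out explicitly instead.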
@@ -22,8 +22,8 @@ module Metacrunch
  @_job.post_process = callable
  end
 
- def transformation(callable, buffer_size: nil)
- @_job.add_transformation(callable, buffer_size: buffer_size)
+ def transformation(callable, buffer_size: nil, buffer: nil)
+ @_job.add_transformation(callable, buffer_size: buffer_size, buffer: buffer)
  end
 
  def options(require_args: false, &block)
@@ -1,3 +1,3 @@
  module Metacrunch
- VERSION = "4.0.3"
+ VERSION = "4.1.0"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: metacrunch
  version: !ruby/object:Gem::Version
- version: 4.0.3
+ version: 4.1.0
  platform: ruby
  authors:
  - René Sprotte
@@ -10,7 +10,7 @@ authors:
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2017-09-26 00:00:00.000000000 Z
+ date: 2017-10-09 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: activesupport