metacrunch 4.2.0 → 4.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: 1fc2560b2bb768757c384d71a806509d594e7610
4
- data.tar.gz: 8cbb0e384582550d29f94daa2d7521942cbe14c9
2
+ SHA256:
3
+ metadata.gz: 6c1facce15096151df3186f7d48245b1c06bebb231b9d6dabeb70d569c0bb06c
4
+ data.tar.gz: 41487b86683753e2f8eba95d1e9dbea47efa0af5c36b13c1bb343f0b59f714ab
5
5
  SHA512:
6
- metadata.gz: eaf6d9b6b72b7cadc92dae0d4d9e1204f6d43fe909f2f5f60bda85ad0407aa6fffe567a119361d3cef3fa15ed2d83ca5abc44d314eafc3f59b7b936c49a07ab5
7
- data.tar.gz: 0204e2e3284a53c007ea086b883ff52fbb87e0731616fe83c1c4d636134985506c39d885ec8ff4bc4cc8708e3a51d02c27e28fe79b732798f7f3bfc24905ac6d
6
+ metadata.gz: '04015927726756e1f5839d4b4bceac287400b729a1828a3652d15fc456720245ade702d8b1fddec83aaf41418df2c404006f0cd89805cdfc0aba8bd04e737579'
7
+ data.tar.gz: f6e8d9719618e8c1f6c8b68c8a28c008f9f710929b22d878352076cd3bf7f99e4503d7604e78472e8c70d052f3aa07eb8ccc0d778ed31dc19d040de893f4d484
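The hunk above drops the SHA1 section of `checksums.yaml` in favour of SHA256 and keeps SHA512. As a minimal sketch of how such digests could be recomputed for comparison, the snippet below hashes an in-memory stand-in for an archive member. The helper name `archive_digests` and the sample bytes are assumptions for illustration; to check a real gem you would pass in the opened `metadata.gz` or `data.tar.gz` file instead.

```ruby
require "digest/sha2"
require "stringio"

# Hypothetical helper: returns the two digest kinds that appear in the
# new checksums.yaml for any readable IO object.
def archive_digests(io)
  bytes = io.read
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

# Stand-in content; a real check would use File.open("data.tar.gz", "rb").
digests = archive_digests(StringIO.new("example bytes"))
puts digests["SHA256"].length # 64 hex characters
puts digests["SHA512"].length # 128 hex characters
```

The hex lengths match the values in the diff: 64 characters for each SHA256 entry and 128 for each SHA512 entry.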
@@ -0,0 +1,35 @@
1
+ version: 2.1
2
+ orbs:
3
+ ruby: circleci/ruby@1.1.1
4
+
5
+ jobs:
6
+ build:
7
+ docker:
8
+ - image: circleci/ruby:2.6-node-browsers
9
+
10
+ working_directory: ~/repo
11
+
12
+ steps:
13
+ - checkout
14
+
15
+ - run:
16
+ name: Install dependencies
17
+ command: bundle install --jobs=4 --retry=3 --path vendor/bundle
18
+
19
+ - run:
20
+ name: Install CodeClimate test coverage reporter
21
+ command: |
22
+ curl -L https://codeclimate.com/downloads/test-reporter/test-reporter-latest-linux-amd64 > ./cc-test-reporter
23
+ chmod +x ./cc-test-reporter
24
+ ./cc-test-reporter before-build
25
+
26
+ - run:
27
+ name: Run tests
28
+ command: |
29
+ mkdir /tmp/test-results
30
+ bundle exec rspec --format progress --format RspecJunitFormatter --out /tmp/test-results/rspec.xml
31
+
32
+ - run:
33
+ name: Upload test coverage report to CodeClimate
34
+ command: ./cc-test-reporter after-build --exit-code $?
35
+
data/Gemfile CHANGED
@@ -5,7 +5,6 @@ gemspec
5
5
  group :development do
6
6
  gem "bundler", ">= 1.15"
7
7
  gem "rake", ">= 12.1"
8
- gem "rspec", ">= 3.5.0", "< 4.0.0"
9
8
 
10
9
  if !ENV["CI"]
11
10
  gem "pry-byebug", ">= 3.5.0"
@@ -13,5 +12,7 @@ group :development do
13
12
  end
14
13
 
15
14
  group :test do
16
- gem "simplecov", ">= 0.15.0"
15
+ gem "rspec", ">= 3.5.0", "< 4.0.0"
16
+ gem "rspec_junit_formatter", ">= 0.3.0"
17
+ gem "simplecov", "= 0.17.1"
17
18
  end
data/Readme.md CHANGED
@@ -3,7 +3,8 @@ metacrunch
3
3
 
4
4
  [![Gem Version](https://badge.fury.io/rb/metacrunch.svg)](http://badge.fury.io/rb/metacrunch)
5
5
  [![Code Climate](https://codeclimate.com/github/ubpb/metacrunch/badges/gpa.svg)](https://codeclimate.com/github/ubpb/metacrunch)
6
- [![Build Status](https://travis-ci.org/ubpb/metacrunch.svg)](https://travis-ci.org/ubpb/metacrunch)
6
+ [![Test Coverage](https://codeclimate.com/github/ubpb/metacrunch/badges/coverage.svg)](https://codeclimate.com/github/ubpb/metacrunch/coverage)
7
+ [![CircleCI](https://circleci.com/gh/ubpb/metacrunch.svg?style=svg)](https://circleci.com/gh/ubpb/metacrunch)
7
8
 
8
9
  metacrunch is a simple and lightweight data processing and ETL ([Extract-Transform-Load](http://en.wikipedia.org/wiki/Extract,_transform,_load))
9
10
  toolkit for Ruby.
@@ -28,7 +29,7 @@ metacrunch gives you a simple DSL ([Domain-specific language](https://en.wikiped
28
29
 
29
30
  Let's walk through the main steps of creating ETL jobs with metacrunch. For a collection of working examples check out our [metacrunch-demo](https://github.com/ubpb/metacrunch-demo) repository.
30
31
 
31
- #### It's Ruby
32
+ ### It's Ruby
32
33
 
33
34
  Every `.metacrunch` job is a regular Ruby file and you can use any valid Ruby code like declaring methods, classes, variables, requiring other Ruby
34
35
  files and so on.
@@ -50,12 +51,14 @@ require "SomeGem"
50
51
  require_relative "./some/other/ruby/file"
51
52
  ```
52
53
 
53
- #### Defining a source
54
+ ### Defining a source
54
55
 
55
- A source is an object that reads data (e.g. from a file or an external system) into the metacrunch processing pipeline. Implementing sources is easy – a source can be any Ruby object that responds to `#each`. For more information on how to implement sources [see notes below](#implementing-sources).
56
+ A source is an object that emits data objects (e.g. from a file or an external system) into the metacrunch processing pipeline. Implementing sources is easy – a source is a Ruby `Enumerable` (any object that responds to the `#each` method). For more information on how to implement sources [see notes below](#implementing-sources).
56
57
 
57
58
  You must declare a source to allow a job to run.
58
59
 
60
+ A source iterates over its entries and emits every entry as a data object into the transformation pipeline by passing it to the first transformation.
61
+
59
62
  ```ruby
60
63
  # File: my_etl_job.metacrunch
61
64
 
@@ -66,15 +69,15 @@ source Metacrunch::File::Source.new(ARGV)
66
69
  source MySource.new
67
70
  ```
68
71
 
69
- #### Defining transformations
72
+ ### Defining transformations
70
73
 
71
- To process, transform or manipulate data use the `#transformation` hook. A transformation is implemented with a `callable` object (any Ruby object that responds to `#call`. E.g. a lambda). To learn more about transformations check the section about [implementing transformations](#implementing-transformations) below.
74
+ To process, transform or manipulate data use the `#transformation` hook. A transformation is implemented with a `callable` object (any Ruby object that responds to `#call`. E.g. a `Proc`). To learn more about transformations check the section about [implementing transformations](#implementing-transformations) below.
72
75
 
73
- The current data object (the last object yielded by the source) will be passed to the first transformation as a parameter. The return value of a transformation will then be passed to the next transformation and so on.
76
+ The *current data object* (the current object emitted by the source) will be passed to the first transformation as a parameter. The return value of a transformation will then be passed to the next transformation and so on.
74
77
 
75
- There are two exceptions to that rule.
78
+ There are two exceptions to that rule:
76
79
 
77
- * If you return `nil` the current data object will be dismissed and the next transformation won't be called.
80
+ * If you return `nil` the current data object will be dismissed and the next transformation won't be called. The process continues with the next data object that will be emitted by the source and the first transformation.
78
81
  * If you return an `Enumerator` the object will be expanded and the following transformations will be called with each element of the `Enumerator`.
79
82
 
80
83
  ```ruby
@@ -85,27 +88,29 @@ source [1,2,3,4,5,6,7,8,9]
85
88
 
86
89
  # A transformation is implemented with a `callable` object (any
87
90
  # object that responds to #call).
88
- # Lambdas responds to #call
91
+ # Proc responds to #call
89
92
  transformation ->(number) {
90
- # Called for each data object that has been read by a source.
93
+ # Called for each data object that has been emitted by a source.
91
94
  # You must return the data to keep it in the pipeline. Dismiss the
92
95
  # data conditionally by returning nil.
93
96
  number if number.odd?
94
97
  }
95
98
 
99
+ # Only called for odd numbers as even numbers get dismissed in the previous
100
+ # transformation.
96
101
  transformation ->(odd_number) {
97
102
  odd_number * 2
98
103
  }
99
104
 
100
- # MyTransformation implements #call
105
+ # MyTransformation implements #call. Gets called with the previous number times 2.
101
106
  transformation MyTransformation.new
102
107
  ```
103
108
 
104
- #### Using a transformation buffer
109
+ ### Using a transformation buffer
105
110
 
106
111
  Sometimes it is useful to buffer data between transformation steps to allow a transformation to work on larger bulks of data. metacrunch uses a simple transformation buffer to achieve this.
107
112
 
108
- To use a transformation buffer add the `:buffer` option to your transformation. You can pass a positive integer value as a buffer size, or as an advanced option you can pass a `Proc` object. The buffer flushes every time the buffer reaches the given size or if the `Proc` returns `true`.
113
+ To use a transformation buffer add the `:buffer` option to your transformation. You can pass a positive integer value as a buffer size, or as an advanced option you can pass a `Proc` object. The buffer flushes every time the buffer reaches the given size or if the `Proc` returns `true`. The buffer also flushes after the last data object was emitted by the source.
109
114
 
110
115
  ```ruby
111
116
  # File: my_etl_job.metacrunch
@@ -128,11 +133,9 @@ transformation ->(bulk) {
128
133
  }
129
134
  ```
130
135
 
131
- #### Defining a destination
132
-
133
- A destination is an object that writes the transformed data to an external system. Implementing destinations is easy – [see notes below](#implementing-destinations). A destination receives the return value from the last transformation as a parameter if the return value from the last transformation was not `nil`.
136
+ ### Defining a destination
134
137
 
135
- Using destinations is optional. In most cases using the last transformation to write the data to an external system is fine. Destinations are useful if the required code is more complex.
138
+ A destination is an object that writes the transformed data to an external system (e.g. a file, database etc.). Implementing destinations is easy – [see notes below](#implementing-destinations). A destination receives the return value from the last transformation as a parameter if the return value from the last transformation was not `nil`.
136
139
 
137
140
  ```ruby
138
141
  # File: my_etl_job.metacrunch
@@ -140,20 +143,20 @@ Using destinations is optional. In most cases using the last transformation to w
140
143
  destination MyDestination.new
141
144
  ```
142
145
 
143
- #### Pre/Post process
146
+ ### Pre/Post process
144
147
 
145
148
  To run arbitrary code before the first transformation is run on the first data object use the `#pre_process` hook. To run arbitrary code after the last transformation is run on the last data object use `#post_process`. Like transformations, `#post_process` and `#pre_process` must be implemented using a `callable` object.
146
149
 
147
150
  ```ruby
148
151
  pre_process -> {
149
- # Lambdas responds to #call
152
+ # Proc responds to #call
150
153
  }
151
154
 
152
155
  # MyCallable class defines #call
153
156
  post_process MyCallable.new
154
157
  ```
155
158
 
156
- #### Defining job options
159
+ ### Defining job options
157
160
 
158
161
  metacrunch has built-in support to parameterize jobs. Using the `options` hook you can declare options that can be set/overridden by the CLI when [running your jobs](#running-etl-jobs).
159
162
 
@@ -191,9 +194,7 @@ Job options:
191
194
  REQUIRED
192
195
  ```
193
196
 
194
- To learn more about defining options take a look at the [reference below](#defining-job-options).
195
-
196
- #### Require non-option arguments
197
+ ### Require non-option arguments
197
198
 
198
199
  All non-option arguments that get passed to the job when running are available to the `ARGV` constant. If your job requires such arguments (e.g. if you work with a list of files) you can require it.
199
200
 
@@ -242,11 +243,11 @@ $ [bundle exec] metacrunch [options] JOB_FILE [job-options] [ARGS...]
242
243
  Implementing sources
243
244
  --------------------
244
245
 
245
- A metacrunch source is any Ruby object that responds to the `each` method that yields data objects one by one.
246
+ A metacrunch source is any Ruby `Enumerable` object (an object that responds to the `#each` method) that yields data objects one by one.
246
247
 
247
248
  The data is usually a `Hash` instance, but could be other structures as long as the rest of your pipeline is expecting it.
248
249
 
249
- Any `enumerable` object (e.g. `Array`) responds to `each` and can be used as a source in metacrunch.
250
+ Any `Enumerable` object (e.g. `Array`) responds to `#each` and can be used as a source in metacrunch.
250
251
 
251
252
  ```ruby
252
253
  # File: my_etl_job.metacrunch
@@ -288,9 +289,9 @@ source MyCsvSource.new("my_data.csv")
288
289
  Implementing transformations
289
290
  ----------------------------
290
291
 
291
- A metacrunch transformation is implemented as a `callable` object. A `callable` in Ruby is any object that responds to the `call` method.
292
+ A metacrunch transformation is implemented as a `callable` object. A `callable` in Ruby is any object that responds to the `#call` method.
292
293
 
293
- Procs and Lambdas in Ruby respond to `call`. They can be used to implement transformations inline.
294
+ `Proc`s in Ruby respond to `#call`. They can be used to implement transformations inline.
294
295
 
295
296
  ```ruby
296
297
  # File: my_etl_job.metacrunch
@@ -329,7 +330,7 @@ transformation MyTransformation.new
329
330
  Implementing destinations
330
331
  -------------------------
331
332
 
332
- A destination is any Ruby object that responds to `write(data)` and `close`.
333
+ A destination is any Ruby object that responds to `#write(data)` and `#close`.
333
334
 
334
335
  Like sources you are encouraged to implement destinations as classes.
335
336
 
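The Readme hunks above describe the pipeline rules in prose: a source emits data objects, each transformation receives the previous return value, `nil` dismisses the object, an `Enumerator` is expanded element by element, and a destination responds to `#write(data)` and `#close`. The following plain-Ruby sketch imitates those rules so they can be tried in isolation. It is not metacrunch's implementation; `ArrayDestination` and `run_transformations` are hypothetical names introduced only for this example.

```ruby
# Hypothetical destination: anything that responds to #write(data) and #close.
class ArrayDestination
  attr_reader :written, :closed

  def initialize
    @written = []
    @closed = false
  end

  # Receives the return value of the last transformation.
  def write(data)
    @written << data
  end

  def close
    @closed = true
  end
end

# Hypothetical runner: passes `data` through the callables in order and
# applies the two exceptions described in the Readme.
def run_transformations(data, transformations, destination)
  return if data.nil? # nil dismisses the current data object
  return destination.write(data) if transformations.empty?

  first, *rest = transformations
  result = first.call(data)
  if result.is_a?(Enumerator)
    # An Enumerator is expanded: the following steps see each element.
    result.each { |element| run_transformations(element, rest, destination) }
  else
    run_transformations(result, rest, destination)
  end
end

source = [1, 2, 3, 4, 5] # any Enumerable can act as a source
transformations = [
  ->(number) { number if number.odd? },                 # even numbers are dismissed
  ->(odd_number) { [odd_number, odd_number * 2].each }, # returns an Enumerator
]
destination = ArrayDestination.new

source.each { |data| run_transformations(data, transformations, destination) }
destination.close
p destination.written # [1, 2, 3, 6, 5, 10]
```

Even numbers never reach the second lambda, and each odd number fans out into two written values because the second lambda returns an `Enumerator`.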
@@ -51,7 +51,7 @@ private
51
51
  def run!(job_file)
52
52
  if job_file.blank?
53
53
  error "You need to provide a job file."
54
- elsif !File.exists?(job_file)
54
+ elsif !File.exist?(job_file)
55
55
  error "The file `#{job_file}` doesn't exist."
56
56
  else
57
57
  job_filename = File.expand_path(job_file)
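The hunk above swaps `File.exists?` for `File.exist?`. The plural form was deprecated for a long time and removed in Ruby 3.2, so the old call raises `NoMethodError` on current Rubies. A minimal sketch of the same three-way check follows; the helper `job_file_status` and its symbol return values are assumptions for illustration, and plain Ruby stands in for ActiveSupport's `blank?`.

```ruby
require "tempfile"

# Hypothetical helper mirroring the branch structure of `run!`.
def job_file_status(job_file)
  if job_file.nil? || job_file.strip.empty? # plain-Ruby stand-in for blank?
    :missing_argument
  elsif !File.exist?(job_file)              # singular form, available on all Rubies
    :not_found
  else
    :ok
  end
end

# A temporary file plays the role of an existing job file.
Tempfile.create("job") do |file|
  puts job_file_status(file.path)              # ok
end
puts job_file_status("")                        # missing_argument
puts job_file_status("/no/such/job.metacrunch") # not_found
```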
@@ -1,3 +1,3 @@
1
1
  module Metacrunch
2
- VERSION = "4.2.0"
2
+ VERSION = "4.2.1"
3
3
  end
metadata CHANGED
@@ -1,16 +1,15 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: metacrunch
3
3
  version: !ruby/object:Gem::Version
4
- version: 4.2.0
4
+ version: 4.2.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - René Sprotte
8
8
  - Michael Sievers
9
9
  - Marcel Otto
10
- autorequire:
11
10
  bindir: exe
12
11
  cert_chain: []
13
- date: 2017-10-10 00:00:00.000000000 Z
12
+ date: 1980-01-02 00:00:00.000000000 Z
14
13
  dependencies:
15
14
  - !ruby/object:Gem::Dependency
16
15
  name: activesupport
@@ -40,16 +39,14 @@ dependencies:
40
39
  - - ">="
41
40
  - !ruby/object:Gem::Version
42
41
  version: 0.8.1
43
- description:
44
- email:
45
42
  executables:
46
43
  - metacrunch
47
44
  extensions: []
48
45
  extra_rdoc_files: []
49
46
  files:
47
+ - ".circleci/config.yml"
50
48
  - ".gitignore"
51
49
  - ".rspec"
52
- - ".travis.yml"
53
50
  - Gemfile
54
51
  - License.txt
55
52
  - Rakefile
@@ -73,7 +70,6 @@ homepage: http://github.com/ubpb/metacrunch
73
70
  licenses:
74
71
  - MIT
75
72
  metadata: {}
76
- post_install_message:
77
73
  rdoc_options: []
78
74
  require_paths:
79
75
  - lib
@@ -88,9 +84,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
88
84
  - !ruby/object:Gem::Version
89
85
  version: '0'
90
86
  requirements: []
91
- rubyforge_project:
92
- rubygems_version: 2.6.11
93
- signing_key:
87
+ rubygems_version: 3.6.9
94
88
  specification_version: 4
95
89
  summary: Data processing and ETL toolkit for Ruby
96
90
  test_files: []
data/.travis.yml DELETED
@@ -1,4 +0,0 @@
1
- language: ruby
2
- rvm:
3
- - ruby-2.3.5
4
- - ruby-2.4.2