kiba 0.5.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 502470fc246c67daaa681ca78fb5337899cca7fa
4
+ data.tar.gz: a125ff166156c79e5a0b0d67bf9dfb980b7e0dba
5
+ SHA512:
6
+ metadata.gz: ccefac21a401ca860d34c89fdda2473e5a30b51d61223fc8cced50165786f41f328014144bd31486522db34c4e801190060d250cad20408745c691ca937ea1ea
7
+ data.tar.gz: 6c0bee993d99fdec14504e6811549af4dce40cd8930be951a142f8793da69956283ed7ccf6acececa4a6a108e9e9f424d2190b6ba7d9e45f55207b3ee240418d
@@ -0,0 +1,2 @@
1
+ .ruby-version
2
+ Gemfile.lock
@@ -0,0 +1,6 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.2
4
+ - 2.1
5
+ - 2.0
6
+ - jruby
@@ -0,0 +1,4 @@
1
+ 0.5.0
2
+ -----
3
+
4
+ - Initial release
data/Gemfile ADDED
@@ -0,0 +1,3 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gemspec
@@ -0,0 +1,268 @@
1
+ Writing reliable, concise, well-tested & maintainable data-processing code is tricky.
2
+
3
+ Kiba lets you define and run such high-quality ETL jobs, using Ruby.
4
+
5
+ **Note: this is EARLY WORK - the API/syntax may change at any time.**
6
+
7
+ [![Build Status](https://travis-ci.org/thbar/kiba.svg?branch=master)](https://travis-ci.org/thbar/kiba) [![Code Climate](https://codeclimate.com/github/thbar/kiba/badges/gpa.svg)](https://codeclimate.com/github/thbar/kiba) [![Dependency Status](https://gemnasium.com/thbar/kiba.svg)](https://gemnasium.com/thbar/kiba)
8
+
9
+ ## How do you define ETL jobs with Kiba?
10
+
11
+ Kiba provides you with a DSL to define ETL jobs:
12
+
13
+ ```ruby
14
+ # declare a ruby method here, for quick reusable logic
15
+ def parse_french_date(date)
16
+ Date.strptime(date, '%d/%m/%Y')
17
+ end
18
+
19
+ # or better, include a ruby file which loads reusable assets
20
+ # e.g.: commonly used sources / destinations / transforms, under unit test
21
+ require_relative 'common'
22
+
23
+ # declare a source to read data from (you implement it; see notes below)
24
+ source MyCsvSource, 'input.csv'
25
+
26
+ # declare a row transform to process a given field
27
+ transform do |row|
28
+ row[:birth_date] = parse_french_date(row[:birth_date])
29
+ # return to keep in the pipeline
30
+ row
31
+ end
32
+
33
+ # declare another row transform, dismissing rows conditionally by returning nil
34
+ transform do |row|
35
+ row[:birth_date].year < 2000 ? row : nil
36
+ end
37
+
38
+ # declare a row transform as a class, which can be tested properly
39
+ transform ComplianceCheckTransform, eula: 2015
40
+
41
+ # before declaring a destination, you may want to retrieve credentials
42
+ config = YAML.load(IO.read('config.yml'))
43
+
44
+ # declare a destination; like sources, you implement it (see below)
45
+ destination MyDatabaseDestination, config['my_database']
46
+
47
+ # declare a post-processor: a block called after all rows are successfully processed
48
+ post_process do
49
+ # do something
50
+ end
51
+ ```
52
+
53
+ The combination of sources, transforms, destinations and post-processors defines the data processing pipeline.
54
+
55
+ Note: you are advised to store your ETL definitions as files with the `.etl` extension (rather than `.rb`). This ensures you do not end up loading them by mistake from another component (e.g. a Rails app).
56
+
57
+ ## How do you run your ETL jobs?
58
+
59
+ You can use the provided command-line:
60
+
61
+ ```
62
+ bundle exec kiba my-data-processing-script.etl
63
+ ```
64
+
65
+ This command essentially starts a two-step process:
66
+
67
+ ```ruby
68
+ script_content = IO.read(filename)
69
+ # pass the filename to get line numbers in error backtraces
70
+ job_definition = Kiba.parse(script_content, filename)
71
+ Kiba.run(job_definition)
72
+ ```
73
+
74
+ `Kiba.parse` evaluates your ETL Ruby code to register sources, transforms, destinations and post-processors in a job definition. It is important to understand that you can use Ruby logic at DSL parsing time. This means that code like the following is possible, provided the CSV files are available at parsing time:
75
+
76
+ ```ruby
77
+ Dir['to_be_processed/*.csv'].each do |f|
78
+ source MyCsvSource, f
79
+ end
80
+ ```
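The following is a simplified, standalone sketch that makes parse-time behaviour concrete. It shows how a DSL like this one can record definitions with `instance_eval`, which is the approach `lib/kiba/parser.rb` takes in this release. `SketchControl`, `SketchContext` and `sketch_parse` are hypothetical names, and `Object` merely stands in for a source class. The loop runs once while the script is evaluated and registers one source per file. No data is read at this stage.

```ruby
# Hypothetical, simplified sketch of parse-time registration via instance_eval.
# It mirrors the idea behind lib/kiba/parser.rb and lib/kiba/context.rb only.
class SketchControl
  attr_reader :sources, :transforms

  def initialize
    @sources = []
    @transforms = []
  end
end

class SketchContext
  def initialize(control)
    @control = control
  end

  # Only the class and its arguments are recorded; nothing is instantiated here.
  def source(klass, *args)
    @control.sources << { klass: klass, args: args }
  end

  # Class form is recorded as a hash, block form as the block itself.
  def transform(klass = nil, *args, &block)
    @control.transforms << (klass ? { klass: klass, args: args } : block)
  end
end

def sketch_parse(script, filename = '(etl)')
  control = SketchControl.new
  # Passing a filename and line lets error backtraces point into the script.
  SketchContext.new(control).instance_eval(script, filename, 1)
  control
end

script = <<-RUBY
  %w(a.csv b.csv c.csv).each do |f|
    source Object, f
  end
  transform { |row| row }
RUBY

control = sketch_parse(script)
control.sources.map { |s| s[:args] } # => [["a.csv"], ["b.csv"], ["c.csv"]]
control.transforms.size              # => 1
```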
81
+
82
+ Once the job definition is loaded, `Kiba.run` uses that information to do the actual processing. It currently processes rows one by one, in a single thread, and stops at the first error encountered.
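The processing loop can be sketched in plain Ruby. This is a simplified, standalone approximation of what `lib/kiba/runner.rb` does in this release, not a drop-in replacement. Unlike the gem, it takes components that are already instantiated. `sketch_run`, `ArraySource` and `CollectingDestination` are hypothetical names used for illustration.

```ruby
# Hypothetical, simplified sketch of the row-by-row loop. Unlike the gem, it
# takes already-instantiated sources, transforms and destinations.
def sketch_run(sources, transforms, destinations, post_processes)
  sources.each do |source|
    source.each do |row|
      transforms.each do |transform|
        # Blocks are called directly; class-based transforms respond to process(row).
        row = transform.is_a?(Proc) ? transform.call(row) : transform.process(row)
        break unless row # nil dismisses the row: later transforms are skipped
      end
      next unless row # dismissed rows never reach the destinations
      destinations.each { |destination| destination.write(row) }
    end
  end
  # Any exception above propagates, so neither close nor post-processors run.
  destinations.each(&:close)
  post_processes.each(&:call)
end

class ArraySource # hypothetical in-memory source
  def initialize(rows)
    @rows = rows
  end

  def each(&block)
    @rows.each(&block)
  end
end

class CollectingDestination # hypothetical in-memory destination
  attr_reader :rows

  def initialize
    @rows = []
    @closed = false
  end

  def write(row)
    @rows << row
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

destination = CollectingDestination.new
log = []
sketch_run(
  [ArraySource.new([{ n: 1 }, { n: 2 }, { n: 3 }])],
  [->(row) { row[:n].odd? ? row : nil }, ->(row) { row.merge(seen: true) }],
  [destination],
  [-> { log << 'post-processed' }]
)
destination.rows # => [{ n: 1, seen: true }, { n: 3, seen: true }]
log              # => ["post-processed"]
```

The second lambda never sees the row with `n: 2`, because the first one dismissed it by returning `nil`.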
83
+
84
+ ## Implementing ETL sources
85
+
86
+ In Kiba, you are responsible for implementing the sources that do the extraction of data.
87
+
88
+ Sources are classes implementing:
89
+ - a constructor (to which Kiba will pass the provided arguments in the DSL)
90
+ - the `each` method (which should yield rows one by one)
91
+
92
+ Rows are usually `Hash` instances, but could be other structures as long as the rest of your pipeline expects them.
93
+
94
+ Since sources are classes, you can (and are encouraged to) unit test them and reuse them.
95
+
96
+ Here is a simple CSV source:
97
+
98
+ ```ruby
99
+ require 'csv'
100
+
101
+ class MyCsvSource
102
+ def initialize(input_file)
103
+ @csv = CSV.open(input_file, headers: true, header_converters: :symbol)
104
+ end
105
+
106
+ def each
107
+ @csv.each do |row|
108
+ yield(row.to_hash)
109
+ end
110
+ @csv.close
111
+ end
112
+ end
113
+ ```
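The source above opens the file in its constructor and closes it at the end of `each`. Calling `each` a second time would therefore fail on the closed file. If you need a source that can be iterated again, one possible variant opens the file inside `each` with `CSV.foreach`, which closes it automatically. This variant is an assumption, not part of Kiba, and `LazyCsvSource` is a hypothetical name.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical variant of MyCsvSource: the file is opened inside each, so the
# source can be iterated more than once. CSV.foreach closes the file itself.
class LazyCsvSource
  def initialize(input_file)
    @input_file = input_file
  end

  def each
    CSV.foreach(@input_file, headers: true, header_converters: :symbol) do |row|
      yield row.to_hash
    end
  end
end

# Usage: write a small CSV file, then iterate the same source twice.
file = Tempfile.new(['people', '.csv'])
file.write("first_name,last_name\nMary,Johnson\n")
file.close

source = LazyCsvSource.new(file.path)
first_pass = []
source.each { |row| first_pass << row }
second_pass = []
source.each { |row| second_pass << row }
first_pass # => [{ first_name: "Mary", last_name: "Johnson" }]
```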
114
+
115
+ ## Implementing row transforms
116
+
117
+ Row transforms can be implemented in two ways: as blocks, or as classes.
118
+
119
+ ### Row transform as a block
120
+
121
+ When a row transform is written as a block, the block receives the row as its parameter:
122
+
123
+ ```ruby
124
+ transform do |row|
125
+ row[:this_field] = row[:that_field] * 10
126
+ # make sure to return the row to keep it in the pipeline
127
+ row
128
+ end
129
+ ```
130
+
131
+ To dismiss a row from the pipeline, simply return `nil` from a transform:
132
+
133
+ ```ruby
134
+ transform { |row| row[:index] % 2 == 0 ? row : nil }
135
+ ```
136
+
137
+ ### Row transform as a class
138
+
139
+ If you implement the transform as a class, it must respond to `process(row)`:
140
+
141
+ ```ruby
142
+ class SamplingTransform
143
+ def initialize(modulo_value)
144
+ @modulo_value = modulo_value
145
+ end
146
+
147
+ def process(row)
148
+ row[:index] % @modulo_value == 0 ? row : nil
149
+ end
150
+ end
151
+ ```
152
+
153
+ You'll use it this way in your ETL declaration (the parameters will be passed to initialize):
154
+
155
+ ```ruby
156
+ # only keep 1 row out of 10
157
+ transform SamplingTransform, 10
158
+ ```
159
+
160
+ Like the block form, it can return `nil` to dismiss the row. The class form allows better testability and reusability across your ETL scripts.
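Because the class form is plain Ruby, it can be unit tested without running a whole job. Below is a minimal sketch using Minitest, which is already a development dependency of the gem. The test names are illustrative, and `SamplingTransform` is repeated so that the file runs on its own.

```ruby
require 'minitest/autorun'

# SamplingTransform as defined above, repeated so this file is self-contained.
class SamplingTransform
  def initialize(modulo_value)
    @modulo_value = modulo_value
  end

  def process(row)
    row[:index] % @modulo_value == 0 ? row : nil
  end
end

class TestSamplingTransform < Minitest::Test
  def test_keeps_rows_whose_index_is_a_multiple
    transform = SamplingTransform.new(10)
    assert_equal({ index: 20 }, transform.process({ index: 20 }))
  end

  def test_dismisses_other_rows
    assert_nil SamplingTransform.new(10).process({ index: 7 })
  end
end
```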
161
+
162
+ ## Implementing ETL destinations
163
+
164
+ Like sources, destinations are classes that you provide. Destinations must implement:
165
+ - a constructor (to which Kiba will pass the provided arguments in the DSL)
166
+ - a `write(row)` method that will be called for each non-dismissed row
167
+ - a `close` method that will be called at the end of the processing
168
+
169
+ Here is an example destination:
170
+
171
+ ```ruby
172
+ require 'csv'
173
+
174
+ # simple destination assuming all rows have the same fields
175
+ class MyCsvDestination
176
+ def initialize(output_file)
177
+ @csv = CSV.open(output_file, 'w')
178
+ end
179
+
180
+ def write(row)
181
+ unless @headers_written
182
+ @headers_written = true
183
+ @csv << row.keys
184
+ end
185
+ @csv << row.values
186
+ end
187
+
188
+ def close
189
+ @csv.close
190
+ end
191
+ end
192
+ ```
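For tests, a destination that collects rows in memory can be handy. The following `ArrayDestination` is a hypothetical helper, not shipped with Kiba, that honours the same `write(row)` / `close` contract. Kiba passes the DSL arguments to the constructor, so a script could declare `destination ArrayDestination, collected`, and a test could then inspect `collected`.

```ruby
# Hypothetical in-memory destination for tests: same write(row)/close contract
# as MyCsvDestination, but rows are appended to an array supplied by the caller.
class ArrayDestination
  def initialize(rows)
    @rows = rows
    @closed = false
  end

  def write(row)
    raise IOError, 'destination already closed' if @closed
    @rows << row
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

collected = []
destination = ArrayDestination.new(collected)
destination.write({ first_name: 'Mary' })
destination.close
collected # => [{ first_name: "Mary" }]
```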
193
+
194
+ ## Implementing post-processors
195
+
196
+ Post-processors are currently blocks, which are called once, after the ETL
197
+ has successfully processed all the rows. They won't be called if an error occurred.
198
+
199
+ ```ruby
200
+ count = 0
201
+
202
+ transform do |row|
203
+ count += 1
204
+ row
205
+ end
206
+
207
+ post_process do
208
+ Email.send(supervisor_address, "#{count} rows successfully processed")
209
+ end
210
+ ```
211
+
212
+ ## Composability, reusability, testability of Kiba components
213
+
214
+ The way Kiba works makes it easy to create reusable, well-tested ETL components and jobs.
215
+
216
+ The main reason for this is that a Kiba ETL script can `require` shared Ruby code, which allows you to:
217
+ - create well-tested, reusable sources & destinations
218
+ - create macro-transforms as methods, to be reused across sister scripts
219
+ - substitute a component by another (e.g.: try a variant of a destination)
220
+ - use a centralized place for configuration (credentials, IP addresses, etc.)
221
+
222
+ The fact that the DSL evaluation "runs" the script also allows for simple meta-programming techniques, like pre-reading a source file to extract field names, to be used in transform definitions.
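The following sketch illustrates that technique. It reads only the header line of a CSV file at parse time and derives one row transform per field. `csv_field_names` and `per_field_strippers` are hypothetical helpers. The sketch assumes the header names map directly to the symbol keys your source produces. So that the snippet runs without Kiba, the per-field blocks are built as plain lambdas, and the equivalent DSL usage appears in a comment.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical parse-time helper: reads only the header line and returns the
# field names as symbols (assumes the file exists when the script is parsed).
def csv_field_names(path)
  File.open(path) { |io| CSV.parse_line(io.gets).map(&:to_sym) }
end

# In a Kiba script, the same idea could be written as:
#   csv_field_names('input.csv').each do |field|
#     transform { |row| row[field] = row[field].to_s.strip; row }
#   end
# Here the per-field blocks are plain lambdas so the snippet runs without Kiba.
def per_field_strippers(fields)
  fields.map do |field|
    lambda do |row|
      row[field] = row[field].to_s.strip
      row
    end
  end
end

file = Tempfile.new(['input', '.csv'])
file.write("first_name,last_name\n")
file.close

fields = csv_field_names(file.path) # => [:first_name, :last_name]
row = { first_name: '  Mary ', last_name: ' Johnson ' }
per_field_strippers(fields).each { |strip_field| row = strip_field.call(row) }
row # => { first_name: "Mary", last_name: "Johnson" }
```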
223
+
224
+ The ability to support that DSL, but also to check command-line arguments and environment variables, tweak behaviour as needed, or call other, faster specialized tools, makes Ruby an asset for implementing ETL jobs.
225
+
226
+ Make sure to subscribe to my [Ruby ETL blog](http://thibautbarrere.com) where I'll demonstrate such techniques over time!
227
+
228
+ ## History & Credits
229
+
230
+ Wow, you're still there? Nice to meet you. I'm [Thibaut](http://thibautbarrere.com), author of Kiba.
231
+
232
+ I first encountered the idea of row-based syntax when I started using [Anthony Eden](https://github.com/aeden)'s [Activewarehouse-ETL](https://github.com/activewarehouse/activewarehouse-etl), first published around 2006 (I think), in which Anthony applied the core principles defined by Ralph Kimball in [The Data Warehouse ETL Toolkit](http://www.amazon.com/gp/product/0764567578).
233
+
234
+ I've been writing and maintaining a number of production ETL systems using Activewarehouse-ETL, then later with an ancestor of Kiba which was named TinyTL.
235
+
236
+ I took over the maintenance of Activewarehouse-ETL circa 2009/2010, but over time, I could not properly update & document it, given the gradual failure of a large number of dependencies and components. Ultimately in 2014 I had to stop maintaining it, after an already long hiatus.
237
+
238
+ That said, using Activewarehouse-ETL for so long made me realize the row-based processing syntax was great and provided some great assets for maintainability over long time spans.
239
+
240
+ Kiba is a completely fresh & minimalistic-on-purpose implementation of that row-based processing pattern.
241
+
242
+ It is minimalistic to make it more likely that I will be able to maintain it over time.
243
+
244
+ It makes strong simplicity assumptions (like letting you define the sources, transforms & destinations). MiniTest is an inspiration.
245
+
246
+ As I developed Kiba, I realized how much this simplicity opens the road for interesting developments such as multi-threaded & multi-process processing.
247
+
248
+ Last word: Kiba is 100% sponsored by my company LoGeek SARL (also provider of [WiseCash, a lightweight cash-flow forecasting app](https://www.wisecashhq.com)).
249
+
250
+ ## License
251
+
252
+ Copyright (c) LoGeek SARL.
253
+
254
+ Kiba is an Open Source project licensed under the terms of
255
+ the LGPLv3 license. Please see <http://www.gnu.org/licenses/lgpl-3.0.html>
256
+ for license text.
257
+
258
+ ## Contributing & Legal
259
+
260
+ Until the API is more stable, I can only accept documentation Pull Requests.
261
+
262
+ (agreement below borrowed from [Sidekiq Legal](https://github.com/mperham/sidekiq/blob/master/Contributing.md))
263
+
264
+ By submitting a Pull Request, you disavow any rights or claims to any changes submitted to the Kiba project and assign the copyright of those changes to LoGeek SARL.
265
+
266
+ If you cannot or do not want to reassign those rights (your employment contract for your employer may not allow this), you should not submit a PR. Open an issue and someone else can do the work.
267
+
268
+ This is a legal way of saying "If you submit a PR to us, that code becomes ours". 99.9% of the time that's what you intend anyways; we hope it doesn't scare you away from contributing.
@@ -0,0 +1,7 @@
1
+ require 'rake/testtask'
2
+
3
+ Rake::TestTask.new(:test) do |t|
4
+ t.pattern = 'test/test_*.rb'
5
+ end
6
+
7
+ task :default => :test
@@ -0,0 +1,5 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require_relative '../lib/kiba/cli'
4
+
5
+ Kiba::Cli.run(ARGV)
@@ -0,0 +1,20 @@
1
+ # -*- encoding: utf-8 -*-
2
+ require File.expand_path('../lib/kiba/version', __FILE__)
3
+
4
+ Gem::Specification.new do |gem|
5
+ gem.authors = ["Thibaut Barrère"]
6
+ gem.email = ["thibaut.barrere@gmail.com"]
7
+ gem.description = gem.summary = "Lightweight ETL for Ruby"
8
+ gem.homepage = "http://thbar.github.io/kiba/"
9
+ gem.license = "LGPL-3.0"
10
+ gem.files = `git ls-files | grep -Ev '^(examples)'`.split("\n")
11
+ gem.test_files = `git ls-files -- test/*`.split("\n")
12
+ gem.name = "kiba"
13
+ gem.require_paths = ["lib"]
14
+ gem.version = Kiba::VERSION
15
+ gem.executables = ['kiba']
16
+
17
+ gem.add_development_dependency 'rake'
18
+ gem.add_development_dependency 'minitest'
19
+ gem.add_development_dependency 'awesome_print'
20
+ end
@@ -0,0 +1,10 @@
1
+ # encoding: utf-8
2
+ require 'kiba/version'
3
+
4
+ require 'kiba/control'
5
+ require 'kiba/context'
6
+ require 'kiba/parser'
7
+ require 'kiba/runner'
8
+
9
+ Kiba.extend(Kiba::Parser)
10
+ Kiba.extend(Kiba::Runner)
@@ -0,0 +1,16 @@
1
+ require 'kiba'
2
+
3
+ module Kiba
4
+ class Cli
5
+ def self.run(args)
6
+ unless args.size == 1
7
+ puts "Syntax: kiba your-script.etl"
8
+ exit -1
9
+ end
10
+ filename = args[0]
11
+ script_content = IO.read(filename)
12
+ job_definition = Kiba.parse(script_content, filename)
13
+ Kiba.run(job_definition)
14
+ end
15
+ end
16
+ end
@@ -0,0 +1,28 @@
1
+ module Kiba
2
+ class Context
3
+ def initialize(control)
4
+ # TODO: forbid access to control from context? use cleanroom?
5
+ @control = control
6
+ end
7
+
8
+ def source(klass, *initialization_params)
9
+ @control.sources << {klass: klass, args: initialization_params}
10
+ end
11
+
12
+ def transform(klass = nil, *initialization_params, &block)
13
+ if klass
14
+ @control.transforms << {klass: klass, args: initialization_params}
15
+ else
16
+ @control.transforms << block
17
+ end
18
+ end
19
+
20
+ def destination(klass, *initialization_params)
21
+ @control.destinations << {klass: klass, args: initialization_params}
22
+ end
23
+
24
+ def post_process(&block)
25
+ @control.post_processes << block
26
+ end
27
+ end
28
+ end
@@ -0,0 +1,19 @@
1
+ module Kiba
2
+ class Control
3
+ def sources
4
+ @sources ||= []
5
+ end
6
+
7
+ def transforms
8
+ @transforms ||= []
9
+ end
10
+
11
+ def destinations
12
+ @destinations ||= []
13
+ end
14
+
15
+ def post_processes
16
+ @post_processes ||= []
17
+ end
18
+ end
19
+ end
@@ -0,0 +1,15 @@
1
+ module Kiba
2
+ module Parser
3
+ def parse(source_as_string = nil, source_file = nil, &source_as_block)
4
+ control = Control.new
5
+ context = Context.new(control)
6
+ if source_as_string
7
+ # this somewhat weird construct removes source_file from the arguments when it is nil
8
+ context.instance_eval(*[source_as_string, source_file].compact)
9
+ else
10
+ context.instance_eval(&source_as_block)
11
+ end
12
+ control
13
+ end
14
+ end
15
+ end
@@ -0,0 +1,44 @@
1
+ module Kiba
2
+ module Runner
3
+ def run(control)
4
+ sources = to_instances(control.sources)
5
+ destinations = to_instances(control.destinations)
6
+ transforms = to_instances(control.transforms, true)
7
+ # not using keyword args because JRuby defaults to 1.9 syntax currently
8
+ post_processes = to_instances(control.post_processes, true, false)
9
+
10
+ sources.each do |source|
11
+ source.each do |row|
12
+ transforms.each_with_index do |transform, index|
13
+ if transform.is_a?(Proc)
14
+ row = transform.call(row)
15
+ else
16
+ row = transform.process(row)
17
+ end
18
+ break unless row
19
+ end
20
+ next unless row
21
+ destinations.each do |destination|
22
+ destination.write(row)
23
+ end
24
+ end
25
+ end
26
+
27
+ destinations.each(&:close)
28
+ post_processes.each(&:call)
29
+ end
30
+
31
+ def to_instances(definitions, allow_block = false, allow_class = true)
32
+ definitions.map do |d|
33
+ case d
34
+ when Proc
35
+ raise "Block form is not allowed here" unless allow_block
36
+ d
37
+ else
38
+ raise "Class form is not allowed here" unless allow_class
39
+ d[:klass].new(*d[:args])
40
+ end
41
+ end
42
+ end
43
+ end
44
+ end
@@ -0,0 +1,3 @@
1
+ module Kiba
2
+ VERSION = "0.5.0"
3
+ end
@@ -0,0 +1,2 @@
1
+ # this should fail because we have an unknown class
2
+ source UnknownThing
@@ -0,0 +1 @@
1
+ # this does nothing
@@ -0,0 +1,17 @@
1
+ require 'minitest/autorun'
2
+ require 'minitest/pride'
3
+ require 'kiba'
4
+
5
+ class Kiba::Test < Minitest::Test
6
+ extend Minitest::Spec::DSL
7
+
8
+ def remove_files(*files)
9
+ files.each do |file|
10
+ File.delete(file) if File.exist?(file)
11
+ end
12
+ end
13
+
14
+ def fixture(file)
15
+ File.join(File.dirname(__FILE__), 'fixtures', file)
16
+ end
17
+ end
@@ -0,0 +1,21 @@
1
+ require 'csv'
2
+
3
+ # simple destination, not checking that each row has all the fields
4
+ class TestCsvDestination
5
+ def initialize(output_file)
6
+ @csv = CSV.open(output_file, 'w')
7
+ @headers_written = false
8
+ end
9
+
10
+ def write(row)
11
+ unless @headers_written
12
+ @headers_written = true
13
+ @csv << row.keys
14
+ end
15
+ @csv << row.values
16
+ end
17
+
18
+ def close
19
+ @csv.close
20
+ end
21
+ end
@@ -0,0 +1,14 @@
1
+ require 'csv'
2
+
3
+ class TestCsvSource
4
+ def initialize(input_file)
5
+ @csv = CSV.open(input_file, headers: true, header_converters: :symbol)
6
+ end
7
+
8
+ def each
9
+ @csv.each do |row|
10
+ yield(row.to_hash)
11
+ end
12
+ @csv.close
13
+ end
14
+ end
@@ -0,0 +1,11 @@
1
+ class TestEnumerableSource
2
+ def initialize(enumerable)
3
+ @enumerable = enumerable
4
+ end
5
+
6
+ def each
7
+ @enumerable.each do |row|
8
+ yield row
9
+ end
10
+ end
11
+ end
@@ -0,0 +1,11 @@
1
+ class TestRenameFieldTransform
2
+ def initialize(from, to)
3
+ @from = from
4
+ @to = to
5
+ end
6
+
7
+ def process(row)
8
+ row[@to] = row.delete(@from)
9
+ row
10
+ end
11
+ end
@@ -0,0 +1,17 @@
1
+ require_relative 'helper'
2
+ require 'kiba/cli'
3
+
4
+ class TestCli < Kiba::Test
5
+ def test_cli_launches
6
+ Kiba::Cli.run([fixture('valid.etl')])
7
+ end
8
+
9
+ def test_cli_reports_filename_and_lineno
10
+ exception = assert_raises(NameError) do
11
+ Kiba::Cli.run([fixture('bogus.etl')])
12
+ end
13
+
14
+ assert_match /uninitialized constant (.*)UnknownThing/, exception.message
15
+ assert_includes exception.backtrace.to_s, 'test/fixtures/bogus.etl:2:in'
16
+ end
17
+ end
@@ -0,0 +1,88 @@
1
+ require_relative 'helper'
2
+
3
+ require_relative 'support/test_csv_source'
4
+ require_relative 'support/test_csv_destination'
5
+ require_relative 'support/test_rename_field_transform'
6
+
7
+ # End-to-end tests go here
8
+ class TestIntegration < Kiba::Test
9
+ let(:output_file) { 'test/tmp/output.csv' }
10
+ let(:input_file) { 'test/tmp/input.csv' }
11
+
12
+ let(:sample_csv_data) do <<CSV
13
+ first_name,last_name,sex
14
+ John,Doe,M
15
+ Mary,Johnson,F
16
+ Cindy,Backgammon,F
17
+ Patrick,McWire,M
18
+ CSV
19
+ end
20
+
21
+ def setup
22
+ remove_files(input_file, output_file)
23
+ IO.write(input_file, sample_csv_data)
24
+ end
25
+
26
+ def teardown
27
+ remove_files(input_file, output_file)
28
+ end
29
+
30
+ def test_csv_to_csv
31
+ # parse the ETL script (this won't run it)
32
+ control = Kiba.parse do
33
+ source TestCsvSource, 'test/tmp/input.csv'
34
+
35
+ transform do |row|
36
+ row[:sex] = case row[:sex]
37
+ when 'M'; 'Male'
38
+ when 'F'; 'Female'
39
+ else 'Unknown'
40
+ end
41
+ row # must be returned
42
+ end
43
+
44
+ # returning nil dismisses the row
45
+ transform do |row|
46
+ row[:sex] == 'Female' ? row : nil
47
+ end
48
+
49
+ transform TestRenameFieldTransform, :sex, :sex_2015
50
+
51
+ destination TestCsvDestination, 'test/tmp/output.csv'
52
+ end
53
+
54
+ # run the parsed ETL script
55
+ Kiba.run(control)
56
+
57
+ # verify the output
58
+ assert_equal <<CSV, IO.read(output_file)
59
+ first_name,last_name,sex_2015
60
+ Mary,Johnson,Female
61
+ Cindy,Backgammon,Female
62
+ CSV
63
+ end
64
+
65
+ def test_variable_access
66
+ message = nil
67
+
68
+ control = Kiba.parse do
69
+ source TestEnumerableSource, [1, 2, 3]
70
+
71
+ count = 0
72
+
73
+ transform do |r|
74
+ count += 1
75
+ r
76
+ end
77
+
78
+ post_process do
79
+ message = "#{count} rows processed"
80
+ end
81
+ end
82
+
83
+ Kiba.run(control)
84
+
85
+ assert_equal '3 rows processed', message
86
+ end
87
+
88
+ end
@@ -0,0 +1,84 @@
1
+ require_relative 'helper'
2
+
3
+ require_relative 'support/test_rename_field_transform'
4
+
5
+ class DummyClass
6
+ end
7
+
8
+ class TestParser < Kiba::Test
9
+ def test_source_definition
10
+ control = Kiba.parse do
11
+ source DummyClass, 'has', 'args'
12
+ end
13
+
14
+ assert_equal DummyClass, control.sources[0][:klass]
15
+ assert_equal ['has', 'args'], control.sources[0][:args]
16
+ end
17
+
18
+ def test_block_transform_definition
19
+ control = Kiba.parse do
20
+ transform { |row| row }
21
+ end
22
+
23
+ assert_instance_of Proc, control.transforms[0]
24
+ end
25
+
26
+ def test_class_transform_definition
27
+ control = Kiba.parse do
28
+ transform TestRenameFieldTransform, :last_name, :name
29
+ end
30
+
31
+ assert_equal TestRenameFieldTransform, control.transforms[0][:klass]
32
+ assert_equal [:last_name, :name], control.transforms[0][:args]
33
+ end
34
+
35
+ def test_destination_definition
36
+ control = Kiba.parse do
37
+ destination DummyClass, 'has', 'args'
38
+ end
39
+
40
+ assert_equal DummyClass, control.destinations[0][:klass]
41
+ assert_equal ['has', 'args'], control.destinations[0][:args]
42
+ end
43
+
44
+ def test_block_post_process_definition
45
+ control = Kiba.parse do
46
+ post_process { }
47
+ end
48
+
49
+ assert_instance_of Proc, control.post_processes[0]
50
+ end
51
+
52
+ def test_source_as_string_parsing
53
+ control = Kiba.parse <<RUBY
54
+ source DummyClass, 'from', 'file'
55
+ RUBY
56
+
57
+ assert_equal 1, control.sources.size
58
+ assert_equal DummyClass, control.sources[0][:klass]
59
+ assert_equal ['from', 'file'], control.sources[0][:args]
60
+ end
61
+
62
+ def test_source_as_file_doing_require
63
+ IO.write 'test/tmp/etl-common.rb', <<RUBY
64
+ def common_source_declaration
65
+ source DummyClass, 'from', 'common'
66
+ end
67
+ RUBY
68
+ IO.write 'test/tmp/etl-main.rb', <<RUBY
69
+ require './test/tmp/etl-common.rb'
70
+
71
+ source DummyClass, 'from', 'main'
72
+ common_source_declaration
73
+ RUBY
74
+ control = Kiba.parse IO.read('test/tmp/etl-main.rb')
75
+
76
+ assert_equal 2, control.sources.size
77
+
78
+ assert_equal ['from', 'main'], control.sources[0][:args]
79
+ assert_equal ['from', 'common'], control.sources[1][:args]
80
+
81
+ ensure
82
+ remove_files('test/tmp/etl-common.rb', 'test/tmp/etl-main.rb')
83
+ end
84
+ end
@@ -0,0 +1,40 @@
1
+ require_relative 'helper'
2
+
3
+ require_relative 'support/test_enumerable_source'
4
+
5
+ class TestRunner < Kiba::Test
6
+ let(:control) do
7
+ control = Kiba::Control.new
8
+ # this will yield a single row for testing
9
+ control.sources << {klass: TestEnumerableSource, args: [[{field: 'value'}]]}
10
+ control
11
+ end
12
+
13
+ def test_block_transform_processing
14
+ # is there a better way to assert a block was called in minitest?
15
+ control.transforms << lambda { |r| @called = true; r }
16
+ Kiba.run(control)
17
+ assert_equal true, @called
18
+ end
19
+
20
+ def test_dismissed_row_not_passed_to_next_transform
21
+ control.transforms << lambda { |r| nil }
22
+ control.transforms << lambda { |r| @called = true; nil}
23
+ Kiba.run(control)
24
+ assert_nil @called
25
+ end
26
+
27
+ def test_post_process_runs
28
+ control.post_processes << lambda { @called = true }
29
+ Kiba.run(control)
30
+ assert_equal true, @called
31
+ end
32
+
33
+ def test_post_process_not_called_after_row_failure
34
+ control.transforms << lambda { |r| raise 'FAIL' }
35
+ control.post_processes << lambda { @called = true }
36
+ assert_raises(RuntimeError, 'FAIL') { Kiba.run(control) }
37
+ assert_nil @called
38
+ end
39
+
40
+ end
File without changes
metadata ADDED
@@ -0,0 +1,126 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: kiba
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.5.0
5
+ platform: ruby
6
+ authors:
7
+ - Thibaut Barrère
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2015-04-18 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rake
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: minitest
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: awesome_print
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ description: Lightweight ETL for Ruby
56
+ email:
57
+ - thibaut.barrere@gmail.com
58
+ executables:
59
+ - kiba
60
+ extensions: []
61
+ extra_rdoc_files: []
62
+ files:
63
+ - ".gitignore"
64
+ - ".travis.yml"
65
+ - Changes.md
66
+ - Gemfile
67
+ - README.md
68
+ - Rakefile
69
+ - bin/kiba
70
+ - kiba.gemspec
71
+ - lib/kiba.rb
72
+ - lib/kiba/cli.rb
73
+ - lib/kiba/context.rb
74
+ - lib/kiba/control.rb
75
+ - lib/kiba/parser.rb
76
+ - lib/kiba/runner.rb
77
+ - lib/kiba/version.rb
78
+ - test/fixtures/bogus.etl
79
+ - test/fixtures/valid.etl
80
+ - test/helper.rb
81
+ - test/support/test_csv_destination.rb
82
+ - test/support/test_csv_source.rb
83
+ - test/support/test_enumerable_source.rb
84
+ - test/support/test_rename_field_transform.rb
85
+ - test/test_cli.rb
86
+ - test/test_integration.rb
87
+ - test/test_parser.rb
88
+ - test/test_runner.rb
89
+ - test/tmp/.gitkeep
90
+ homepage: http://thbar.github.io/kiba/
91
+ licenses:
92
+ - LGPL-3.0
93
+ metadata: {}
94
+ post_install_message:
95
+ rdoc_options: []
96
+ require_paths:
97
+ - lib
98
+ required_ruby_version: !ruby/object:Gem::Requirement
99
+ requirements:
100
+ - - ">="
101
+ - !ruby/object:Gem::Version
102
+ version: '0'
103
+ required_rubygems_version: !ruby/object:Gem::Requirement
104
+ requirements:
105
+ - - ">="
106
+ - !ruby/object:Gem::Version
107
+ version: '0'
108
+ requirements: []
109
+ rubyforge_project:
110
+ rubygems_version: 2.4.3
111
+ signing_key:
112
+ specification_version: 4
113
+ summary: Lightweight ETL for Ruby
114
+ test_files:
115
+ - test/fixtures/bogus.etl
116
+ - test/fixtures/valid.etl
117
+ - test/helper.rb
118
+ - test/support/test_csv_destination.rb
119
+ - test/support/test_csv_source.rb
120
+ - test/support/test_enumerable_source.rb
121
+ - test/support/test_rename_field_transform.rb
122
+ - test/test_cli.rb
123
+ - test/test_integration.rb
124
+ - test/test_parser.rb
125
+ - test/test_runner.rb
126
+ - test/tmp/.gitkeep