kiba 2.0.0 → 4.0.0

Files changed (50)
  1. checksums.yaml +5 -5
  2. data/.github/FUNDING.yml +1 -0
  3. data/.github/workflows/ci.yml +41 -0
  4. data/COMM-LICENSE.md +348 -0
  5. data/Changes.md +38 -2
  6. data/Gemfile +1 -1
  7. data/ISSUE_TEMPLATE.md +7 -0
  8. data/LICENSE +3 -1
  9. data/Pro-Changes.md +82 -5
  10. data/README.md +12 -65
  11. data/Rakefile +8 -3
  12. data/kiba.gemspec +20 -17
  13. data/lib/kiba.rb +14 -11
  14. data/lib/kiba/context.rb +9 -5
  15. data/lib/kiba/control.rb +1 -1
  16. data/lib/kiba/dsl_extensions/config.rb +1 -1
  17. data/lib/kiba/parser.rb +6 -22
  18. data/lib/kiba/streaming_runner.rb +62 -5
  19. data/lib/kiba/version.rb +1 -1
  20. data/test/helper.rb +15 -7
  21. data/test/shared_runner_tests.rb +227 -0
  22. data/test/support/shared_tests.rb +1 -1
  23. data/test/support/test_aggregate_transform.rb +19 -0
  24. data/test/support/test_array_destination.rb +2 -2
  25. data/test/support/test_close_yielding_transform.rb +11 -0
  26. data/test/support/test_csv_destination.rb +2 -2
  27. data/test/support/test_csv_source.rb +1 -1
  28. data/test/support/test_destination_returning_nil.rb +12 -0
  29. data/test/support/test_duplicate_row_transform.rb +1 -1
  30. data/test/support/test_keyword_arguments_component.rb +14 -0
  31. data/test/support/test_mixed_arguments_component.rb +14 -0
  32. data/test/support/test_non_closing_transform.rb +5 -0
  33. data/test/support/test_yielding_transform.rb +1 -1
  34. data/test/test_integration.rb +38 -33
  35. data/test/test_parser.rb +16 -50
  36. data/test/test_run.rb +37 -0
  37. data/test/test_streaming_runner.rb +44 -23
  38. metadata +45 -30
  39. data/.travis.yml +0 -15
  40. data/appveyor.yml +0 -26
  41. data/bin/kiba +0 -5
  42. data/lib/kiba/cli.rb +0 -16
  43. data/lib/kiba/runner.rb +0 -78
  44. data/test/common/runner.rb +0 -137
  45. data/test/fixtures/bogus.etl +0 -2
  46. data/test/fixtures/namespace_conflict.etl +0 -9
  47. data/test/fixtures/some_extension.rb +0 -4
  48. data/test/fixtures/valid.etl +0 -1
  49. data/test/test_cli.rb +0 -21
  50. data/test/test_runner.rb +0 -6
data/Gemfile CHANGED
@@ -1,3 +1,3 @@
1
- source 'https://rubygems.org'
1
+ source "https://rubygems.org"
2
2
 
3
3
  gemspec
data/ISSUE_TEMPLATE.md ADDED
@@ -0,0 +1,7 @@
1
+ **If you need help**, please:
2
+ * [Check existing answers on StackOverflow](https://stackoverflow.com/questions/tagged/kiba-etl).
3
+ * [Ask your question with tag kiba-etl on StackOverflow](http://stackoverflow.com/questions/ask?tags=kiba-etl) so that other can benefit from your contribution!
4
+
5
+ I monitor this specific tag and will reply to you.
6
+
7
+ Please only open an issue in case you found a bug. Thanks!
data/LICENSE CHANGED
@@ -2,4 +2,6 @@ Copyright (c) LoGeek SARL
2
2
 
3
3
  Kiba Common is an Open Source project licensed under the terms of
4
4
  the LGPLv3 license. Please see <http://www.gnu.org/licenses/lgpl-3.0.html>
5
- for license text.
5
+ for license text.
6
+
7
+ Kiba Pro has a commercial-friendly license allowing private forks and modifications of Kiba. You can find the commercial license terms in COMM-LICENSE.md.
data/Pro-Changes.md CHANGED
@@ -1,13 +1,90 @@
1
1
  Kiba Pro Changelog
2
2
  ==================
3
3
 
4
- Kiba Pro is the commercial extension for Kiba. Documentation is available on the [Wiki](https://github.com/thbar/kiba/wiki).
4
+ Kiba Pro provides vendor-supported ETL extensions for Kiba. Your subscription funds the Open-Source development, thanks for considering it!
5
5
 
6
- HEAD
7
- -------
6
+ Learn more on the [Kiba website](https://www.kiba-etl.org/kiba-pro).
8
7
 
9
- 1.0.0.rc1
10
- ---------
8
+ Documentation is available on the [Wiki](https://github.com/thbar/kiba/wiki#kiba-pro).
9
+
10
+ 2.0.0
11
+ -----
12
+
13
+ - New: `SQLBulkLookup` transform allows to efficiently lookup values in SQL tables. This is particularly useful in datawarehouse scenarios (to replace unique business keys by surrogate keys), or when writing migrations of SQL databases. Instead of looking-up each row individually, it avoids a "N+1" like effect, by working on large batches of rows.
14
+ - New: `ParallelTransform` provides an easy way to process a group of ETL rows at the same time using a pool of threads. It can be used to accelerate ETL transforms doing IO operations such as HTTP queries, by going multithreaded.
15
+ - New: `FileLock` adds an easy way to avoid overlapping runs in ETL Jobs using a local file lock.
16
+
17
+ 1.5.0
18
+ -----
19
+
20
+ - Compatibility with Kiba v3
21
+ - BREAKING CHANGE: deprecate non-live Sequel connection passing (https://github.com/thbar/kiba/issues/79). Do not use `database: "connection_string"`, instead pass your `Sequel` connection directly. This moves the connection management out of the destination, which is a better pattern & provides better (block-based) resources closing.
22
+ - Official MySQL support:
23
+ - While the compatibility was already here, it is now tested for in our QA testing suite.
24
+ - MySQL 5.5-8.0 is supported & tested
25
+ - MariaDB should be supported (although not tested against in the QA testing suite)
26
+ - Amazon Aurora MySQL is also supposed to work (although not tested)
27
+ - `Kiba::Pro::Sources::SQL` supports for non-streaming + streaming use
28
+ - `Kiba::Pro::Destinations::SQLBulkInsert` supports:
29
+ - Bulk insert
30
+ - Bulk insert with ignore
31
+ - Bulk upsert (including with dynamically computed columns) via `ON DUPLICATE KEY UPDATE`
32
+ - Note that the `Kiba::Pro::Destinations::SQLUpsert` (row-by-row) is not MySQL compatible at the moment
33
+
34
+ 1.2.0
35
+ -----
36
+
37
+ - `SQL` source improvements:
38
+ - Deprecate use_cursor in favor of block query construct. The source could previously be configured with:
39
+
40
+ ```ruby
41
+ source Kiba::Pro::Sources::SQL,
42
+ query: "SELECT * FROM items",
43
+ use_cursor: true
44
+ ```
45
+
46
+ The `use_cursor` keyword is now deprecated. You can use the more powerful block query construct:
47
+
48
+ ```ruby
49
+ source Kiba::Pro::Sources::SQL,
50
+ query: -> (db) { db["SELECT * FROM items"].use_cursor },
51
+ ```
52
+
53
+ - Avoid bogus nested SQL calls when configuring the query via block/proc. A call with:
54
+
55
+ ```ruby
56
+ source Kiba::Pro::Sources::SQL,
57
+ query: -> (db) { db["SELECT * FROM items"] },
58
+ ```
59
+
60
+ would have previously generated a `SELECT * FROM (SELECT * FROM "items")`. This is now fixed.
61
+
62
+ - Add specs around streaming support (for both MySQL and Postgres).
63
+
64
+ For Postgres, streaming was [recommended by the author of Sequel](https://groups.google.com/d/msg/sequel-talk/olznPcmEf8M/hd5Ris0pYNwJ) over `use_cursor: true` (but do compare on your actual cases!). To enable streaming for Postgres:
65
+ - Add `sequel_pg` to your `Gemfile`
66
+ - Enable the extension in your `db` instance & add `.stream` to your dataset e.g.:
67
+
68
+ ```ruby
69
+ Sequel.connect(ENV.fetch('DATABASE_URL')) do |db|
70
+ db.extension(:pg_streaming)
71
+ Kiba.run(Kiba.parse do
72
+ source Kiba::Pro::Sources::SQL,
73
+ db: db,
74
+ query: -> (db) { db[:items].stream }
75
+ # SNIP
76
+ end)
77
+ ```
78
+
79
+ For MySQL, just add `.stream` to your dataset like above (no extension required).
80
+
81
+ 1.1.0
82
+ -----
83
+
84
+ - Improvement: `SQLBulkInsert` now supports Postgres `INSERT ON CONFLICT` for batch operations (bulk upsert, conditional upserts, ignore if exist etc) via new `dataset` keyword. See [documentation](https://github.com/thbar/kiba/wiki/SQL-Bulk-Insert-Destination).
85
+
86
+ 1.0.0
87
+ -----
11
88
 
12
89
  NOTE: documentation & requirements/compatibility are available on the [wiki](https://github.com/thbar/kiba/wiki).
13
90
 
data/README.md CHANGED
@@ -1,84 +1,31 @@
1
- **If you need help**, please [ask your question with tag kiba-etl on StackOverflow](http://stackoverflow.com/questions/ask?tags=kiba-etl) so that other can benefit from your contribution! I monitor this specific tag and will reply to you.
2
-
3
- Writing reliable, concise, well-tested & maintainable data-processing code is tricky.
4
-
5
- Kiba lets you define and run such high-quality ETL ([Extract-Transform-Load](http://en.wikipedia.org/wiki/Extract,_transform,_load)) jobs using Ruby.
6
-
7
- Learn more on the [Wiki](https://github.com/thbar/kiba/wiki), on my [blog](http://thibautbarrere.com) and on [StackOverflow](http://stackoverflow.com/questions/tagged/kiba-etl).
1
+ # Kiba ETL
8
2
 
9
3
  [![Gem Version](https://badge.fury.io/rb/kiba.svg)](http://badge.fury.io/rb/kiba)
10
- [![Build Status](https://travis-ci.org/thbar/kiba.svg?branch=master)](https://travis-ci.org/thbar/kiba) [![Build status](https://ci.appveyor.com/api/projects/status/v05jcyhpp1mueq9i?svg=true)](https://ci.appveyor.com/project/thbar/kiba) [![Code Climate](https://codeclimate.com/github/thbar/kiba/badges/gpa.svg)](https://codeclimate.com/github/thbar/kiba) [![Dependency Status](https://gemnasium.com/thbar/kiba.svg)](https://gemnasium.com/thbar/kiba)
11
-
12
- ## Kiba 2.0.0.rc1
13
-
14
- Kiba 2.0.0.rc1 (available via `gem install kiba --prerelease`) is available for testing.
15
-
16
- ### New StreamingRunner engine
17
-
18
- Kiba 2 introduces a new, opt-in engine called the `StreamingRunner`, which allows to generate an arbitrary number of rows inside transforms. This drastically improves the reusability & composability of Kiba components (see [#44](https://github.com/thbar/kiba/pull/44) for some background).
19
-
20
- To use the `StreamingRunner`, use the following code:
4
+ [![Build Status](https://github.com/thbar/kiba/actions/workflows/ci.yml/badge.svg)](https://github.com/thbar/kiba/actions) [![Code Climate](https://codeclimate.com/github/thbar/kiba/badges/gpa.svg)](https://codeclimate.com/github/thbar/kiba)
21
5
 
22
- ```ruby
23
- # activate the new Kiba internal config system
24
- extend Kiba::DSLExtensions::Config
25
- # opt-in for the new engine
26
- config :kiba, runner: Kiba::StreamingRunner
27
-
28
- # write transform class able to yield an arbitrary number of rows
29
- class MyYieldingTransform
30
- def process(row)
31
- yield {key: 1}
32
- yield {key: 2}
33
- {key: 3}
34
- end
35
- end
36
- ```
37
-
38
- The improved runner is compatible with Ruby 2.0+.
39
-
40
- ### Compatibility with Kiba 1
41
-
42
- Kiba 2 is expected to be compatible with existing Kiba scripts as long as you did not use internal API.
43
-
44
- Internal changes include:
6
+ Writing reliable, concise, well-tested & maintainable data-processing code is tricky.
45
7
 
46
- * An opt-in, Elixir's mix-inspired `config` system, currently only used to select the runner you want at job declaration time
47
- * A stronger isolation in the `Parser`, to reduces the chances that ETL scripts could conflict with Kiba internal classes
8
+ Kiba lets you define and run such high-quality ETL ([Extract-Transform-Load](http://en.wikipedia.org/wiki/Extract,_transform,_load)) jobs using Ruby.
48
9
 
49
10
  ## Getting Started
50
11
 
51
- * [How do you define ETL jobs with Kiba?](https://github.com/thbar/kiba/wiki/How-do-you-define-ETL-jobs-with-Kiba%3F)
52
- * [How do you run your ETL jobs?](https://github.com/thbar/kiba/wiki/How-do-you-run-your-ETL-jobs%3F)
53
- * [Implementing ETL sources](https://github.com/thbar/kiba/wiki/Implementing-ETL-sources).
54
- * [Implementing ETL transforms](https://github.com/thbar/kiba/wiki/Implementing-ETL-transforms).
55
- * [Implementing ETL destinations](https://github.com/thbar/kiba/wiki/Implementing-ETL-destinations).
56
- * [Implementing pre and post-processors](https://github.com/thbar/kiba/wiki/Implementing-pre-and-post-processors).
12
+ Head over to the [Wiki](https://github.com/thbar/kiba/wiki) for up-to-date documentation.
57
13
 
58
- ## Useful links
14
+ **If you need help**, please [ask your question with tag kiba-etl on StackOverflow](http://stackoverflow.com/questions/ask?tags=kiba-etl) so that other can benefit from your contribution! I monitor this specific tag and will reply to you.
59
15
 
60
- * [Live Coding Session - Processing data with Kiba ETL](http://thibautbarrere.com/2015/11/09/video-processing-data-with-kiba-etl/)
61
- * [Rubyists - are you doing ETL unknowningly?](http://thibautbarrere.com/2015/03/25/rubyists-are-you-doing-etl-unknowingly/)
62
- * [How to write solid data processing code](http://thibautbarrere.com/2015/04/05/how-to-write-solid-data-processing-code/)
63
- * [How to reformat CSV files with Kiba](http://thibautbarrere.com/2015/06/04/how-to-reformat-csv-files-with-kiba/) (in-depth, hands-on tutorial)
64
- * [How to explode multivalued attributes with Kiba ETL?](http://thibautbarrere.com/2015/06/25/how-to-explode-multivalued-attributes-with-kiba/)
65
- * [Common techniques to compute aggregates with Kiba](https://stackoverflow.com/questions/31145715/how-to-do-a-aggregation-transformation-in-a-kiba-etl-script-kiba-gem)
66
- * [How to run Kiba in a Rails environment?](http://thibautbarrere.com/2015/09/26/how-to-run-kiba-in-a-rails-environment/)
67
- * [How to pass parameters to the Kiba command line?](http://stackoverflow.com/questions/32959692/how-to-pass-parameters-into-your-etl-job)
16
+ [Kiba Pro](https://www.kiba-etl.org/kiba-pro) customers get priority private email support for any unforeseen issues and simple matters such as installation troubles. Our consulting services will also be prioritized to Kiba Pro subscribers. If you need any coaching on ETL & data pipeline implementation, please [reach out via email](mailto:info@logeek.fr) so we can discuss how to help you out.
68
17
 
69
- ## Supported Ruby versions
18
+ You can also check out the [author blog](https://thibautbarrere.com) and [StackOverflow answers](http://stackoverflow.com/questions/tagged/kiba-etl).
70
19
 
71
- Kiba currently supports Ruby 2.0+ and JRuby (with its default 1.9 syntax). See [test matrix](https://travis-ci.org/thbar/kiba).
72
-
73
- ## Kiba Common
20
+ ## Supported Ruby versions
74
21
 
75
- I'm starting to add commonly used reusable helpers in a separate gem called [kiba-common](https://github.com/thbar/kiba-common), check it out (work-in-progress).
22
+ Kiba currently supports Ruby 2.5+, JRuby 9.2+ and TruffleRuby. See [test matrix](https://github.com/thbar/kiba/actions).
76
23
 
77
24
  ## ETL consulting & commercial version
78
25
 
79
- **Consulting services**: if your organization needs help to implement a data pipeline or to build a data-intensive application, I provide consulting services. [More information](http://thibautbarrere.com/hire-me/).
26
+ **Consulting services**: if your organization needs guidance on Kiba / ETL implementations, we provide consulting services. Contact at [https://www.logeek.fr](https://www.logeek.fr).
80
27
 
81
- **Kiba Pro**: for more features & goodies, check out Kiba Pro ([Changelog & contact info](Pro-Changes.md)).
28
+ **Kiba Pro**: for vendor-backed ETL extensions, check out [Kiba Pro](https://www.kiba-etl.org/kiba-pro).
82
29
 
83
30
  ## License
84
31
 
data/Rakefile CHANGED
@@ -1,7 +1,12 @@
1
- require 'rake/testtask'
1
+ require "rake/testtask"
2
2
 
3
3
  Rake::TestTask.new(:test) do |t|
4
- t.pattern = 'test/test_*.rb'
4
+ t.pattern = "test/test_*.rb"
5
5
  end
6
6
 
7
- task default: :test
7
+ # A simple check to verify TruffleRuby installation trick is really in effect
8
+ task :show_ruby_version do
9
+ puts "Running with #{RUBY_DESCRIPTION}"
10
+ end
11
+
12
+ task default: [:show_ruby_version, :test]
data/kiba.gemspec CHANGED
@@ -1,21 +1,24 @@
1
- # -*- encoding: utf-8 -*-
2
- require File.expand_path('../lib/kiba/version', __FILE__)
1
+ require File.expand_path("../lib/kiba/version", __FILE__)
3
2
 
4
3
  Gem::Specification.new do |gem|
5
- gem.authors = ['Thibaut Barrère']
6
- gem.email = ['thibaut.barrere@gmail.com']
7
- gem.description = gem.summary = 'Lightweight ETL for Ruby'
8
- gem.homepage = 'http://thbar.github.io/kiba/'
9
- gem.license = 'LGPL-3.0'
10
- gem.files = `git ls-files | grep -Ev '^(examples)'`.split("\n")
11
- gem.test_files = `git ls-files -- test/*`.split("\n")
12
- gem.name = 'kiba'
13
- gem.require_paths = ['lib']
14
- gem.version = Kiba::VERSION
15
- gem.executables = ['kiba']
4
+ gem.authors = ["Thibaut Barrère"]
5
+ gem.email = ["thibaut.barrere@gmail.com"]
6
+ gem.description = gem.summary = "Lightweight ETL for Ruby"
7
+ gem.homepage = "https://www.kiba-etl.org"
8
+ gem.license = "LGPL-3.0"
9
+ gem.files = `git ls-files | grep -Ev '^(examples)'`.split("\n")
10
+ gem.test_files = `git ls-files -- test/*`.split("\n")
11
+ gem.name = "kiba"
12
+ gem.require_paths = ["lib"]
13
+ gem.version = Kiba::VERSION
14
+ gem.metadata = {
15
+ "source_code_uri" => "https://github.com/thbar/kiba",
16
+ "documentation_uri" => "https://github.com/thbar/kiba/wiki"
17
+ }
16
18
 
17
- gem.add_development_dependency 'rake'
18
- gem.add_development_dependency 'minitest', '~> 5.9'
19
- gem.add_development_dependency 'awesome_print'
20
- gem.add_development_dependency 'minitest-focus'
19
+ gem.add_development_dependency "rake"
20
+ gem.add_development_dependency "minitest", "~> 5.9"
21
+ gem.add_development_dependency "awesome_print"
22
+ gem.add_development_dependency "minitest-focus"
23
+ gem.add_development_dependency "standard"
21
24
  end
data/lib/kiba.rb CHANGED
@@ -1,19 +1,22 @@
1
- # encoding: utf-8
2
- require 'kiba/version'
1
+ require "kiba/version"
3
2
 
4
- require 'kiba/control'
5
- require 'kiba/context'
6
- require 'kiba/parser'
7
- require 'kiba/runner'
8
- require 'kiba/streaming_runner'
9
- require 'kiba/dsl_extensions/config'
3
+ require "kiba/control"
4
+ require "kiba/context"
5
+ require "kiba/parser"
6
+ require "kiba/streaming_runner"
7
+ require "kiba/dsl_extensions/config"
10
8
 
11
9
  Kiba.extend(Kiba::Parser)
12
10
 
13
11
  module Kiba
14
- def self.run(job)
15
- # NOTE: use Hash#dig when Ruby 2.2 reaches EOL
16
- runner = job.config.fetch(:kiba, {}).fetch(:runner, Kiba::Runner)
12
+ def self.run(job = nil, &block)
13
+ unless job.nil? ^ block.nil?
14
+ fail ArgumentError.new("Kiba.run takes either one argument (the job) or a block (defining the job)")
15
+ end
16
+
17
+ job ||= Kiba.parse { instance_exec(&block) }
18
+
19
+ runner = job.config.fetch(:kiba, {}).fetch(:runner, Kiba::StreamingRunner)
17
20
  runner.run(job)
18
21
  end
19
22
  end
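The reworked `Kiba.run` above accepts either a job or a block, but not both and not neither. The guard can be tried in isolation with the minimal sketch below. `MiniKiba` and its stubbed `parse` are hypothetical stand-ins written for illustration; they are not part of the gem.

```ruby
# Hypothetical stand-in mirroring the argument check added to Kiba.run.
module MiniKiba
  # Stub (assumption): the real Kiba.parse evaluates the block in a
  # Kiba::Context and returns a Kiba::Control.
  def self.parse(&block)
    {job_defined_by: :block}
  end

  def self.run(job = nil, &block)
    # XOR of the two nil checks: exactly one of job / block must be given.
    unless job.nil? ^ block.nil?
      raise ArgumentError, "run takes either one argument (the job) or a block (defining the job)"
    end

    job ||= parse(&block)
    job # the real method hands the job to the configured runner instead
  end
end

MiniKiba.run(:prebuilt_job) # job form
MiniKiba.run { nil }        # block form
```

Calling `MiniKiba.run` with no arguments, or with both a job and a block, raises `ArgumentError`, because both nil checks then return the same value and the XOR is false.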
data/lib/kiba/context.rb CHANGED
@@ -5,23 +5,27 @@ module Kiba
5
5
  end
6
6
 
7
7
  def pre_process(&block)
8
- @control.pre_processes << { block: block }
8
+ @control.pre_processes << {block: block}
9
9
  end
10
10
 
11
11
  def source(klass, *initialization_params)
12
- @control.sources << { klass: klass, args: initialization_params }
12
+ @control.sources << {klass: klass, args: initialization_params}
13
13
  end
14
14
 
15
15
  def transform(klass = nil, *initialization_params, &block)
16
- @control.transforms << { klass: klass, args: initialization_params, block: block }
16
+ @control.transforms << {klass: klass, args: initialization_params, block: block}
17
17
  end
18
18
 
19
19
  def destination(klass, *initialization_params)
20
- @control.destinations << { klass: klass, args: initialization_params }
20
+ @control.destinations << {klass: klass, args: initialization_params}
21
21
  end
22
22
 
23
23
  def post_process(&block)
24
- @control.post_processes << { block: block }
24
+ @control.post_processes << {block: block}
25
+ end
26
+
27
+ [:source, :transform, :destination].each do |m|
28
+ ruby2_keywords(m) if respond_to?(:ruby2_keywords, true)
25
29
  end
26
30
  end
27
31
  end
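The `ruby2_keywords` calls added above matter because the DSL captures component arguments with a splat and replays them later through `klass.new(*args)`. The sketch below illustrates that round trip under stated assumptions: `MiniContext` and `KeywordSource` are hypothetical names, and only the `source` method is reproduced.

```ruby
# Hypothetical, reduced version of the DSL context: it records the class
# and its arguments so that the instance can be built later.
class MiniContext
  attr_reader :definitions

  def initialize
    @definitions = []
  end

  def source(klass, *initialization_params)
    @definitions << {klass: klass, args: initialization_params}
  end

  # Flag the captured trailing hash so it is passed as keywords again when
  # splatted later (the method only exists on Ruby 2.7+, hence the check).
  ruby2_keywords(:source) if respond_to?(:ruby2_keywords, true)
end

# Hypothetical component taking keyword arguments only.
class KeywordSource
  attr_reader :path, :headers

  def initialize(path:, headers: false)
    @path = path
    @headers = headers
  end
end

context = MiniContext.new
context.source KeywordSource, path: "items.csv", headers: true

# Later, at run time, the stored arguments are replayed.
definition = context.definitions.first
instance = definition[:klass].new(*definition[:args])
```

Without the flag, Ruby 3 would pass the stored hash as a single positional argument and `KeywordSource.new` would raise `ArgumentError`.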
data/lib/kiba/control.rb CHANGED
@@ -3,7 +3,7 @@ module Kiba
3
3
  def pre_processes
4
4
  @pre_processes ||= []
5
5
  end
6
-
6
+
7
7
  def config
8
8
  @config ||= {}
9
9
  end
data/lib/kiba/dsl_extensions/config.rb CHANGED
@@ -6,4 +6,4 @@ module Kiba
6
6
  end
7
7
  end
8
8
  end
9
- end
9
+ end
data/lib/kiba/parser.rb CHANGED
@@ -1,26 +1,10 @@
1
- # NOTE: using the "Kiba::Parser" declaration, as I discovered,
2
- # provides increased isolation to the declared ETL script, compared
3
- # to 2 nested modules.
4
- # Before that, a user creating entities named Control, Context
5
- # or DSLExtensions would see a conflict with Kiba own classes,
6
- # as by default instance_eval will resolve references by adding
7
- # the module containing the parser class (initially "Kiba").
8
- # Now, the classes appear to be further hidden from the user,
9
- # as Kiba::Parser is its own module.
10
- # This allows the user to create a Parser, Context, Control class
11
- # without it being interpreted as reopening Kiba::Parser, Kiba::Context,
12
- # etc.
13
- # See test in test_cli.rb (test_namespace_conflict)
14
- module Kiba::Parser
15
- def parse(source_as_string = nil, source_file = nil, &source_as_block)
16
- control = Kiba::Control.new
17
- context = Kiba::Context.new(control)
18
- if source_as_string
19
- # this somewhat weird construct allows to remove a nil source_file
20
- context.instance_eval(*[source_as_string, source_file].compact)
21
- else
1
+ module Kiba
2
+ module Parser
3
+ def parse(&source_as_block)
4
+ control = Kiba::Control.new
5
+ context = Kiba::Context.new(control)
22
6
  context.instance_eval(&source_as_block)
7
+ control
23
8
  end
24
- control
25
9
  end
26
10
  end
data/lib/kiba/streaming_runner.rb CHANGED
@@ -1,8 +1,37 @@
1
1
  module Kiba
2
2
  module StreamingRunner
3
- include Runner
4
3
  extend self
5
-
4
+
5
+ # allow to handle a block form just like a regular transform
6
+ class AliasingProc < Proc
7
+ alias_method :process, :call
8
+ end
9
+
10
+ def run(control)
11
+ run_pre_processes(control)
12
+ process_rows(
13
+ to_instances(control.sources),
14
+ to_instances(control.transforms, true),
15
+ destinations = to_instances(control.destinations)
16
+ )
17
+ close_destinations(destinations)
18
+ run_post_processes(control)
19
+ end
20
+
21
+ def run_pre_processes(control)
22
+ to_instances(control.pre_processes, true, false).each(&:call)
23
+ end
24
+
25
+ def run_post_processes(control)
26
+ to_instances(control.post_processes, true, false).each(&:call)
27
+ end
28
+
29
+ def close_destinations(destinations)
30
+ destinations
31
+ .find_all { |d| d.respond_to?(:close) }
32
+ .each(&:close)
33
+ end
34
+
6
35
  def transform_stream(stream, t)
7
36
  Enumerator.new do |y|
8
37
  stream.each do |input_row|
@@ -11,9 +40,14 @@ module Kiba
11
40
  end
12
41
  y << returned_row if returned_row
13
42
  end
43
+ if t.respond_to?(:close)
44
+ t.close do |close_row|
45
+ y << close_row
46
+ end
47
+ end
14
48
  end
15
49
  end
16
-
50
+
17
51
  def source_stream(sources)
18
52
  Enumerator.new do |y|
19
53
  sources.each do |source|
@@ -24,10 +58,33 @@ module Kiba
24
58
 
25
59
  def process_rows(sources, transforms, destinations)
26
60
  stream = source_stream(sources)
27
- recurser = lambda { |s,t| transform_stream(s, t) }
61
+ recurser = lambda { |s, t| transform_stream(s, t) }
28
62
  transforms.inject(stream, &recurser).each do |r|
29
63
  destinations.each { |d| d.write(r) }
30
64
  end
31
65
  end
66
+
67
+ def to_instances(definitions, allow_block = false, allow_class = true)
68
+ definitions.map do |definition|
69
+ to_instance(
70
+ *definition.values_at(:klass, :args, :block),
71
+ allow_block, allow_class
72
+ )
73
+ end
74
+ end
75
+
76
+ def to_instance(klass, args, block, allow_block, allow_class)
77
+ if klass && block
78
+ fail "Class and block form cannot be used together at the moment"
79
+ elsif klass
80
+ fail "Class form is not allowed here" unless allow_class
81
+ klass.new(*args)
82
+ elsif block
83
+ fail "Block form is not allowed here" unless allow_block
84
+ AliasingProc.new(&block)
85
+ else
86
+ fail "Nil parameters not allowed here"
87
+ end
88
+ end
32
89
  end
33
- end
90
+ end
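The `close` hook added to `transform_stream` lets a transform emit rows after the last input row, which is what makes buffering or aggregating transforms possible. The sketch below is a minimal, standalone illustration: `BatchTransform` is a hypothetical class, and the `process` call inside the loop is an assumption, since the diff does not show that line.

```ruby
# Hypothetical transform: groups rows into batches and flushes the
# remainder when the stream is closed.
class BatchTransform
  def initialize(size)
    @size = size
    @buffer = []
  end

  def process(row)
    @buffer << row
    if @buffer.size == @size
      yield @buffer
      @buffer = []
    end
    nil # rows only leave through yield, nothing is returned directly
  end

  def close
    yield @buffer unless @buffer.empty?
  end
end

# Standalone copy of the streaming logic; the process call is assumed.
def transform_stream(stream, t)
  Enumerator.new do |y|
    stream.each do |input_row|
      returned_row = t.process(input_row) { |yielded_row| y << yielded_row }
      y << returned_row if returned_row
    end
    # New in this diff: give the transform a chance to emit trailing rows.
    if t.respond_to?(:close)
      t.close { |close_row| y << close_row }
    end
  end
end

batches = transform_stream([1, 2, 3, 4, 5].each, BatchTransform.new(2)).to_a
# batches is [[1, 2], [3, 4], [5]]
```

The last batch `[5]` only appears because of the `close` call; without it the incomplete buffer would be silently dropped.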