raka 0.2.3

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: a3391bad82ef017d820e266d897d570d32f6a62c3b09382b4b6ad8d1f8ead6fe
4
+ data.tar.gz: 39aa088fc9e87bda8a7ad14f60678b934ca006de0d78c893da5678f0a1082740
5
+ SHA512:
6
+ metadata.gz: b0b426ad5076914630e192345d0796dfa1c30c580f3709d6c88773170ee8e18148c9795f6f7c2619bc5fa22983f0f50f4ed34e09880fc3a8004cbd773b138a5e
7
+ data.tar.gz: db00045e711049a23735dc9358b7faca10528a07b893307becd8eb4d73a90002430a19f6a4e5bb228366455b5066ee1e27b0b609b9b2192768fb8ea33131986a
data/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2016 yarray
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,261 @@
1
+ **Raka** is a **DSL** (Domain Specific Language) on top of **Rak**e for defining and running d**a**t**a** processing workflows. Raka is specifically designed for data processing, with improved pattern matching, scopes, language extensions, and plenty of conventions to reduce verbosity.
2
+
3
+ ## Why Raka
4
+
5
+ Data processing tasks can involve plenty of steps, each with its dependencies. Compared to bare Rake or the more classical Make, Raka offers the following advantages:
6
+
7
+ 1. Advanced pattern matching and template resolving to define general rules and maximize code reuse.
8
+ 2. An extensible, context-aware protocol architecture.
9
+ 3. Multilingual: code in other programming languages can easily be embedded.
10
+ 4. Automatic dependencies and naming by convention.
11
+ 5. Scopes to ease comparative studies.
12
+ 6. Terser syntax.
13
+
14
+ ... and more.
15
+
16
+ ## Usage
17
+
18
+ Raka is a drop-in library for rake. Although rake is cross-platform, raka may not work on Windows since it relies on some shell facilities. To use raka, ruby and rake must be installed first. Ruby is available on most \*nix systems, including macOS, so the only remaining task is to install rake:
19
+
20
+ ```bash
21
+ gem install rake
22
+ ```
23
+
24
+ Next, clone this project to your local machine, cd into the directory, and install the gem:
25
+
26
+ ```bash
27
+ gem install pkg/raka-0.1.0.gem
28
+ ```
29
+
30
+ ## For the Impatient
31
+
32
+ First create a file named `main.raka`, then import and initialize the DSL:
33
+
34
+ ```ruby
35
+ require 'raka'
36
+
37
+ dsl = DSL.new(self,
38
+ output_types: [:txt, :table, :pdf, :idx],
39
+ input_types: [:txt, :table]
40
+ )
41
+ ```
42
+
43
+ The code below then defines two simple rules:
44
+
45
+ ```ruby
46
+ txt.sort.first50 = shell* "cat sort.txt | head -n 50 > $@"
47
+ txt.sort = [txt.input] | shell* "cat $< | sort -rn > $@"
48
+ ```
49
+
50
+ For testing, let's prepare an input file named `input.txt`:
51
+
52
+ ```bash
53
+ seq 1000 > input.txt
54
+ ```
55
+
56
+ We can then invoke `rake first50__sort.txt`; the script reads data from _input.txt_, sorts the numbers in descending order, and keeps the first 50 lines.
57
+
58
+ The workflow here is as follows:
59
+
60
+ 1. Look for _first50\_\_sort.txt_: it does not exist
61
+ 2. Rule `txt.sort.first50` matched
62
+ 3. For rule `txt.sort.first50`, find input file _sort.txt_ or _sort.table_. Neither exists
63
+ 4. Rule `txt.sort` matched
64
+ 5. Rule `txt.sort` has no input of its own but depends on the target `txt.input`
65
+ 6. Look for _input.txt_ or _input.table_. The former exists and is used
66
+ 7. Run rule `txt.sort` and create _sort.txt_
67
+ 8. Run rule `txt.sort.first50` and create _first50\_\_sort.txt_
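The input lookup in steps 3 and 6 can be sketched in a few lines of Ruby. This is a minimal, hypothetical illustration (the helpers `candidate_inputs` and `find_input` are not part of Raka) that assumes an unscoped target and a rule without explicit dependencies: the first `__`-separated part of the file name corresponds to the rule's last token and is dropped, and the remainder is combined with each input type in order.

```ruby
# Hypothetical helpers illustrating the walkthrough above; not Raka's API.
# Scopes (e.g. "de/") and explicit dependencies are ignored.
def candidate_inputs(target, input_types)
  stem, _dot, _ext = target.rpartition('.')
  parts = stem.split('__')      # "first50__sort" -> ["first50", "sort"]
  return [] if parts.size < 2   # single-token rules have no implicit input

  input_stem = parts.drop(1).join('__')
  input_types.map { |type| "#{input_stem}.#{type}" }
end

# First candidate that exists on disk, or nil when none does.
def find_input(target, input_types)
  candidate_inputs(target, input_types).find { |path| File.exist?(path) }
end

p candidate_inputs('first50__sort.txt', %i[txt table]) # => ["sort.txt", "sort.table"]
p candidate_inputs('sort.txt', %i[txt table])          # => []
```

The empty result for _sort.txt_ corresponds to step 5: `txt.sort` has no implicit input, so only its explicit dependency `txt.input` is looked up.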
68
+
69
+ This illustrates some basic ideas but is not particularly interesting. The following is a more sophisticated example from real-world research that covers more features.
70
+
71
+ ```ruby
72
+ SRC_DIR = File.absolute_path 'src'
73
+ USER = 'postgres'
74
+ DB = 'osm'
75
+ HOST = 'localhost'
76
+ PORT = 5432
77
+
78
+ def idx_this() [idx._('$(output_stem)')] end
79
+
80
+ dsl.scope :de
81
+
82
+ idx._ = psqlf(script_name: '$stem_idx.sql')
83
+ pdf.buildings.func['(\S+)_graph'] = r(:graph)* %[
84
+ table_input("$(input_stem)") | draw_%{func0} | ggplot_output('$@') ]
85
+ table.buildings = [csv.admin] | psqlf(admin: '$<') | idx_this
86
+ ```
87
+
88
+ Assume that we have a schema named _de_ in database _osm_, an input file _admin.csv_, and _graph.R_ and _buildings.sql_ under _src/_. Further assume that _graph.R_ contains two functions:
89
+
90
+ ```r
91
+ draw_stat_snapshot <- function(d) { ... }
92
+ draw_user_trend <- function(d) { ... }
93
+ ```
94
+
95
+ ...and _buildings.sql_ contains table creation code like:
96
+
97
+ ```sql
98
+ DROP TABLE IF EXISTS buildings;
99
+ CREATE TABLE buildings AS ( ... );
100
+ ```
101
+
102
+ We may also have a _buildings_idx.sql_ that creates an index for the table.
103
+
104
+ We can then run either `rake de/stat_snapshot_graph__buildings.pdf` or `rake de/user_trend_graph__buildings.pdf`. On the first run, this performs the following steps (taking the former as an example):
105
+
106
+ 1. Target file not found.
107
+ 2. Rule `pdf.buildings.func['(\S+)_graph']` matched. "stat_snapshot_graph" is bound to `func` and "stat_snapshot" is bound to `func0`.
108
+ 3. None of the four possible input files (_de/buildings.table_, _de/buildings.txt_, _buildings.table_, _buildings.txt_) can be found. Rule `table.buildings` is matched, and its only dependency file, _admin.csv_, is found.
109
+ 4. The protocol `psqlf` finds the source file _src/buildings.sql_, interpolates the options with automatic variables (`$<` as "admin.csv"), runs the SQL, and afterwards creates a placeholder file _de/buildings.table_.
110
+ 5. Run the post-job `idx_this`, according to the rule `idx._` it will find and run _buildings_idx.sql_, then create a placeholder file _de/buildings.idx_.
111
+ 6. For rule `pdf.buildings.func['(\S+)_graph']`, the R code in `%[]` is interpolated with several automatic variables (`$(input_stem)` as "buildings", `$@` as "de/stat_snapshot_graph\_\_buildings.pdf") and the variables (`func`, `func0`) bound before.
112
+ 7. Run the R code. The _buildings_ table is piped into the function `draw_stat_snapshot` and then into `ggplot_output`, which writes the graph to the specified pdf file.
113
+
114
+ ## Syntax of Rules
115
+
116
+ It is possible to use Raka with little knowledge of ruby / rake, though a basic understanding is highly recommended. The formal syntax of a rule can be defined as follows (in EBNF):
117
+
118
+ ```ebnf
119
+ rule = lexpr "=" {target_list "|"} protocol {"|" target_list};
120
+
121
+ target = rexpr | template;
122
+
123
+ target_list = "[]" | "[" target {"," target} "]";
124
+
125
+ lexpr = ext "." {ltoken "."} ltoken;
126
+ rexpr = ext "." rtoken {"." rtoken};
127
+
128
+ ltoken = word | word "[" pattern "]";
129
+ rtoken = word | word "(" template ")";
130
+
131
+ word = ("_" | letter) { letter | digit | "_" };
132
+
133
+ protocol = ("shell" | "r" | "psql") ("*" template | BLOCK )
134
+ | "psqlf" | "psqlf" "(" HASH ")";
135
+ ```
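Once the parts of a rule's right-hand side are collected into one array, they can be told apart by whether an item is callable: everything before the first callable is a dependency, the contiguous run of callables forms the actions, and whatever follows is a post job. The `compile` method of `DSLCompiler` later in this listing works along these lines. The sketch below is a simplified, hypothetical restatement (`split_rhs` is not a Raka function) that assumes protocols are represented by objects responding to `call` and targets by plain strings.

```ruby
# Hypothetical helper (not Raka's API): split a flattened right-hand side
# into [dependencies, actions, post jobs] by checking which items are callable.
def split_rhs(rhs)
  start = rhs.find_index { |item| item.respond_to?(:call) }
  return [rhs, [], []] if start.nil? # no protocol: dependencies only

  # length of the contiguous run of callables beginning at `start`
  len = rhs.drop(start).find_index { |item| !item.respond_to?(:call) } || (rhs.size - start)
  [rhs[0, start], rhs[start, len], rhs.drop(start + len)]
end

action = ->(task) { "run #{task}" }
deps, actions, posts = split_rhs(['admin.csv', action, 'buildings.idx'])
p deps  # => ["admin.csv"]
p posts # => ["buildings.idx"]
```

`Array#[]` with two integers takes a start index and a *length*, which is why the sketch computes `len` rather than an end index.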
136
+
137
+ The corresponding railroad diagrams are:
138
+
139
+ ![](https://cdn.rawgit.com/yarray/raka/master/doc/figures/rule.svg)
140
+
141
+ ![](https://cdn.rawgit.com/yarray/raka/master/doc/figures/target.svg)
142
+
143
+ ![](https://cdn.rawgit.com/yarray/raka/master/doc/figures/target_list.svg)
144
+
145
+ ![](https://cdn.rawgit.com/yarray/raka/master/doc/figures/lexpr.svg)
146
+
147
+ ![](https://cdn.rawgit.com/yarray/raka/master/doc/figures/rexpr.svg)
148
+
149
+ ![](https://cdn.rawgit.com/yarray/raka/master/doc/figures/ltoken.svg)
150
+
151
+ ![](https://cdn.rawgit.com/yarray/raka/master/doc/figures/rtoken.svg)
152
+
153
+ ![](https://cdn.rawgit.com/yarray/raka/master/doc/figures/word.svg)
154
+
155
+ ![](https://cdn.rawgit.com/yarray/raka/master/doc/figures/protocol.svg)
156
+
157
+ The definition is concise but several details are omitted for simplicity:
158
+
159
+ 1. **BLOCK** and **HASH** are ruby's block and hash objects.
160
+ 2. A **template** is just a ruby string, but with some placeholders (see the next section for details).
161
+ 3. A **pattern** is just a ruby string that represents a regex (see the next section for details).
162
+ 4. The listed protocols are merely those offered now; the set can be greatly extended.
163
+ 5. Nearly any concept in the syntax can be replaced by a suitable ruby variable.
164
+
165
+ ## Pattern matching and template resolving
166
+
167
+ When a rule like `lexpr = rexpr` is defined, the left side represents a pattern, and the right side specifies extra dependencies, actions, and targets to create afterwards. When raking a target file, the left sides of the rules are examined one by one until a rule matches. The regex-based matching process also supports captures, so that variables can be bound for use on the right side.
168
+
169
+ The specifications on the right side of a rule can be incomplete in various respects; that is, they can contain templates. The "holes" in the templates are filled by automatic variables and by variables bound while matching the left side.
170
+
171
+ ### Pattern matching
172
+
173
+ To match a given _file_ against a `lexpr`, the extension is compared first; then the file name is split at "\_\_" and the resulting substrings are paired, in reverse order, with the tokens separated by `.`. Each substring is then matched against the corresponding token or against the regex in `[]`. For example, the rule
174
+
175
+ ```ruby
176
+ pdf.buildings.indicator['\S+'].top['top_(\d+)']
177
+ ```
178
+
179
+ can match "top_50\_\_node_num\_\_buildings.pdf". The logical process is:
180
+
181
+ 1. The extension `pdf` matches.
182
+ 2. The substrings and the tokens are paired and they all match:
183
+ - `buildings ~ buildings`
184
+ - `'\S+' ~ node_num`
185
+ - `top_(\d+) ~ top_50`
186
+ 3. Two levels of captures are made. First, 'node_num' is captured as `indicator` and 'top_50' as `top`. Second, '50' is captured as `top0`, since `(\d+)` is the first parenthesized group in the regex of `top`.
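The matching steps above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not Raka's matcher: `match_target` is a hypothetical name, scopes such as `de/` are ignored, and each token is given as a `[name, regex]` pair, where a `nil` regex means the substring must equal the token name literally. Regexes are anchored so that they must match the whole substring.

```ruby
# Hypothetical matcher illustrating the steps above; not Raka's implementation.
# tokens: [[name, regex_or_nil], ...] in rule order, e.g. for
# pdf.buildings.indicator['\S+'].top['top_(\d+)'].
def match_target(file, ext, tokens)
  stem, _dot, file_ext = file.rpartition('.')
  return nil unless file_ext == ext

  parts = stem.split('__').reverse # file parts pair with tokens in reverse order
  return nil unless parts.size == tokens.size

  captures = {}
  tokens.zip(parts).each do |(name, regex), part|
    if regex.nil?
      return nil unless part == name # plain token: literal comparison
    else
      m = Regexp.new("\\A(?:#{regex})\\z").match(part)
      return nil if m.nil?

      captures[name.to_sym] = part # first level: the whole substring
      m.captures.each_with_index do |group, i| # second level: regex groups
        captures[:"#{name}#{i}"] = group
      end
    end
  end
  captures
end

rule = [['buildings', nil], ['indicator', '\S+'], ['top', 'top_(\d+)']]
p match_target('top_50__node_num__buildings.pdf', 'pdf', rule) # binds indicator, top, top0
p match_target('top_50__node_num__buildings.pdf', 'csv', rule) # => nil
```

With the tokens `[['buildings', nil], ['func', '(\S+)_graph']]`, the same function binds `func` to "stat_snapshot_graph" and `func0` to "stat_snapshot" for _stat_snapshot_graph\_\_buildings.pdf_, as in step 2 of the real-world example.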
187
+
188
+ As syntactic sugar for `something['\S+']`, one can write the special token `_`, or `something[]` if the captured value is needed later.
189
+
190
+ ### Template resolving
191
+
192
+ In some places of a `rexpr`, templates can be written instead of plain strings, so that they can represent different values at runtime. Two types of variables can be used in templates. The first is automatic variables, which are just like `$@` in Make or `task.name` in Rake. Some Make conventions are even preserved for easier migration. All automatic variables begin with `$`. The available automatic variables are:
193
+
194
+ | symbol         | meaning                | symbol                         | meaning                            |
195
+ | -------------- | ---------------------- | ------------------------------ | ---------------------------------- |
196
+ | \$@            | output file            | \$^                            | all dependencies (comma-separated) |
197
+ | \$<            | first dependency       | \$(dep0), \$(dep1), … \$(depi) | i-th dependency, counting from 0   |
198
+ | \$(scope)      | scope for current task | \$(output_stem)                | stem of the output file            |
199
+ | \$(input_stem) | stem of the input file |                                |                                    |
200
+
201
+ The other type of variables are those bound during pattern matching, which can be referred to using `%{var}`. In the example of the [pattern matching](#pattern-matching) section, `%{indicator}` is replaced by `node_num`, `%{top}` by `top_50`, and `%{top0}` by `50`. In that case, a template such as `'calculate top %{top0} of %{indicator} for $@'` resolves to `'calculate top 50 of node_num for top_50__node_num__buildings.pdf'`.
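A rough sketch of this two-stage resolution, assuming the task is described by a plain Hash (Raka's own `DSLCompiler#resolve` in _compile.rb_ handles more variables and also protects literal `%` signs): automatic variables are substituted with `String#gsub`, then the `%{var}` placeholders are filled by Ruby's `String#%` given a Hash of captures. `resolve_template` and the task fields are hypothetical names chosen for illustration.

```ruby
# Hypothetical two-stage resolver; the task is a plain Hash for illustration.
def resolve_template(template, task, captures)
  # stage 1: automatic variables (a subset of the table above)
  text = template
         .gsub('$@', task[:output])
         .gsub('$<', task[:deps].first || '')
         .gsub('$(output_stem)', task[:output_stem])
         .gsub('$(input_stem)', task[:input_stem] || '')
  # stage 2: %{var} placeholders bound during pattern matching
  text % captures
end

task = {
  output: 'top_50__node_num__buildings.pdf',
  output_stem: 'top_50__node_num__buildings',
  input_stem: nil,
  deps: []
}
captures = { indicator: 'node_num', top: 'top_50', top0: '50' }
puts resolve_template('calculate top %{top0} of %{indicator} for $@', task, captures)
# prints: calculate top 50 of node_num for top_50__node_num__buildings.pdf
```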
202
+
203
+ Variables are replaced before the template string is processed in any other way, so avoid text in a template that looks like an automatic variable or `%{<anything>}` unless it is meant to be replaced.
204
+
205
+ Templates can appear in various places. For dependencies and post jobs, a token with parentheses can wrap a template, like `csv._('%{indicator}')`. The name of a token with parentheses is not used and is generally written as `_`. It is also possible to write a template literal directly, e.g. `'%{indicator}.csv'`. Where templates can be used in actions depends on the protocol and is explained in the [Protocols](#protocols) section.
206
+
207
+ ## APIs
208
+
209
+ ### Initialization and options
210
+
211
+ These APIs are bound to an instance of `DSL`, which you can create at the top of the file:
212
+
213
+ ```ruby
214
+ dsl = DSL.new(<env>, <options>)
215
+ ```
216
+
217
+ The argument `<env>` should be the _self_ of a running Rakefile. In most cases you can directly write:
218
+
219
+ ```ruby
220
+ dsl = DSL.new(self, <options>)
221
+ ```
222
+
223
+ The argument `<options>` currently supports `output_types` and `input_types`. For each item in `output_types`, you get an extra function to bootstrap a rule. For example, with
224
+
225
+ ```ruby
226
+ dsl = DSL.new(self, { output_types: [:csv, :pdf] })
227
+ ```
228
+
229
+ you can write rules like:
230
+
231
+ ```ruby
232
+ csv.data = ...
233
+ pdf.graph = ...
234
+ ```
235
+
236
+ which generate _data.csv_ and _graph.pdf_.
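To see why `csv.data = ...` is ordinary Ruby, here is a toy model. It is an illustration under assumptions, not Raka's implementation: `ToyDSL` and `ToyToken` are hypothetical classes, and the right-hand side is simply stored. Each output type becomes a singleton method on the env, each further `.name` extends a token chain through `method_missing`, and the final `name=` registers a rule keyed by the file it would produce.

```ruby
# Toy model only: shows how `csv.data = ...` can be plain Ruby. Not Raka's classes.
class ToyToken
  def initialize(ext, names, rules)
    @ext = ext
    @names = names
    @rules = rules
  end

  def method_missing(name, *args)
    str = name.to_s
    return super if str.start_with?('to_') # leave implicit conversions alone

    if str.end_with?('=')
      # `csv.sort.first50 = rhs` registers a rule for "first50__sort.csv"
      names = @names + [str.chomp('=')]
      @rules["#{names.reverse.join('__')}.#{@ext}"] = args.first
    else
      # `csv.sort` just extends the token chain
      ToyToken.new(@ext, @names + [str], @rules)
    end
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?('to_') || super
  end
end

class ToyDSL
  attr_reader :rules

  def initialize(env, output_types:)
    @rules = {}
    rules = @rules # captured by the blocks below, whose self will be env
    output_types.each do |type|
      env.define_singleton_method(type) { ToyToken.new(type, [], rules) }
    end
  end
end

env = Object.new
dsl = ToyDSL.new(env, output_types: %i[csv pdf])
env.instance_eval do
  csv.data = 'rhs for data'
  pdf.graph = 'rhs for graph'
  csv.sort.first50 = 'rhs for first50'
end
p dsl.rules.keys # => ["data.csv", "graph.pdf", "first50__sort.csv"]
```

Keying rules by the reversed token chain mirrors the file naming used throughout this README (`txt.sort.first50` produces _first50\_\_sort.txt_).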
237
+
238
+ The `input_types` option determines how inputs are found. For example, with `input_types: [:csv, :table]`, raka will try to find both _numbers.csv_ and _numbers.table_ for a rule like `table.numbers.mean = …`.
239
+
240
+ ### Scope
241
+
242
+ ### Protocols
243
+
244
+ Currently Raka supports four protocols: shell, psql, r and psqlf.
245
+
246
+ ```ruby
247
+ shell(base_dir='./')* code::templ_str { |task| ... }
248
+ psql(options={})* code::templ_str { |task| ... }
249
+ r(src:str, libs=[])* code::templ_str { |task| ... }
250
+
251
+ # options = { script_name: , script_file: , params: }
252
+ psqlf(options={})
253
+ ```
254
+
255
+ ## Rakefile Template
256
+
257
+ ## Write your own protocols
258
+
259
+ ## Compare to other tools
260
+
261
+ Raka borrows a few ideas from Drake, but not many (currently mainly the name "protocol"). In brief, the two tools have different visions and probably suit different scenarios.
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.2.3
data/lib/compile.rb ADDED
@@ -0,0 +1,173 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'fileutils'
+ require 'ostruct'
4
+
5
+ require_relative './token'
6
+
7
+ def array_to_hash(array)
8
+ array.nil? ? {} : Hash[((0...array.size).map { |i| i.to_s.to_sym }).zip array]
9
+ end
10
+
11
+ def protect_percent_symbol(text)
12
+ anchor = '-_-_-'
13
+ safe_text = text.gsub(/%(?=[^\s{]+)/, anchor) # hide every % not followed by { or whitespace behind the anchor
14
+ safe_text = yield safe_text
15
+ safe_text.gsub(anchor, '%') # restore the hidden % signs
16
+ end
17
+
18
+ # compiles rule (lhs = rhs) to rake task
19
+ class DSLCompiler
20
+ attr_reader :env
21
+
22
+ # keep env as running environment of rake since we want to inject rules
23
+ def initialize(env, options)
24
+ @env = env
25
+ @options = options
26
+ end
27
+
28
+ # Raka task structure, input task is rake's task pushed into blocks
29
+ def dsl_task(token, task)
30
+ name = task.name
31
+ deps = task.prerequisites
32
+
33
+ output_info = token._parse_output_ name
34
+ task_info = {
35
+ name: name,
36
+ deps: deps,
37
+ deps_str: deps.join(','),
38
+ input: deps.first || '',
39
+ task: task
40
+ }
41
+ OpenStruct.new(output_info.to_h.merge(task_info))
42
+ end
43
+
44
+ # resolve auto variables with only output info,
45
+ # useful when resolving extra deps (the task is not available yet)
46
+ def resolve_by_output(target, output_info)
47
+ info = output_info
48
+ text = target.respond_to?(:_template_) ? target._template_(info.scope).to_s : target.to_s
49
+ text = text
50
+ .gsub('$(scope)', info.scope.nil? ? '' : info.scope)
51
+ .gsub('$(target_scope)', info.target_scope.nil? ? '' : info.target_scope)
52
+ .gsub('$(output)', info.output)
53
+ .gsub('$(output_stem)', info.stem)
54
+ .gsub('$(input_stem)', info.input_stem.nil? ? '' : info.input_stem)
55
+ .gsub('$@', info.name)
56
+
57
+ protect_percent_symbol text do |safe_text|
58
+ safe_text = safe_text % (info.to_h.merge info.captures.to_h)
59
+ safe_text = safe_text.gsub(/\$\(rule_scope(\d+)\)/, '%{\1}') % array_to_hash(info.rule_scopes)
60
+ safe_text.gsub(/\$\(target_scope(\d+)\)/, '%{\1}') % array_to_hash(info.target_scope_captures)
61
+ end
62
+ end
63
+
64
+ # resolve auto variables with dsl task
65
+ def resolve(target, task)
66
+ # convert target to text whether it is expression or already text
67
+ text = resolve_by_output target, task
68
+
69
+ # resolve the dependency variables: $^, $<, $(deps) and $(input)
70
+ text = text
71
+ .gsub('$^', task.deps_str)
72
+ .gsub('$<', task.input || '')
73
+ .gsub('$(deps)', task.deps_str)
74
+ .gsub('$(input)', task.input || '')
75
+
76
+ protect_percent_symbol text do |safe_text|
77
+ # numbered auto variables like $(dep0), $(dep2) refer to the first and third deps
78
+ safe_text.gsub(/\$\(dep(\d+)\)/, '%{\1}') % array_to_hash(task.deps)
79
+ end
80
+ end
81
+
82
+ def rule_action(lhs, actions, extra_tasks, task)
83
+ return if actions.empty?
84
+
85
+ task = dsl_task(lhs, task)
86
+ @env.logger.info "raking #{task.name}"
87
+ unless task.scope.nil?
88
+ folder = task.scope
89
+ folder = File.join(task.scope, task.target_scope) unless task.target_scope.nil?
90
+ FileUtils.makedirs(folder)
91
+ end
92
+ actions.each do |action|
93
+ action.call @env, task do |code|
94
+ resolve(code, task)
95
+ end
96
+ end
97
+
98
+ extra_tasks.each do |templ|
99
+ Rake::Task[resolve(templ, task)].invoke
100
+ end
101
+ end
102
+
103
+ # build one rule
104
+ def create_rule(lhs, input_ext, actions, extra_deps, extra_tasks)
105
+ # rake's "rule" method is private, hence `send`; there may be better choices
106
+ @env.send(:rule, lhs._pattern_ => [proc do |target|
107
+ inputs = lhs._inputs_(target, input_ext)
108
+ output = lhs._parse_output_(target)
109
+ plain_extra_deps = extra_deps.map do |templ|
110
+ resolve_by_output(templ, output)
111
+ end
112
+ # main data source and extra dependencies
113
+ inputs + plain_extra_deps
114
+ end]) do |task|
115
+ # rake continues the task even if dependencies are missing, so check them here
116
+ absence = task.prerequisites.find_index { |f| !File.exist? f }
117
+ unless absence.nil?
118
+ @env.logger.warn\
119
+ "Dependent #{task.prerequisites[absence]} does not exist, skip task #{task.name}"
120
+ next
121
+ end
122
+ rule_action(lhs, actions, extra_tasks, task)
123
+ end
124
+ end
125
+
126
+ # compile token = rhs to rake rule
127
+ # rubocop:disable Style/MethodLength
128
+ def compile(lhs, rhs)
129
+ unless @env.instance_of?(Object)
130
+ raise "DSL compile error: seems not a valid @env of rake with class #{@env.class}"
131
+ end
132
+
133
+ # the format is [dep, ...] | [action, ...] | [post, ...], where the posts
134
+ # are raked after the actions
135
+ actions_start = rhs.find_index { |item| item.respond_to?(:call) }
136
+
137
+ # case 1: has action
138
+ if actions_start
139
+ extra_deps = rhs[0, actions_start]
140
+ actions_end = rhs[actions_start, rhs.length].find_index do |item|
141
+ !item.respond_to?(:call)
142
+ end
143
+
144
+ # case 1.1: has post
145
+ if actions_end
146
+ actions_end += actions_start
147
+ actions = rhs[actions_start...actions_end]
148
+ extra_tasks = rhs[actions_end, rhs.length]
149
+ # case 1.2: no post
150
+ else
151
+ actions = rhs[actions_start, rhs.length]
152
+ extra_tasks = []
153
+ end
154
+ # case 2: no action
155
+ else
156
+ extra_deps = rhs
157
+ actions = []
158
+ extra_tasks = []
159
+ end
160
+
161
+ unless lhs._input_?
162
+ create_rule lhs, proc { [] }, actions, extra_deps, extra_tasks
163
+ return
164
+ end
165
+
166
+ # We generate a rule for each possible input type
167
+ @options.input_types.each do |ext|
168
+ # inputs are searched for in both the current scope and the root
169
+ create_rule lhs, ext, actions, extra_deps, extra_tasks
170
+ end
171
+ end
172
+ end
173
+ # rubocop:enable Style/MethodLength
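`resolve_by_output` runs Ruby's `String#%` over arbitrary code, so a literal `%` such as R's `%>>%` pipe would otherwise be read as a format directive. `protect_percent_symbol` hides every `%` that is not followed by `{` or whitespace behind an anchor, runs the formatting, and restores the signs. The snippet below copies the helper from _compile.rb_ so it runs standalone and exercises it on a made-up line in the style of the R protocol's prelude.

```ruby
# Copied from compile.rb above so the snippet runs on its own.
def protect_percent_symbol(text)
  anchor = '-_-_-'
  safe_text = text.gsub(/%(?=[^\s{]+)/, anchor) # hide % not followed by { or whitespace
  safe_text = yield safe_text
  safe_text.gsub(anchor, '%')                   # restore the hidden % signs
end

r_code = "`|` <- `%>>%` # top %{top0}"
resolved = protect_percent_symbol(r_code) { |text| text % { top0: '50' } }
puts resolved # prints: `|` <- `%>>%` # top 50
```

A `%` followed by whitespace or at the very end of the text is not hidden, so such text may still trip up the formatting step.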
data/lib/interface.rbs ADDED
@@ -0,0 +1,23 @@
1
+ class RakaTask
2
+ attr_reader name: String
3
+ attr_reader stem: String
4
+ attr_reader func: String?
5
+ attr_reader input_stem: String?
6
+ attr_reader scope: String?
7
+ attr_reader target_scope: String?
8
+ attr_reader scopes: Array[String]
9
+ attr_reader target_scope_captures: Array[String]
10
+ attr_reader captures: Hash[String, String]
11
+ attr_reader deps: Array[String]
12
+ attr_reader deps_str: String
13
+ attr_reader input: String
14
+ attr_reader task: Object # RakeTask
15
+ end
16
+
17
+ class RakaEnv
18
+ end
19
+
20
+ class LanguageImpl
21
+ def build: (String code, RakaTask task) -> String
22
+ def run_script: (RakaEnv env, String fname, RakaTask task) -> nil
23
+ end
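_interface.rbs_ describes what a language binding provides: `build` turns resolved code into the text of a script, and `run_script` executes the generated file through the rake environment. A minimal, hypothetical binding following that interface might look like this. `RubyLang`, the `ruby` runner and the `FakeEnv` recorder are assumptions made for illustration; the shipped protocols also register themselves with `creator`, whose definition is not part of this listing, so that step is omitted.

```ruby
# Hypothetical language binding following the LanguageImpl interface.
class RubyLang
  def initialize(libs: [])
    @libs = libs.map(&:to_s)
  end

  # Turn resolved code into the full script text (prelude + code).
  def build(code, _task)
    (@libs.map { |lib| "require '#{lib}'" } + [code]).join("\n")
  end

  # Execute the generated script file through the rake env's `sh`.
  def run_script(env, fname, _task)
    env.send(:sh, "ruby #{fname}")
    nil
  end
end

# Stand-in for the rake env: records commands instead of running them.
class FakeEnv
  attr_reader :commands

  def initialize
    @commands = []
  end

  private

  def sh(cmd)
    @commands << cmd
  end
end

lang = RubyLang.new(libs: [:json])
script = lang.build('puts JSON.generate([1, 2])', nil)
env = FakeEnv.new
lang.run_script(env, 'script.rb', nil)
p env.commands # => ["ruby script.rb"]
```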
@@ -0,0 +1,59 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative '../../protocol'
4
+
5
+ def bash(env, cmd)
6
+ code = remove_common_indent(
7
+ %(set -e
8
+ set -o pipefail
9
+
10
+ #{cmd}
11
+ )
12
+ )
13
+ # puts code
14
+ env.send :sh, 'bash ' + create_tmp(code)
15
+ end
16
+
17
+ # postgresql protocol using psql; requires a connection (conn) with host, port, user, db and password
18
+ class Psql
19
+ # Sometimes we want to use the psql command with bash directly
20
+ def sh_cmd(scope)
21
+ c = @conn
22
+ env_vars = "PGOPTIONS='-c search_path=#{scope ? scope + ',' : ''}public' "
23
+ "PGPASSWORD=#{c.password} #{env_vars} psql -h #{c.host} -p #{c.port} -U #{c.user} -d #{c.db} -v ON_ERROR_STOP=1"
24
+ end
25
+
26
+ # 1. do not make conn a required argument here, so that psql.config works; otherwise only psql(conn: xxx).config could be used
27
+ def initialize(conn: nil, create: 'mview', params: {})
28
+ @create = create
29
+ @params = params
30
+ @conn = conn
31
+ end
32
+
33
+ def build(code, _)
34
+ # 2. lazily check the argument only when used
35
+ raise 'argument conn required' if @conn.nil?
36
+
37
+ if @create.to_s == 'table'
38
+ 'DROP TABLE IF EXISTS :_name_;' \
39
+ 'CREATE TABLE :_name_ AS (' + code + ');'
40
+ elsif @create.to_s == 'mview'
41
+ 'DROP MATERIALIZED VIEW IF EXISTS :_name_;' \
42
+ 'CREATE MATERIALIZED VIEW :_name_ AS (' + code + ');'
43
+ else
44
+ code
45
+ end
46
+ end
47
+
48
+ def run_script(env, fname, task)
49
+ param_str = (@params || {}).map { |k, v| "-v #{k}=\"#{v}\"" }.join(' ')
50
+
51
+ bash env, %(
52
+ #{sh_cmd(task.scope)} #{param_str} -v _name_=#{task.stem} \
53
+ -f #{fname} | tee #{fname}.log
54
+ mv #{fname}.log #{task.name}
55
+ )
56
+ end
57
+ end
58
+
59
+ creator :psql, Psql
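For reference, here is a standalone restatement of the SQL that `Psql#build` produces for each `create` mode, plus the `-v` options that `run_script` derives from `params`. The function names `wrap_sql` and `psql_params` are hypothetical; at run time psql substitutes `:_name_` with the value passed via `-v _name_=<stem>`.

```ruby
# Hypothetical restatement of Psql#build's wrapping rules.
def wrap_sql(code, create)
  case create.to_s
  when 'table'
    "DROP TABLE IF EXISTS :_name_;CREATE TABLE :_name_ AS (#{code});"
  when 'mview'
    'DROP MATERIALIZED VIEW IF EXISTS :_name_;' \
      "CREATE MATERIALIZED VIEW :_name_ AS (#{code});"
  else
    code # any other mode runs the code unchanged
  end
end

# Mirrors run_script: every param becomes a psql `-v key="value"` option.
def psql_params(params)
  (params || {}).map { |k, v| "-v #{k}=\"#{v}\"" }.join(' ')
end

puts wrap_sql('SELECT * FROM admin', :mview)
puts psql_params({ admin: 'admin.csv' }) # prints: -v admin="admin.csv"
```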
@@ -0,0 +1,32 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative '../../protocol'
4
+
5
+ # Binding for python language, allow specifying imports and paths
6
+ class Python
7
+ # @implements LanguageImpl
8
+ def initialize(libs: [], paths: [], runner: 'python')
9
+ common_aliases = {
10
+ pandas: :pd,
11
+ numpy: :np
12
+ }.freeze
13
+
14
+ libs = libs.map(&:to_s) # convert all to strings
15
+ @imports = libs.map { |lib| "import #{lib}" }
16
+ common_aliases.each do |name, short|
17
+ @imports.push("import #{name} as #{short}") if libs.include? name.to_s
18
+ end
19
+ @paths = ['import sys'] + paths.map { |path| "sys.path.append('#{path}')" }
20
+ @runner = runner
21
+ end
22
+
23
+ def build(code, _task)
24
+ (@paths + @imports + [code]).join "\n"
25
+ end
26
+
27
+ def run_script(env, fname, _task)
28
+ run_cmd(env, "#{@runner} #{fname}")
29
+ end
30
+ end
31
+
32
+ creator :py, Python
@@ -0,0 +1,38 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative '../protocol'
4
+
5
+ # r language protocol
6
+ class R
7
+ def initialize(src, libs = [], **kwargs)
8
+ @src = src
9
+ @libs = libs
10
+ super(**kwargs)
11
+ end
12
+
13
+ def build(code, _)
14
+ libraries = ([
15
+ :pipeR
16
+ ] + @libs).map { |name| "suppressPackageStartupMessages(library(#{name}))" }
17
+
18
+ sources = ["source('#{File.dirname(__FILE__)}/io.R')"] +
19
+ (@src ? [@src] : []).map { |name| "source('#{SRC_DIR}/#{name}.R')" }
20
+
21
+ extra = [
22
+ '`|` <- `%>>%`',
23
+ "conn_args <- list(host='#{HOST}', user='#{USER}', dbname='#{DB}', port='#{PORT}')",
24
+ 'args <- commandArgs(trailingOnly = T)',
25
+ 'sql_input <- init_sql_input(conn_args, args[1])',
26
+ 'table_input <- init_table_input(conn_args, args[1])',
27
+ 'table_output <- init_table_output(conn_args, args[1])'
28
+ ]
29
+
30
+ [libraries, sources, extra, code].join "\n"
31
+ end
32
+
33
+ def run_script(env, fname, task)
34
+ env.send :sh, "Rscript #{fname} '#{task.scope || 'public'}'"
35
+ end
36
+ end
37
+
38
+ creator :r, R