advance 0.1.1 → 0.1.3
- checksums.yaml +4 -4
- data/.gitignore +2 -0
- data/Gemfile.lock +1 -1
- data/README.md +48 -32
- data/bin/concat_csv_nh.rb +18 -0
- data/bin/csv_select_nh.rb +21 -0
- data/lib/advance.rb +6 -5
- data/lib/advance/version.rb +1 -1
- metadata +6 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6f3f8b730fb4d9ecaeabd457c7157b2b8d44e5bb
+  data.tar.gz: 59bb0f09290486935b6ac303d7a8bdbd93d3e985
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8eeaa8b0c713aa37d2041789a7a3f360ccee41ed6e7ae001063539d9d00e9da6b1fbcd050a16d0aad8a8b51a6f7d3f22c6a2938d2f3eee4b5bbf1a898a233f72
+  data.tar.gz: 5a95d341914d9df76f0b108e6264ff6e21a93f1b7fcb39cc5472074779a56c23c673db36ba15f3732ccfc3f0b0156fba536b15130678224f6eb89ce07cbe6ad7
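
The values in `checksums.yaml` are SHA1 and SHA512 hex digests of the `metadata.gz` and `data.tar.gz` archives packed inside the `.gem` file. As a rough sketch of how such digests could be recomputed and compared locally, assuming the archives have already been extracted, something like the following might be used. The helper names `digests_for` and `digests_match?` are hypothetical and not part of RubyGems.

```ruby
require "digest/sha1"
require "digest/sha2"
require "tempfile"

# Hypothetical helper: hex digests for a file, keyed by the same
# algorithm names that checksums.yaml uses.
def digests_for(path)
  {
    "SHA1"   => Digest::SHA1.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest,
  }
end

# Hypothetical helper: true when the file's digests equal the expected ones.
def digests_match?(path, expected)
  digests_for(path) == expected
end

# Demo on a throwaway file, since the real archives are not assumed to exist here.
Tempfile.create("sample") do |f|
  f.write("hello")
  f.flush
  expected = digests_for(f.path)
  puts digests_match?(f.path, expected)
end
```

In practice `expected` would be loaded from the gem's `checksums.yaml` and `path` would point at the extracted archive.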
data/.gitignore
CHANGED
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -16,11 +16,19 @@ are preserved in directories prefixed with "tmp_". This isolates incomplete
 step data and ensures that the step is re-processed when the problem is
 resolved.
 
-
-
-
-step
-
+Your project utilizing Advance contains a primary ruby script
+that imports Advance and includes your data transformation steps,
+which we will call "your Advance script."
+Each step describes a command to be run on your data. These commands can be
+one of the prepackaged Advance scripts, unix commands (like split, cut,
+etc), or scripts/commands that you create in whatever language is
+convenient for you. Advance invokes these scripts one by one much like
+you would at the command line. Advance logs the exact command that is invoked
+so that you can run it yourself to check the output manually and to
+debug failures.
+
+Advance steps are composed of a step processing type function, followed
+by a slug for the step, followed by the command or script. For example:
 
 ```ruby
 single :unzip_7z_raw_data_file, "7z x {previous_file}"
@@ -31,7 +39,7 @@ multi :add_local_time, "cat {file_path} | add_local_time.rb timestamp local_time
 
 The step processing functions are `single` and `multi`. `Single` applies the command
 to the last output, which should be a single file. `Multi` speeds processing of multiple
-files by doing
+files by doing work in parallel (via the [TeamEffort gem][1]).
 
 [1]: https://rubygems.org/gems/team_effort
 
@@ -49,18 +57,15 @@ to your script:
 
 $ gem install advance
 
-* install [bundler][3], and add
+* install [bundler][3], and add Advance to your `Gemfile`:
 
 [3]: https://rubygems.org/gems/bundler
 
 ```ruby
-
-
-
-
-source "https://rubygems.org"
-gem "advance"
-end
+source "https://rubygems.org"
+
+gem "advance"
+# other gems...
 ```
 
 ## Usage
@@ -86,10 +91,10 @@ Steps have 3 components:
 
 Advance adds the bin dir of the Advance gem to PATH, so that you can invoke the
 supporting advance scripts in your pipeline without specifying the full path
-of the script. Advance also adds the path of
-invoke scripts in the same directory as your main script without
-the full path of the script. Of course, you can invoke any script
-to the script is fully specified or the path is already on PATH.
+of the script. Advance also adds the path of _your Advance script_ to PATH so
+that you can invoke scripts in the same directory as your main script without
+specifying the full path of the script. Of course, you can invoke any script
+if the path to the script is fully specified or the path is already on PATH.
 
 **Specifying Script Input and Output**
 
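
The README text in this hunk describes putting script directories on PATH. One way that can be done from Ruby is sketched below. The helper name `prepend_to_path` is hypothetical, and this is not necessarily how Advance's own `ensure_bin_on_path` works.

```ruby
# Hypothetical helper: put a directory at the front of PATH unless it is
# already listed, so that child processes can find scripts stored there.
# env defaults to the real environment but accepts any Hash-like object.
def prepend_to_path(dir, env = ENV)
  dir = File.expand_path(dir)
  entries = env.fetch("PATH", "").split(File::PATH_SEPARATOR)
  return env["PATH"] if entries.include?(dir)

  env["PATH"] = ([dir] + entries).join(File::PATH_SEPARATOR)
end

# Example: make scripts that sit next to the running script callable by name.
prepend_to_path(File.dirname(File.expand_path($PROGRAM_NAME)))
```

Because commands started with `system` inherit the parent's environment, a change to `ENV["PATH"]` made this way is visible to every step command run afterwards.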
@@ -97,29 +102,26 @@ Since your command is transforming data, you need a way to specify the input
 file or directory and the output file name. Advance provides a few tokens
 that can be inserted in the command string for this purpose:
 
-*
+* **`{previous_file}`** indicates the output file from the previous step when
 the output of the previous step was a single output file. It is also used
 to indicate the first file to be used and it finds that file in the current
 working dir.
-*
+* **`{file_path}`** indicates an output file from the previous step when the
 previous step generated multiple output files and the current step is a
 `multi` step.
-*
-{file_path}
+* **`{file}`** indicates an output file name, which is the basename from
+`{file_path}`. Commands often process multiple files from previous steps,
 generating multiple output files. Those output files are placed in the
 step directory.
-*
+* **`{previous_dir}`** indicates the directory of the previous step.
 
 **Example Script**
 
 ```ruby
 #!/usr/bin/env ruby
-require "bundler/inline"
 
-
-
-gem "advance"
-end
+require "advance"
+include Advance
 
 ensure_bin_on_path # ensures the directory for this script is on
 # the path so that related scripts can be referenced
@@ -137,14 +139,28 @@ When running your pipeline, it is helpful to have a directory with the single, i
 1. Move to your data directory with your single initial file.
 2. invoke your script from there.
 
-##
+## Questions / Answers
 
-
-
-To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+* Q: **My script fails with `undefined local variable or method 'label' for main:Object`**
 
+A: This indicates a ruby script is running that does not have access to a ruby gem.
+First, make sure your script is using the expected ruby by adding to the beginning of your
+script `puts RUBY_VERSION`. Make sure the gem is installed by listing installed gems with
+`$ gem list`. Finally, check that the script requires the library with `require 'my-library'`
+
 ## Contributing
 
+We ♥️ contributions!
+
+Found a bug? Ideally submit a pull request. And if that's not possible, make a bug report.
+
+Did you create a data transformation script? Please consider adding it to the
+script collection in Advance by submitting a pull request.
+
+Do you find the Advance documentation lacking? Please help us improve it.
+
+Can you translate the Advance documentation to your language?
+
 Bug reports and pull requests are welcome on GitHub at https://github.com/doctorjane/advance.
 
 ## License
data/bin/concat_csv_nh.rb
ADDED
@@ -0,0 +1,18 @@
+#!/usr/bin/env ruby
+
+require "find"
+
+def do_cmd(cmd)
+  system cmd
+  status = $?
+  raise "'#{cmd}' failed with #{status}" if !status.success?
+end
+
+files_dir_path = ARGV[0]
+output_file = ARGV[1]
+files = Find.find(files_dir_path).reject { |p| FileTest.directory?(p) || File.basename(p) == "log" }
+
+files.each_slice(20) do |files_to_concat|
+  file_list = files_to_concat.join(' ')
+  do_cmd "gcat #{file_list} >> #{output_file}"
+end
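
The script above appends files to the output in batches of 20 by shelling out to `gcat`, which may not be installed on every system. As a hedged alternative sketch that is not part of the gem, the same walk-and-append idea can be written in plain Ruby. The name `concat_files` is hypothetical, and unlike the script it sorts the file list so that the output order is predictable.

```ruby
require "find"

# Hypothetical pure-Ruby variant: append every regular file under dir
# (skipping files named "log") to output_file, without an external cat.
# Returns the number of files appended.
def concat_files(dir, output_file)
  files = Find.find(dir).reject { |p| File.directory?(p) || File.basename(p) == "log" }
  File.open(output_file, "ab") do |out|
    files.sort.each do |path|
      # Stream each input so large files are not loaded into memory at once.
      File.open(path, "rb") { |input| IO.copy_stream(input, out) }
    end
  end
  files.size
end
```

Streaming file by file also sidesteps the command-line length limit that the batching of 20 appears intended to stay under.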
data/bin/csv_select_nh.rb
ADDED
@@ -0,0 +1,21 @@
+#!/usr/bin/env ruby
+require 'csv'
+# $stderr.puts "#{__FILE__}:#{__LINE__}"
+
+test_proc = eval "lambda {|row| #{ARGV.shift}}"
+
+input = CSV.new(ARGF, :headers => true, :return_headers => true, :converters => :numeric)
+output = CSV.new($stdout, :headers => true, :write_headers => true)
+
+input.each.with_index do |row, index|
+  # $stderr.puts "#{index}: >>#{row.to_s.chomp}<<"
+  if row.header_row?
+    output << row
+    next
+  end
+
+  if test_proc.call(row)
+    output << row
+    next
+  end
+end
data/lib/advance.rb
CHANGED
@@ -1,6 +1,6 @@
 require "advance/version"
-require
-
+require "find"
+require "open3"
 require "team_effort"
 
 module Advance
@@ -48,9 +48,7 @@ module Advance
   end
 
   def previous_file_path
-
-    dir_entries_clean = dir_entries.reject { |f| File.directory?(f) || f =~ %r{^\.\.?|log} }
-    dir_entries_clean.first
+    Find.find(previous_dir_path).reject { |p| FileTest.directory?(p) || File.basename(p) == "log" }.first
   end
 
   def single(label, command)
@@ -61,6 +59,9 @@ module Advance
       if command =~ /\{previous_dir\}/
         command.gsub!("{previous_dir}", previous_dir_path)
       end
+      if command =~ /\{file\}/
+        command.gsub!("{file}", File.basename(previous_file_path))
+      end
       do_command command
     end
   end
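
The hunk above adds `{file}` to the tokens that `single` expands. A simplified, standalone sketch of this kind of substitution follows. The function `expand_tokens` and its keyword arguments are hypothetical; they only mirror the `gsub!` calls visible in the diff and do not reproduce the full Advance implementation.

```ruby
# Hypothetical helper: replace Advance-style tokens in a command string.
# previous_dir and previous_file stand in for paths Advance works out itself.
def expand_tokens(command, previous_dir:, previous_file:)
  replacements = {
    "{previous_dir}"  => previous_dir,
    "{previous_file}" => previous_file,
    "{file}"          => File.basename(previous_file),
  }
  # gsub with a hash looks up each matched token and substitutes its value;
  # the original string is left unmodified.
  command.gsub(/\{(?:previous_dir|previous_file|file)\}/, replacements)
end

puts expand_tokens("cut -d, -f1 {previous_file} > {file}",
                   previous_dir: "unzip", previous_file: "unzip/raw.csv")
```

Using the non-destructive `gsub` here is a design choice for the sketch: it leaves the caller's command string intact, whereas the code in the diff mutates `command` in place with `gsub!`.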
data/lib/advance/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: advance
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.3
 platform: ruby
 authors:
 - janemacfarlane
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-01-
+date: 2019-01-19 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: team_effort
@@ -76,8 +76,10 @@ email:
 - jfmacfarlane@lbl.gov
 executables:
 - concat_csv.rb
+- concat_csv_nh.rb
 - console
 - csv_select.rb
+- csv_select_nh.rb
 - csv_split_on_change.rb
 - setup
 - split_csv.rb
@@ -93,8 +95,10 @@ files:
 - Rakefile
 - advance.gemspec
 - bin/concat_csv.rb
+- bin/concat_csv_nh.rb
 - bin/console
 - bin/csv_select.rb
+- bin/csv_select_nh.rb
 - bin/csv_split_on_change.rb
 - bin/setup
 - bin/split_csv.rb