advance 0.1.1 → 0.1.3
- checksums.yaml +4 -4
- data/.gitignore +2 -0
- data/Gemfile.lock +1 -1
- data/README.md +48 -32
- data/bin/concat_csv_nh.rb +18 -0
- data/bin/csv_select_nh.rb +21 -0
- data/lib/advance.rb +6 -5
- data/lib/advance/version.rb +1 -1
- metadata +6 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6f3f8b730fb4d9ecaeabd457c7157b2b8d44e5bb
+  data.tar.gz: 59bb0f09290486935b6ac303d7a8bdbd93d3e985
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8eeaa8b0c713aa37d2041789a7a3f360ccee41ed6e7ae001063539d9d00e9da6b1fbcd050a16d0aad8a8b51a6f7d3f22c6a2938d2f3eee4b5bbf1a898a233f72
+  data.tar.gz: 5a95d341914d9df76f0b108e6264ff6e21a93f1b7fcb39cc5472074779a56c23c673db36ba15f3732ccfc3f0b0156fba536b15130678224f6eb89ce07cbe6ad7
data/.gitignore
CHANGED
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -16,11 +16,19 @@ are preserved in directories prefixed with "tmp_". This isolates incomplete
 step data and ensures that the step is re-processed when the problem is
 resolved.
 
-
-
-
-step
-
+Your project utilizing Advance contains a primary ruby script
+that imports Advance and includes your data transformation steps,
+which we will call "your Advance script."
+Each step describes a command to be run on your data. These commands can be
+one of the prepackaged Advance scripts, unix commands (like split, cut,
+etc), or scripts/commands that you create in whatever language is
+convenient for you. Advance invokes these scripts one by one much like
+you would at the command line. Advance logs the exact command that is invoked
+so that you can run it yourself to check the output manually and to
+debug failures.
+
+Advance steps are composed of a step processing type function, followed
+by a slug for the step, followed by the command or script. For example:
 
 ```ruby
 single :unzip_7z_raw_data_file, "7z x {previous_file}"
@@ -31,7 +39,7 @@ multi :add_local_time, "cat {file_path} | add_local_time.rb timestamp local_time
 
 The step processing functions are `single` and `multi`. `Single` applies the command
 to the last output, which should be a single file. `Multi` speeds processing of multiple
-files by doing
+files by doing work in parallel (via the [TeamEffort gem][1]).
 
 [1]: https://rubygems.org/gems/team_effort
 
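Reviewer note: the changed README line says `multi` processes files in parallel via TeamEffort. The sketch below does not use TeamEffort and is not Advance's implementation; it is a stand-in built on plain Ruby threads and a `Queue`, showing how a per-file command template could be fanned out over several files. The name `run_multi`, the worker count, and the use of `echo` are all hypothetical.

```ruby
require "open3"
require "shellwords"

# Hypothetical stand-in for a `multi` step: expands {file_path} and {file}
# for each input file and runs the resulting commands on a small thread pool.
def run_multi(command_template, file_paths, workers: 4)
  queue = Queue.new
  file_paths.each { |path| queue << path }
  results = Queue.new

  threads = Array.new(workers) do
    Thread.new do
      loop do
        path = begin
          queue.pop(true) # non-blocking pop raises ThreadError once the queue is empty
        rescue ThreadError
          break
        end
        tokens = { "{file_path}" => path, "{file}" => File.basename(path) }
        # Shell-escape each substituted value so unusual file names stay one argument.
        command = command_template.gsub(/\{file_path\}|\{file\}/) do |token|
          Shellwords.escape(tokens.fetch(token))
        end
        stdout, status = Open3.capture2(command)
        results << [path, stdout, status.success?]
      end
    end
  end
  threads.each(&:join)

  # Results arrive in completion order, not input order.
  Array.new(results.size) { results.pop }
end

run_multi("echo {file}", ["/tmp/a.csv", "/tmp/b.csv"], workers: 2).each do |path, stdout, ok|
  puts "#{path}: #{stdout.strip} (ok=#{ok})"
end
```

Threads are enough here because the work happens in child processes; a process-based pool would be an equally reasonable choice.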
@@ -49,18 +57,15 @@ to your script:
 
 $ gem install advance
 
-* install [bundler][3], and add
+* install [bundler][3], and add Advance to your `Gemfile`:
 
 [3]: https://rubygems.org/gems/bundler
 
 ```ruby
-
-
-
-
-source "https://rubygems.org"
-gem "advance"
-end
+source "https://rubygems.org"
+
+gem "advance"
+# other gems...
 ```
 
 ## Usage
@@ -86,10 +91,10 @@ Steps have 3 components:
 
 Advance adds the bin dir of the Advance gem to PATH, so that you can invoke the
 supporting advance scripts in your pipeline without specifying the full path
-of the script. Advance also adds the path of
-invoke scripts in the same directory as your main script without
-the full path of the script. Of course, you can invoke any script
-to the script is fully specified or the path is already on PATH.
+of the script. Advance also adds the path of _your Advance script_ to PATH so
+that you can invoke scripts in the same directory as your main script without
+specifying the full path of the script. Of course, you can invoke any script
+if the path to the script is fully specified or the path is already on PATH.
 
 **Specifying Script Input and Output**
 
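Reviewer note: the README text in this hunk describes Advance putting two directories on PATH. The diff does not show how that is done, so the following is only a sketch of the general technique; `prepend_to_path` is a hypothetical helper, not an Advance method, and the example directories are made up.

```ruby
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
# `env` defaults to the real environment but accepts any Hash-like object.
def prepend_to_path(dir, env = ENV)
  entries = env.fetch("PATH", "").split(File::PATH_SEPARATOR)
  return env["PATH"] if entries.include?(dir)

  env["PATH"] = ([dir] + entries).join(File::PATH_SEPARATOR)
end

# Demonstration on a plain Hash so the real environment is left untouched.
env = { "PATH" => ["/usr/bin", "/bin"].join(File::PATH_SEPARATOR) }
prepend_to_path("/opt/pipeline/scripts", env)
puts env["PATH"]
```

Prepending rather than appending means a script next to the pipeline script would shadow a same-named command elsewhere on PATH, which may or may not be what Advance itself does.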
@@ -97,29 +102,26 @@ Since your command is transforming data, you need a way to specify the input
 file or directory and the output file name. Advance provides a few tokens
 that can be inserted in the command string for this purpose:
 
-*
+* **`{previous_file}`** indicates the output file from the previous step when
 the output of the previous step was a single output file. It is also used
 to indicate the first file to be used and it finds that file in the current
 working dir.
-*
+* **`{file_path}`** indicates an output file from the previous step when the
 previous step generated multiple output files and the current step is a
 `multi` step.
-*
-{file_path}
+* **`{file}`** indicates an output file name, which is the basename from
+`{file_path}`. Commands often process multiple files from previous steps,
 generating multiple output files. Those output files are placed in the
 step directory.
-*
+* **`{previous_dir}`** indicates the directory of the previous step.
 
 **Example Script**
 
 ```ruby
 #!/usr/bin/env ruby
-require "bundler/inline"
 
-
-
-gem "advance"
-end
+require "advance"
+include Advance
 
 ensure_bin_on_path # ensures the directory for this script is on
 # the path so that related scripts can be referenced
@@ -137,14 +139,28 @@ When running your pipeline, it is helpful to have a directory with the single, i
 1. Move to your data directory with your single initial file.
 2. invoke your script from there.
 
-##
+## Questions / Answers
 
-
-
-To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+* Q: **My script fails with `undefined local variable or method 'label' for main:Object`**
 
+A: This indicates a ruby script is running that does not have access to a ruby gem.
+First, make sure your script is using the expected ruby by adding to the beginning of your
+script `puts RUBY_VERSION`. Make sure the gem is installed by listing installed gems with
+`$ gem list`. Finally, check that the script requires the library with `require 'my-library'`
+
 ## Contributing
 
+We ♥️ contributions!
+
+Found a bug? Ideally submit a pull request. And if that's not possible, make a bug report.
+
+Did you create a data transformation script? Please consider adding it to the
+script collection in Advance by submitting a pull request.
+
+Do you find the Advance documentation lacking? Please help us improve it.
+
+Can you translate the Advance documentation to your language?
+
 Bug reports and pull requests are welcome on GitHub at https://github.com/doctorjane/advance.
 
 ## License
data/bin/concat_csv_nh.rb
ADDED
@@ -0,0 +1,18 @@
+#!/usr/bin/env ruby
+
+require "find"
+
+def do_cmd(cmd)
+  system cmd
+  status = $?
+  raise "'#{cmd}' failed with #{status}" if !status.success?
+end
+
+files_dir_path = ARGV[0]
+output_file = ARGV[1]
+files = Find.find(files_dir_path).reject { |p| FileTest.directory?(p) || File.basename(p) == "log" }
+
+files.each_slice(20) do |files_to_concat|
+  file_list = files_to_concat.join(' ')
+  do_cmd "gcat #{file_list} >> #{output_file}"
+end
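Reviewer note: the script added above shells out to `gcat` and interpolates file names into the command without quoting, so a path containing a space would be split into separate arguments. As a sketch only, not a proposed replacement, the same file discovery and concatenation can be done without a shell. `concat_files` is a hypothetical name, and the `sort` is an addition for a deterministic order that the original does not have.

```ruby
require "find"

# Hypothetical helper: append every regular file under files_dir_path
# (skipping anything named "log") to output_file, without invoking a shell.
def concat_files(files_dir_path, output_file)
  files = Find.find(files_dir_path)
              .reject { |p| File.directory?(p) || File.basename(p) == "log" }
              .sort # assumption: deterministic order; the script above does not sort

  File.open(output_file, "ab") do |out|
    files.each do |path|
      File.open(path, "rb") { |input| IO.copy_stream(input, out) }
    end
  end
  files
end
```

The batches of 20 in the original presumably keep each command line short; streaming file by file makes that batching unnecessary. Like the original, this assumes `output_file` lives outside `files_dir_path`.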
data/bin/csv_select_nh.rb
ADDED
@@ -0,0 +1,21 @@
+#!/usr/bin/env ruby
+require 'csv'
+# $stderr.puts "#{__FILE__}:#{__LINE__}"
+
+test_proc = eval "lambda {|row| #{ARGV.shift}}"
+
+input = CSV.new(ARGF, :headers => true, :return_headers => true, :converters => :numeric)
+output = CSV.new($stdout, :headers => true, :write_headers => true)
+
+input.each.with_index do |row, index|
+  # $stderr.puts "#{index}: >>#{row.to_s.chomp}<<"
+  if row.header_row?
+    output << row
+    next
+  end
+
+  if test_proc.call(row)
+    output << row
+    next
+  end
+end
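Reviewer note: the script above turns its first argument into the body of a lambda with `eval`, then keeps only the rows for which that lambda is truthy. The in-memory sketch below shows the same idea on a string instead of `ARGF`; `select_rows` and the sample data are hypothetical. Because the expression is evaluated as arbitrary Ruby, it should only ever come from a trusted pipeline author.

```ruby
require "csv"

# Hypothetical in-memory variant of the row filter: `expression` is Ruby source
# that receives each CSV::Row as `row` and returns true to keep it.
def select_rows(csv_text, expression)
  test_proc = eval("lambda { |row| #{expression} }") # arbitrary code: trusted input only
  table = CSV.parse(csv_text, headers: true, converters: :numeric)

  CSV.generate do |out|
    out << table.headers
    table.each { |row| out << row if test_proc.call(row) }
  end
end

puts select_rows("name,speed\na,10\nb,42\n", 'row["speed"] > 20')
# prints:
# name,speed
# b,42
```

The `:numeric` converter is what lets the expression compare `row["speed"]` with an integer; without it the field would be a string.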
data/lib/advance.rb
CHANGED
@@ -1,6 +1,6 @@
 require "advance/version"
-require
-
+require "find"
+require "open3"
 require "team_effort"
 
 module Advance
@@ -48,9 +48,7 @@ module Advance
   end
 
   def previous_file_path
-
-    dir_entries_clean = dir_entries.reject { |f| File.directory?(f) || f =~ %r{^\.\.?|log} }
-    dir_entries_clean.first
+    Find.find(previous_dir_path).reject { |p| FileTest.directory?(p) || File.basename(p) == "log" }.first
   end
 
   def single(label, command)
@@ -61,6 +59,9 @@ module Advance
       if command =~ /\{previous_dir\}/
         command.gsub!("{previous_dir}", previous_dir_path)
       end
+      if command =~ /\{file\}/
+        command.gsub!("{file}", File.basename(previous_file_path))
+      end
       do_command command
     end
   end
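Reviewer note: the last hunk adds `{file}` expansion with `gsub!`, which mutates the caller's string and would raise `FrozenError` if the command were a frozen literal. A non-mutating sketch of the same token expansion follows. `expand_tokens` and the `step_01` paths are hypothetical, and the handling of `{previous_file}` is inferred from the README description rather than from code shown in this diff.

```ruby
# Hypothetical helper: returns a new command string with Advance-style tokens
# replaced, leaving the original string untouched.
def expand_tokens(command, previous_file_path:, previous_dir_path:)
  replacements = {
    "{previous_file}" => previous_file_path,
    "{previous_dir}"  => previous_dir_path,
    "{file}"          => File.basename(previous_file_path)
  }
  command.gsub(/\{previous_file\}|\{previous_dir\}|\{file\}/) do |token|
    replacements.fetch(token)
  end
end

puts expand_tokens(
  "cut -d, -f1 {previous_file} > {file}",
  previous_file_path: "step_01/data.csv",
  previous_dir_path: "step_01"
)
# prints: cut -d, -f1 step_01/data.csv > data.csv
```

A single pass over one alternation also avoids re-scanning text that an earlier substitution just inserted.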
data/lib/advance/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: advance
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.3
 platform: ruby
 authors:
 - janemacfarlane
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-01-
+date: 2019-01-19 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: team_effort
@@ -76,8 +76,10 @@ email:
 - jfmacfarlane@lbl.gov
 executables:
 - concat_csv.rb
+- concat_csv_nh.rb
 - console
 - csv_select.rb
+- csv_select_nh.rb
 - csv_split_on_change.rb
 - setup
 - split_csv.rb
@@ -93,8 +95,10 @@ files:
 - Rakefile
 - advance.gemspec
 - bin/concat_csv.rb
+- bin/concat_csv_nh.rb
 - bin/console
 - bin/csv_select.rb
+- bin/csv_select_nh.rb
 - bin/csv_split_on_change.rb
 - bin/setup
 - bin/split_csv.rb