advance 0.1.1 → 0.1.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: b554bb49d7d5003e2fc5031754afe1a4c08f7741
-   data.tar.gz: 5231a634d47aca8a8d15a578b0b4e0d2f7b17fd9
+   metadata.gz: 6f3f8b730fb4d9ecaeabd457c7157b2b8d44e5bb
+   data.tar.gz: 59bb0f09290486935b6ac303d7a8bdbd93d3e985
  SHA512:
-   metadata.gz: 02c498e7694e15b3b59cb2ac69582b0ddefab93de3b2863cc838e4b700dcc08ec873801268cbcb0fc47f308d705ce31f4d7db03a6fc11c9fa5d4d8791f590757
-   data.tar.gz: 3526bb8ed0671d1cc43a1aae89b91ecaaa541d5143526acec587cdcdef9a1e605483f068449f7d279080197dee6c5ae9bf5e32d87dfb40d9f184430d31044e87
+   metadata.gz: 8eeaa8b0c713aa37d2041789a7a3f360ccee41ed6e7ae001063539d9d00e9da6b1fbcd050a16d0aad8a8b51a6f7d3f22c6a2938d2f3eee4b5bbf1a898a233f72
+   data.tar.gz: 5a95d341914d9df76f0b108e6264ff6e21a93f1b7fcb39cc5472074779a56c23c673db36ba15f3732ccfc3f0b0156fba536b15130678224f6eb89ce07cbe6ad7
data/.gitignore CHANGED
@@ -7,3 +7,5 @@
  /spec/reports/
  /tmp/
  .idea
+ .ruby-gemset
+ .ruby-version
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     advance (0.1.1)
+     advance (0.1.3)
        team_effort

  GEM
data/README.md CHANGED
@@ -16,11 +16,19 @@ are preserved in directories prefixed with "tmp_". This isolates incomplete
  step data and ensures that the step is re-processed when the problem is
  resolved.

- Advance scripts are easy to understand. They are ruby scripts,
- composed of a series of function calls that invoke your scripts
- or commands to transform your data. Each step is composed of a
- step processing type function, followed by a
- slug for the step, followed by the command or script. For example:
+ Your project utilizing Advance contains a primary ruby script
+ that imports Advance and includes your data transformation steps,
+ which we will call "your Advance script."
+ Each step describes a command to be run on your data. These commands can be
+ one of the prepackaged Advance scripts, unix commands (like split, cut,
+ etc), or scripts/commands that you create in whatever language is
+ convenient for you. Advance invokes these scripts one by one much like
+ you would at the command line. Advance logs the exact command that is invoked
+ so that you can run it yourself to check the output manually and to
+ debug failures.
+
+ Advance steps are composed of a step processing type function, followed
+ by a slug for the step, followed by the command or script. For example:

  ```ruby
  single :unzip_7z_raw_data_file, "7z x {previous_file}"
@@ -31,7 +39,7 @@ multi :add_local_time, "cat {file_path} | add_local_time.rb timestamp local_time

  The step processing functions are `single` and `multi`. `Single` applies the command
  to the last output, which should be a single file. `Multi` speeds processing of multiple
- files by doing working in parallel (via the [TeamEffort gem][1]).
+ files by doing work in parallel (via the [TeamEffort gem][1]).

  [1]: https://rubygems.org/gems/team_effort

@@ -49,18 +57,15 @@ to your script:

  $ gem install advance

- * install [bundler][3], and add this ruby snippet to the beginning of your script:
+ * install [bundler][3], and add Advance to your `Gemfile`:

  [3]: https://rubygems.org/gems/bundler

  ```ruby
- #!/usr/bin/env ruby
- require "bundler/inline"
-
- gemfile do
-   source "https://rubygems.org"
-   gem "advance"
- end
+ source "https://rubygems.org"
+
+ gem "advance"
+ # other gems...
  ```

  ## Usage
@@ -86,10 +91,10 @@ Steps have 3 components:

  Advance adds the bin dir of the Advance gem to PATH, so that you can invoke the
  supporting advance scripts in your pipeline without specifying the full path
- of the script. Advance also adds the path of your script to PATH so that you can
- invoke scripts in the same directory as your main script without specifying
- the full path of the script. Of course, you can invoke any script if the path
- to the script is fully specified or the path is already on PATH.
+ of the script. Advance also adds the path of _your Advance script_ to PATH so
+ that you can invoke scripts in the same directory as your main script without
+ specifying the full path of the script. Of course, you can invoke any script
+ if the path to the script is fully specified or the path is already on PATH.

  **Specifying Script Input and Output**

@@ -97,29 +102,26 @@ Since your command is transforming data, you need a way to specify the input
  file or directory and the output file name. Advance provides a few tokens
  that can be inserted in the command string for this purpose:

- * **{previous_file}** indicates the output file from the previous step when
+ * **`{previous_file}`** indicates the output file from the previous step when
  the output of the previous step was a single output file. It is also used
  to indicate the first file to be used and it finds that file in the current
  working dir.
- * **{file_path}** indicates an output file from the previous step when the
+ * **`{file_path}`** indicates an output file from the previous step when the
  previous step generated multiple output files and the current step is a
  `multi` step.
- * **{file}** indicates an output file name, which is the basename from
- {file_path}. Commands often process multiple files from previous steps,
+ * **`{file}`** indicates an output file name, which is the basename from
+ `{file_path}`. Commands often process multiple files from previous steps,
  generating multiple output files. Those output files are placed in the
  step directory.
- * **{previous_dir}** indicates the directory a previous step.
+ * **`{previous_dir}`** indicates the directory of the previous step.

  **Example Script**

  ```ruby
  #!/usr/bin/env ruby
- require "bundler/inline"

- gemfile do
-   source "https://rubygems.org"
-   gem "advance"
- end
+ require "advance"
+ include Advance

  ensure_bin_on_path # ensures the directory for this script is on
  # the path so that related scripts can be referenced
@@ -137,14 +139,28 @@ When running your pipeline, it is helpful to have a directory with the single, i
  1. Move to your data directory with your single initial file.
  2. invoke your script from there.

- ## Development
+ ## Questions / Answers

- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
-
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+ * Q: **My script fails with `undefined local variable or method 'label' for main:Object`**

+ A: This indicates a ruby script is running that does not have access to a ruby gem.
+ First, make sure your script is using the expected ruby by adding to the beginning of your
+ script `puts RUBY_VERSION`. Make sure the gem is installed by listing installed gems with
+ `$ gem list`. Finally, check that the script requires the library with `require 'my-library'`
+
  ## Contributing

+ We ♥️ contributions!
+
+ Found a bug? Ideally submit a pull request. And if that's not possible, make a bug report.
+
+ Did you create a data transformation script? Please consider adding it to the
+ script collection in Advance by submitting a pull request.
+
+ Do you find the Advance documentation lacking? Please help us improve it.
+
+ Can you translate the Advance documentation to your language?
+
  Bug reports and pull requests are welcome on GitHub at https://github.com/doctorjane/advance.

  ## License
@@ -0,0 +1,18 @@
+ #!/usr/bin/env ruby
+
+ require "find"
+
+ def do_cmd(cmd)
+   system cmd
+   status = $?
+   raise "'#{cmd}' failed with #{status}" if !status.success?
+ end
+
+ files_dir_path = ARGV[0]
+ output_file = ARGV[1]
+ files = Find.find(files_dir_path).reject { |p| FileTest.directory?(p) || File.basename(p) == "log" }
+
+ files.each_slice(20) do |files_to_concat|
+   file_list = files_to_concat.join(' ')
+   do_cmd "gcat #{file_list} >> #{output_file}"
+ end
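Note on the added script above (presumably `bin/concat_csv_nh.rb`, judging by the metadata file list): it shells out to `gcat`, the name GNU `cat` usually gets when coreutils is installed through Homebrew on macOS. On most Linux systems the binary is plain `cat`, and because the file list is interpolated unquoted, paths containing spaces would be split by the shell. The sketch below is one possible pure-Ruby alternative, not what the gem ships. `concat_files` is a hypothetical helper name, and the `sort` call is an addition made only so the output order is deterministic.

```ruby
require "find"
require "tmpdir"

# Hypothetical helper (not part of the gem): append every regular file under
# dir_path to output_path, skipping files named "log" as the script above does.
def concat_files(dir_path, output_path)
  files = Find.find(dir_path)
              .reject { |p| FileTest.directory?(p) || File.basename(p) == "log" }
              .sort # assumption: a stable order is wanted; the original does not sort
  File.open(output_path, "ab") do |out|
    files.each do |path|
      # Stream each file so large inputs are never held in memory at once.
      File.open(path, "rb") { |input| IO.copy_stream(input, out) }
    end
  end
  files
end

# Small demonstration in a throwaway directory.
DEMO_RESULT = Dir.mktmpdir do |dir|
  input_dir = File.join(dir, "in")
  Dir.mkdir(input_dir)
  File.write(File.join(input_dir, "a.csv"), "1,2\n")
  File.write(File.join(input_dir, "b c.csv"), "3,4\n") # path containing a space
  File.write(File.join(input_dir, "log"), "ignored\n")

  out = File.join(dir, "combined.csv")
  concat_files(input_dir, out)
  File.read(out)
end

puts DEMO_RESULT
```

Streaming with `IO.copy_stream` also makes the batching in groups of 20 unnecessary, since no command line is built.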
@@ -0,0 +1,21 @@
+ #!/usr/bin/env ruby
+ require 'csv'
+ # $stderr.puts "#{__FILE__}:#{__LINE__}"
+
+ test_proc = eval "lambda {|row| #{ARGV.shift}}"
+
+ input = CSV.new(ARGF, :headers => true, :return_headers => true, :converters => :numeric)
+ output = CSV.new($stdout, :headers => true, :write_headers => true)
+
+ input.each.with_index do |row, index|
+   # $stderr.puts "#{index}: >>#{row.to_s.chomp}<<"
+   if row.header_row?
+     output << row
+     next
+   end
+
+   if test_proc.call(row)
+     output << row
+     next
+   end
+ end
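To make the row filter in the added script above more concrete, here is a minimal sketch of the same idea applied to an in-memory CSV string: a Ruby expression is turned into a lambda with `eval`, and only rows for which it returns a truthy value are written out. `select_rows` and the sample data are hypothetical and exist only for illustration. Because the expression goes through `eval`, it should only ever come from a trusted source.

```ruby
require "csv"

# Hypothetical helper mirroring the script above: build a predicate from a
# Ruby expression string and keep only the rows for which it is truthy.
def select_rows(csv_text, expression)
  test_proc = eval("lambda { |row| #{expression} }")
  table = CSV.parse(csv_text, headers: true, converters: :numeric)
  CSV.generate do |out|
    out << table.headers                                   # always keep the header row
    table.each { |row| out << row if test_proc.call(row) } # keep matching data rows
  end
end

# Sample data, made up for this example.
SAMPLE_CSV = "name,speed\ncar,30\nbike,12\ntruck,25\n"

puts select_rows(SAMPLE_CSV, "row['speed'] > 20")
```

The `:numeric` converter is what allows a comparison such as `row['speed'] > 20` to work. Without it, every field would be a string.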
data/lib/advance.rb CHANGED
@@ -1,6 +1,6 @@
  require "advance/version"
- require 'open3'
-
+ require "find"
+ require "open3"
  require "team_effort"

  module Advance
@@ -48,9 +48,7 @@ module Advance
    end

    def previous_file_path
-     dir_entries = Dir.glob(File.join(previous_dir_path, "*"))
-     dir_entries_clean = dir_entries.reject { |f| File.directory?(f) || f =~ %r{^\.\.?|log} }
-     dir_entries_clean.first
+     Find.find(previous_dir_path).reject { |p| FileTest.directory?(p) || File.basename(p) == "log" }.first
    end

    def single(label, command)
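One observable consequence of the `previous_file_path` change above: the removed regex `%r{^\.\.?|log}` matches the substring `log` anywhere in the path, so a data file such as `catalog.csv` would have been skipped, whereas the new code only skips files whose basename is exactly `log` (and, via `Find.find`, also descends into subdirectories). The sketch below reproduces both selection rules as stand-alone functions to show the difference. `first_file_old` and `first_file_new` are hypothetical names, and whether this motivated the change is not stated in the diff.

```ruby
require "find"
require "tmpdir"

# Old selection logic (0.1.1), copied from the removed lines.
def first_file_old(dir)
  Dir.glob(File.join(dir, "*"))
     .reject { |f| File.directory?(f) || f =~ %r{^\.\.?|log} }
     .first
end

# New selection logic (0.1.3), copied from the added line.
def first_file_new(dir)
  Find.find(dir).reject { |p| FileTest.directory?(p) || File.basename(p) == "log" }.first
end

# A step directory containing a log file and one data file whose name
# happens to contain the letters "log".
RESULTS = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "log"), "step log\n")
  File.write(File.join(dir, "catalog.csv"), "id\n1\n")
  [first_file_old(dir), first_file_new(dir)].map { |p| p && File.basename(p) }
end

p RESULTS # → [nil, "catalog.csv"]
```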
@@ -61,6 +59,9 @@ module Advance
        if command =~ /\{previous_dir\}/
          command.gsub!("{previous_dir}", previous_dir_path)
        end
+       if command =~ /\{file\}/
+         command.gsub!("{file}", File.basename(previous_file_path))
+       end
        do_command command
      end
    end
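The hunk above extends `single` so that `{file}` expands to the basename of the previous step's output file. As a rough illustration of the token expansion, here is a stand-alone sketch. `expand_tokens`, its keyword arguments, and the `001_unzip` directory name are hypothetical. The `{previous_file}` rule is an assumption based on the README description, since that part of the method is not shown in this diff.

```ruby
# Hypothetical stand-alone version of the token expansion. The real method
# lives inside the Advance module and reads the paths from its own helpers.
def expand_tokens(command, previous_dir_path:, previous_file_path:)
  command
    .gsub("{previous_file}", previous_file_path) # assumption: expands to the full path
    .gsub("{previous_dir}", previous_dir_path)   # shown in the diff
    .gsub("{file}", File.basename(previous_file_path)) # added in 0.1.3
end

EXPANDED = expand_tokens(
  "cut -d, -f1 {previous_file} > {file}",
  previous_dir_path: "001_unzip",            # made-up step directory
  previous_file_path: "001_unzip/trips.csv"  # made-up output file
)

puts EXPANDED # → cut -d, -f1 001_unzip/trips.csv > trips.csv
```

The sketch uses non-mutating `gsub` so the caller's string is left untouched, whereas the gem code uses `gsub!` and therefore modifies the command string it was given.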
@@ -1,3 +1,3 @@
  module Advance
-   VERSION = "0.1.1"
+   VERSION = "0.1.3"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: advance
  version: !ruby/object:Gem::Version
-   version: 0.1.1
+   version: 0.1.3
  platform: ruby
  authors:
  - janemacfarlane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-01-10 00:00:00.000000000 Z
+ date: 2019-01-19 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: team_effort
@@ -76,8 +76,10 @@ email:
  - jfmacfarlane@lbl.gov
  executables:
  - concat_csv.rb
+ - concat_csv_nh.rb
  - console
  - csv_select.rb
+ - csv_select_nh.rb
  - csv_split_on_change.rb
  - setup
  - split_csv.rb
@@ -93,8 +95,10 @@ files:
  - Rakefile
  - advance.gemspec
  - bin/concat_csv.rb
+ - bin/concat_csv_nh.rb
  - bin/console
  - bin/csv_select.rb
+ - bin/csv_select_nh.rb
  - bin/csv_split_on_change.rb
  - bin/setup
  - bin/split_csv.rb