pyrosome 0.3.0 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: cc5a6ff3c7d4e715388878be49a9bd371abc2115
4
- data.tar.gz: bc5610d94c47f2bfe40bb55532dd757658eb9373
3
+ metadata.gz: d1384caa550967b1e266bfccc75c925e9d1738cc
4
+ data.tar.gz: 4645453d5168e61a62632cad8403ce875fd0e530
5
5
  SHA512:
6
- metadata.gz: 196c87ac11f16c8837df5367693a6fc9d0671df65e9ffe02b67cd30b7a34fd5e27804413824665a76b3b714db92b83713f9e705a23fed0b8664ffcd6249396f2
7
- data.tar.gz: 1c9cbc02a78e8bdf43d99f1334ed3335291b84b636393be0bb3e3398b9c4ec4407bd95c24a944f9924be27dab0a4a7934c4e3fd0295c72461b761ec94e97a93b
6
+ metadata.gz: ec3f223bf71aa5792cbab2c671cf0e0fa77ae0bde1782e380cc5f3efaf08eb32c263e8972db5bd01385af78507482bc4659c8b55139308f3dc05751672aa965e
7
+ data.tar.gz: d18366abae8bb47ad510c1a55c4d35ef57acf2ab170987b8a35478950607eee488602fb73b9d9c89ec848f21a922ba268003b8d91b79403da80c971131a3ad89
data/README.md CHANGED
@@ -1,5 +1,9 @@
1
1
  # Pyrosome
2
2
 
3
+ ![Pyrosome Rider](pyrosome_ride.jpg)
4
+
5
+ Photo via [Mr Roger Fenwick](http://www.biodiversityexplorer.org/mm/tunicates/pyrostremma_spinosum.htm)
6
+
3
7
  "Pyrosomes, genus Pyrosoma, are free-floating colonial tunicates that live usually in the upper layers of the open ocean in warm seas, although some may be found at greater depths. Pyrosomes are cylindrical- or conical-shaped colonies made up of hundreds to thousands of individuals, known as zooids. Colonies range in size from less than one centimeter to several metres in length."
4
8
 
5
9
  Like Pyrosomes, files are made up of many individual pieces. The UNIX philosophy encourages operating on streams of data, and Ruby embraces this in part with command line options such as `-n` and `-e`, so that you can process a stream of data by specifying a Ruby script from the command line which works with a single piece of a file. This gem aims to provide the same functionality for more complex datasets such as JSON, CSV, XLS, etc...
@@ -21,17 +25,21 @@ Install the gem:
21
25
 
22
26
  ## Usage
23
27
 
28
+ psome [options...] script_name
29
+
24
30
  The command to execute is `psome` (pronounced `p-some`, rhymes with `roam`). It takes a stream like this:
25
31
 
26
32
  cat file.csv | psome -i csv -e "puts _[0]"
27
33
 
28
- You can see that we specify the format that is expected and a bit of Ruby code which expects to use a `_` variable
34
+ cat file.csv | psome -i csv script.rb
35
+
36
+ You can see that we specify the format that is expected and a bit of Ruby code which expects to use a `_` variable. You can also specify a Ruby script for reusability of code. The `_` variable will be available in your Ruby script as well.
29
37
 
30
38
  ### Arguments
31
39
 
32
40
  #### -i [FORMAT] / --input [FORMAT]
33
41
 
34
- Specify an input format. `json` and `csv` are currently supported
42
+ Specify an input format. `json` and `csv` are currently supported. If this flag isn't specified, each line of the input is passed in, similar to Ruby's `-n` flag
35
43
 
36
44
  #### -e [CODE] / --exec [CODE]
37
45
 
@@ -41,25 +49,35 @@ Give some Ruby code to be executed. A `_` variable will be in scope and will re
41
49
 
42
50
  If the data is tabular, should we expect headers?
43
51
 
44
- ## TODOs
52
+ #### -f / --forks [FORK_COUNT]
53
+
54
+ Specify that parallel processing is used via separate processes and specify how many processes to use
45
55
 
46
- By default process lines like Ruby
56
+ #### -t / --threads [THREAD_COUNT]
47
57
 
48
- Parallel option
49
- * Threads or forks
50
- * Way to do mutex (`sync` method?)
58
+ Specify that parallel processing is used via separate threads and specify how many threads to use
51
59
 
52
- -i to do in place replacement of files
60
+ ### Methods
53
61
 
54
- Support file names as arguments.
62
+ #### ``sync``
63
+
64
+ If you are running in parallel mode you can call the `sync` method within your code to run part of it synchronously. This is particularly useful for having multiple threads/processes coordinate to write output, even if you don't care what order they do it in. If you specify that threads are to be used then a Ruby `Mutex` will do the synchronization, while for forks a temporary file will be created to `flock` on.
65
+
66
+ **Example:**
67
+
68
+ # Convert CSV to JSON:
69
+
70
+ some_unix_commands | psome -f4 -i csv -e "name, age = _[0], _[1].to_i; sync { puts({name: name, age: age}.to_json + ',') }"
71
+
72
+ ## TODOs
55
73
 
56
- Support file name for code execution instead of -e? Could files have both `_` and `ARGV` in scope so that scripts not made for pyrosome be used?
74
+ Something like -i to do in place replacement of files?
57
75
 
58
76
  Flag to echo sample of what you would get with the specified options (do I get an array or a hash? Does the hash have keys?)
59
77
 
60
78
  Support for colors?
61
79
 
62
- Support for different enumerators? Sort_by comes to mind, but that wouldn't work for streaming (or would it?). Select/reject. Does any?/all? make sense?
80
+ Support for different enumerators (select/reject particularly)? #sort_by probably wouldn't work for streaming
63
81
 
64
82
  Support for loading an app environment (automatically load Rails if inside a project to get models?)
65
83
 
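The `sync` behavior described in the README changes above can be sketched in isolation. This is a minimal sketch under stated assumptions, not pyrosome's actual code: the helper signatures (the lock path and mutex are passed in explicitly) and the temp file names are hypothetical, and it assumes a Unix-like system where `fork` and `flock` are available.

```ruby
require 'tempfile'

# Hypothetical helper: serialize a block across processes by holding an
# exclusive flock on a shared file for the duration of the block.
def sync_with_flock(lock_path)
  File.open(lock_path) do |file|
    file.flock(File::LOCK_EX)
    begin
      yield
    ensure
      file.flock(File::LOCK_UN)
    end
  end
end

# Hypothetical helper: serialize a block across threads with a Mutex.
def sync_with_mutex(mutex, &block)
  mutex.synchronize(&block)
end

# Thread mode: four threads increment a shared counter under the mutex.
mutex = Mutex.new
counter = 0
threads = 4.times.map do
  Thread.new { 100.times { sync_with_mutex(mutex) { counter += 1 } } }
end
threads.each(&:join)

# Fork mode: four child processes append lines to one file, taking the
# file lock around each write. A Mutex would not be shared across forks.
lock_file = Tempfile.new('sync_sketch_lock')
out_file = Tempfile.new('sync_sketch_out')
pids = 4.times.map do |worker|
  fork do
    5.times do
      sync_with_flock(lock_file.path) do
        File.open(out_file.path, 'a') { |f| f.puts "worker #{worker}" }
      end
    end
  end
end
pids.each { |pid| Process.wait(pid) }
line_count = File.readlines(out_file.path).size

puts "threads incremented the counter to #{counter}"
puts "forks wrote #{line_count} lines"
```

The order in which workers acquire the lock is not defined, which matches the README's note that `sync` is useful even when output order does not matter.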
data/bin/psome CHANGED
@@ -1,12 +1,14 @@
1
1
  #!/usr/bin/env ruby
2
2
 
3
3
  require 'optparse'
4
+ require 'json'
5
+ require 'csv'
4
6
 
5
7
  VALID_FORMATS = %w(json csv)
6
8
 
7
9
  options = {}
8
10
  OptionParser.new do |opts|
9
- opts.banner = "Usage: example.rb [options]"
11
+ opts.banner = "Usage: psome [options...] [script_name]"
10
12
 
11
13
  opts.on("-i", "--input [FORMAT]", "Format of input (#{VALID_FORMATS.inspect})") do |i|
12
14
  fail ArgumentError, "Invalid input format: #{i}" unless VALID_FORMATS.include?(i.to_s)
@@ -31,18 +33,64 @@ OptionParser.new do |opts|
31
33
  opts.on('-f', '--forks [FORK_COUNT]', "Number of forks to use (parallel mode)") do |forks|
32
34
  fail ArgumentError, "Invalid argument for fork count: #{forks.inspect}" if !forks.match(/^\d+$/)
33
35
 
36
+ fail ArgumentError, 'Cannot specify both fork and thread counts at the same time' if options[:threads]
37
+
34
38
  options[:forks] = forks.to_i
35
39
  end
40
+
41
+ opts.on('-t', '--threads [THREAD_COUNT]', "Number of threads to use (parallel mode)") do |threads|
42
+ fail ArgumentError, "Invalid argument for thread count: #{threads.inspect}" if !threads.match(/^\d+$/)
43
+
44
+ fail ArgumentError, 'Cannot specify both fork and thread counts at the same time' if options[:forks]
45
+
46
+ options[:threads] = threads.to_i
47
+ end
36
48
  end.parse!
37
49
 
38
50
  def process_datum(_, options)
39
- datum = _
51
+ code = if options[:code]
52
+ options[:code]
53
+ elsif ARGV[0]
54
+ "_ = #{_.inspect}; " + File.read(ARGV[0])
55
+ else
56
+ fail ArgumentError, 'Either the `-e`/`--exec` flag or a script should be specified'
57
+ end
40
58
 
41
- eval(options[:code]).tap do |result|
42
- if options[:print]
43
- puts result
44
- end
59
+ result = eval(code)
60
+
61
+ puts result if options[:print]
62
+ end
63
+
64
+ PARALLEL_MODE = options[:forks] || options[:threads]
65
+
66
+ if PARALLEL_MODE
67
+ if options[:forks]
68
+ require 'tempfile'
69
+ LOCK_FILE_PATH = Tempfile.new('pyrosome_mutex_lockfile').path
70
+ else
71
+ MUTEX = Mutex.new
72
+ end
73
+
74
+ # Forks can't use Mutex
75
+ def sync_with_flock(&block)
76
+ fail ArgumentError, "No block specified" if block.nil?
77
+
78
+ file = File.open(LOCK_FILE_PATH)
79
+ file.flock(File::LOCK_EX)
80
+ block.call
81
+ ensure
82
+ file.flock(File::LOCK_UN)
83
+ file.close
84
+ end
85
+
86
+ def sync_with_mutex(&block)
87
+ fail ArgumentError, "No block specified" if block.nil?
88
+
89
+ MUTEX.synchronize { block.call }
45
90
  end
91
+
92
+ sync_method_name = options[:forks] ? :sync_with_flock : :sync_with_mutex
93
+ define_method(:sync, method(sync_method_name))
46
94
  end
47
95
 
48
96
  def iterate(stream, options)
@@ -51,16 +99,16 @@ def iterate(stream, options)
51
99
  require 'yajl'
52
100
  Yajl::Parser.new.parse(STDIN)
53
101
  when 'csv'
54
- require 'csv'
55
102
  CSV.new(STDIN, headers: options[:headers])
103
+ else
104
+ STDIN.each_line
56
105
  end
57
106
 
58
- if options[:forks]
59
- puts 'parallel!'
107
+ if PARALLEL_MODE
60
108
  require 'parallel'
61
- puts 'parallel?'
62
- puts 'iterator.shift', iterator.shift.inspect
63
- Parallel.each(lambda { iterator.shift || Parallel::Stop }, in_processes: options[:forks]) do |datum|
109
+ parallel_key = options[:forks] ? :in_processes : :in_threads
110
+
111
+ Parallel.each(lambda { iterator.shift || Parallel::Stop }, parallel_key => PARALLEL_MODE) do |datum|
64
112
  process_datum datum, options
65
113
  end
66
114
  else
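The choice between process-based and thread-based workers in the hunk above can be sketched as a small helper that takes the worker count from whichever option was actually set. This is an illustrative sketch: `parallel_options` is a hypothetical name that does not appear in pyrosome, and the hash it returns is only meant to show what could be passed to the `parallel` gem's `Parallel.each` via its `in_processes` / `in_threads` options.

```ruby
# Hypothetical helper (not part of pyrosome): build the options hash for
# a Parallel.each call from the parsed command line options.
def parallel_options(options)
  if options[:forks]
    # -f / --forks: run workers as separate processes
    { in_processes: options[:forks] }
  elsif options[:threads]
    # -t / --threads: run workers as threads in this process
    { in_threads: options[:threads] }
  else
    # Neither flag given: no parallel options, process serially
    {}
  end
end

p parallel_options(forks: 4)
p parallel_options(threads: 2)
p parallel_options({})
```

Deriving the count inside each branch avoids passing `options[:forks]` when only a thread count was given, in which case it would be `nil`.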
data/lib/pyrosome/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module Pyrosome
2
- VERSION = "0.3.0"
2
+ VERSION = "0.4.0"
3
3
  end
Binary file
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: pyrosome
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.4.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Brian Underwood
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2015-11-27 00:00:00.000000000 Z
11
+ date: 2015-11-29 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: yajl-ruby
@@ -102,6 +102,7 @@ files:
102
102
  - lib/pyrosome.rb
103
103
  - lib/pyrosome/version.rb
104
104
  - pyrosome.gemspec
105
+ - pyrosome_ride.jpg
105
106
  homepage: https://github.com/cheerfulstoic/pyrosome
106
107
  licenses:
107
108
  - MIT