pyrosome 0.3.0 → 0.4.0
- checksums.yaml +4 -4
- data/README.md +29 -11
- data/bin/psome +60 -12
- data/lib/pyrosome/version.rb +1 -1
- data/pyrosome_ride.jpg +0 -0
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d1384caa550967b1e266bfccc75c925e9d1738cc
+  data.tar.gz: 4645453d5168e61a62632cad8403ce875fd0e530
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ec3f223bf71aa5792cbab2c671cf0e0fa77ae0bde1782e380cc5f3efaf08eb32c263e8972db5bd01385af78507482bc4659c8b55139308f3dc05751672aa965e
+  data.tar.gz: d18366abae8bb47ad510c1a55c4d35ef57acf2ab170987b8a35478950607eee488602fb73b9d9c89ec848f21a922ba268003b8d91b79403da80c971131a3ad89
data/README.md
CHANGED
@@ -1,5 +1,9 @@
 # Pyrosome
 
+![Pyrosome Rider](pyrosome_ride.jpg)
+
+Photo via [Mr Roger Fenwick](http://www.biodiversityexplorer.org/mm/tunicates/pyrostremma_spinosum.htm)
+
 "Pyrosomes, genus Pyrosoma, are free-floating colonial tunicates that live usually in the upper layers of the open ocean in warm seas, although some may be found at greater depths. Pyrosomes are cylindrical- or conical-shaped colonies made up of hundreds to thousands of individuals, known as zooids. Colonies range in size from less than one centimeter to several metres in length."
 
 Like Pyrosomes, files are made up of many individual pieces. The UNIX philosophy encourages operating on streams of data and Ruby embraces this in part with command line options such as `-n` and `-e` so that you can process a stream of data by specifying a Ruby script from the command line which works with a single piece of a file. This gem aims to provide the same functionality for more complex datasets such as JSON, CSV, XLS, etc...
@@ -21,17 +25,21 @@ Install the gem:
 
 ## Usage
 
+    psome [options...] script_name
+
 The command to execute is `psome` (pronounced `p-some`, rhymes with `roam`). It takes a stream like this:
 
     cat file.csv | psome -i csv -e "puts _[0]"
 
-
+    cat file.csv | psome -i csv script.rb
+
+You can see that we specify the format that is expected and a bit of Ruby code which expects to use a `_` variable. You can specify a Ruby script for reusability of code. The `_` variable will be available in your Ruby script as well.
 
 ### Arguments
 
 #### -i [FORMAT] / --input [FORMAT]
 
-Specify an input. `json` and `csv` currently supported
+Specify an input. `json` and `csv` currently supported. If this flag isn't specified then each line of the input will be given, similar to Ruby's `-n` flag
 
 #### -e [CODE] / --exec [CODE]
 
@@ -41,25 +49,35 @@ Give some Ruby code to be executed. A `_` variable will be in scope and will re
 
 If the data is tabular, should we expect headers?
 
-
+#### -f / --forks [FORK_COUNT]
+
+Specify that parallel processing is used via separate processes and specify how many processes to use
 
-
+#### -t / --threads [THREAD_COUNT]
 
-
-* Threads or forks
-* Way to do mutex (`sync` method?)
+Specify that parallel processing is used via separate threads and specify how many threads to use
 
-
+### Methods
 
-
+#### ``sync``
+
+If you are running in parallel mode you can call the `sync` method within your code to run part of it synchronously. This is particularly useful for having multiple threads/processes coordinate to write output, even if you don't care what order they do it in. If you specify that threads are to be used then the Ruby `Mutex` will do the synchronization, while for forks a temporary file will be created to `flock` to.
+
+**Example:**
+
+    # Convert CSV to JSON:
+
+    some_unix_commands | psome -f4 -i csv -e "name, age = _[0], _[1].to_i; sync { puts({name: name, age: age}.to_json + ',') }"
+
+## TODOs
 
-
+Something like -i to do in place replacement of files?
 
 Flag to echo sample of what you would get with the specified options (do I get an array or a hash? Does the hash have keys?)
 
 Support for colors?
 
-Support for different enumerators?
+Support for different enumerators (select/reject particularly)? #sort_by probably wouldn't work for streaming
 
 Support for loading an app environment (automatically load Rails if inside a project to get models?)
 
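To make the `-i` and `_` behavior described in the README changes above concrete, here is a minimal standalone sketch in plain Ruby. It is not pyrosome's implementation: `each_record` is a hypothetical helper and the sample data is invented for illustration. It hands CSV rows to the block when the format is `csv`, and raw lines when no format is given, which is roughly what the README says a script sees in `_`.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper (not part of pyrosome): yields one record at a time,
# as a CSV row for 'csv' or as a raw line when no format is given.
def each_record(stream, format = nil)
  case format
  when 'csv' then CSV.new(stream).each { |row| yield row }
  when nil   then stream.each_line { |line| yield line }
  else raise ArgumentError, "Invalid input format: #{format}"
  end
end

# Roughly what `psome -i csv -e "puts _[0]"` would work with: `_` is one row.
csv_input = StringIO.new("alice,30\nbob,25\n")
first_columns = []
each_record(csv_input, 'csv') { |_| first_columns << _[0] }

# Without a format, `_` is one line of input, similar to Ruby's `-n` flag.
line_input = StringIO.new("one\ntwo\n")
lines = []
each_record(line_input) { |_| lines << _.chomp }

puts first_columns.inspect
puts lines.inspect
```

The sketch reads from `StringIO` so it can run without a pipe; the real command reads from standard input.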
data/bin/psome
CHANGED
@@ -1,12 +1,14 @@
|
|
1
1
|
#!/usr/bin/env ruby
|
2
2
|
|
3
3
|
require 'optparse'
|
4
|
+
require 'json'
|
5
|
+
require 'csv'
|
4
6
|
|
5
7
|
VALID_FORMATS = %w(json csv)
|
6
8
|
|
7
9
|
options = {}
|
8
10
|
OptionParser.new do |opts|
|
9
|
-
opts.banner = "Usage:
|
11
|
+
opts.banner = "Usage: psome [options...] [script_name]"
|
10
12
|
|
11
13
|
opts.on("-i", "--input [FORMAT]", "Format of input (#{VALID_FORMATS.inspect})") do |i|
|
12
14
|
fail ArgumentError, "Invalid input format: #{i}" unless VALID_FORMATS.include?(i.to_s)
|
@@ -31,18 +33,64 @@ OptionParser.new do |opts|
   opts.on('-f', '--forks [FORK_COUNT]', "Number of forks to use (parallel mode)") do |forks|
     fail ArgumentError, "Invalid argument for fork count: #{forks.inspect}" if !forks.match(/^\d+$/)
 
+    fail ArgumentError, 'Cannot specify both fork and thread counts at the same time' if options[:threads]
+
     options[:forks] = forks.to_i
   end
+
+  opts.on('-t', '--threads [THREAD_COUNT]', "Number of threads to use (parallel mode)") do |threads|
+    fail ArgumentError, "Invalid argument for thread count: #{threads.inspect}" if !threads.match(/^\d+$/)
+
+    fail ArgumentError, 'Cannot specify both fork and thread counts at the same time' if options[:forks]
+
+    options[:threads] = threads.to_i
+  end
 end.parse!
 
 def process_datum(_, options)
-
+  code = if options[:code]
+    options[:code]
+  elsif ARGV[0]
+    "_ = #{_.inspect}; " + File.read(ARGV[0])
+  else
+    fail ArgumentError, 'Either the `-e`/`--exec` flag or a script should be specified'
+  end
 
-  eval(
-
-
-
+  result = eval(code)
+
+  puts result if options[:print]
+end
+
+PARALLEL_MODE = options[:forks] || options[:threads]
+
+if PARALLEL_MODE
+  if options[:forks]
+    require 'tempfile'
+    LOCK_FILE_PATH = Tempfile.new('pyrosome_mutex_lockfile').path
+  else
+    MUTEX = Mutex.new
+  end
+
+  # Forks can't use Mutex
+  def sync_with_flock(&block)
+    fail ArgumentError, "No block specified" if block.nil?
+
+    file = File.open(LOCK_FILE_PATH)
+    file.flock(File::LOCK_EX)
+    block.call
+  ensure
+    file.flock(File::LOCK_UN)
+    file.close
+  end
+
+  def sync_with_mutex(&block)
+    fail ArgumentError, "No block specified" if block.nil?
+
+    MUTEX.synchronize { block.call }
   end
+
+  sync_method_name = options[:forks] ? :sync_with_flock : :sync_with_mutex
+  define_method(:sync, method(sync_method_name))
 end
 
 def iterate(stream, options)
@@ -51,16 +99,16 @@ def iterate(stream, options)
     require 'yajl'
     Yajl::Parser.new.parse(STDIN)
   when 'csv'
-    require 'csv'
     CSV.new(STDIN, headers: options[:headers])
+  else
+    STDIN.each_line
   end
 
-  if
-    puts 'parallel!'
+  if PARALLEL_MODE
     require 'parallel'
-
-
-    Parallel.each(lambda { iterator.shift || Parallel::Stop },
+    parallel_key = options[:forks] ? :in_processes : :in_threads
+
+    Parallel.each(lambda { iterator.shift || Parallel::Stop }, parallel_key => options[:forks]) do |datum|
       process_datum datum, options
     end
   else
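The two locking strategies that the `sync` method chooses between (a `Mutex` for threads, `flock` on a temporary file for forks) can be tried in isolation. The following is a minimal sketch under stated assumptions, not the gem's code: `MutexSync` and `FlockSync` are hypothetical class names, the worker counts are arbitrary, and the fork half assumes a platform where `fork` is available.

```ruby
require 'tempfile'

# Hypothetical thread-mode lock: an in-process Mutex.
class MutexSync
  def initialize
    @mutex = Mutex.new
  end

  def sync(&block)
    raise ArgumentError, 'No block specified' if block.nil?
    @mutex.synchronize(&block)
  end
end

# Hypothetical fork-mode lock: processes do not share a Mutex, so each call
# takes an exclusive flock on a temp file that was created before forking.
class FlockSync
  def initialize
    # Keep a reference to the Tempfile so the lock file is not cleaned up early.
    @lock_file = Tempfile.new('sync_sketch_lock')
  end

  def sync
    raise ArgumentError, 'No block specified' unless block_given?
    File.open(@lock_file.path) do |file|
      file.flock(File::LOCK_EX)
      begin
        yield
      ensure
        file.flock(File::LOCK_UN)
      end
    end
  end
end

# Threads: four workers bump a shared counter under the Mutex.
counter = 0
thread_lock = MutexSync.new
4.times.map { Thread.new { 100.times { thread_lock.sync { counter += 1 } } } }.each(&:join)

# Forks: four children append lines to one file under the flock.
output = Tempfile.new('sync_sketch_out')
fork_lock = FlockSync.new
pids = 4.times.map do |i|
  fork do
    5.times { fork_lock.sync { File.open(output.path, 'a') { |f| f.puts "worker #{i}" } } }
    exit!(0) # leave the child without running exit hooks inherited from the parent
  end
end
pids.each { |pid| Process.wait(pid) }
line_count = File.readlines(output.path).size

puts counter
puts line_count
```

Opening the lock file inside each `sync` call gives every caller its own file descriptor, which is what lets `flock` exclude other processes; the order in which workers acquire the lock is not defined.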
data/lib/pyrosome/version.rb
CHANGED
data/pyrosome_ride.jpg
ADDED
Binary file
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: pyrosome
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.4.0
 platform: ruby
 authors:
 - Brian Underwood
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2015-11-
+date: 2015-11-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: yajl-ruby
@@ -102,6 +102,7 @@ files:
 - lib/pyrosome.rb
 - lib/pyrosome/version.rb
 - pyrosome.gemspec
+- pyrosome_ride.jpg
 homepage: https://github.com/cheerfulstoic/pyrosome
 licenses:
 - MIT