schleyfox-peach 0.1.2

data/LICENSE ADDED
@@ -0,0 +1,22 @@
+ Copyright (c) 2008 Ben Hughes
+
+ Permission is hereby granted, free of charge, to any person
+ obtaining a copy of this software and associated documentation
+ files (the "Software"), to deal in the Software without
+ restriction, including without limitation the rights to use,
+ copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the
+ Software is furnished to do so, subject to the following
+ conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ OTHER DEALINGS IN THE SOFTWARE.
data/README ADDED
@@ -0,0 +1,22 @@
+ Parallel Each (for ruby with threads)
+
+ It is pretty common to have iterations over Arrays that can be safely run in parallel. With multicore chips becoming the norm, single-threaded processing is about as cool as Pog. Unfortunately, standard Ruby (MRI) hates real threads pretty hardcore at the present time: its threads never run Ruby code on more than one core at once. However, for some Ruby projects, alternate VMs like JRuby do give multicores some lovin'. Peach exists to make this power simple to use with minimal code changes.
+
+ Methods like map, each, and delete_if are often used in a functional, side-effect-free style. If the operation in the block is computationally intense, performance can often be gained by multithreading the process. That's where Peach comes in. In the simplest case, you are one letter away (each becomes peach) from harnessing the power of parallelism and unlocking the secret of a guilt-free tan. At this stage, the goggles are purely optional.
+
+ Using Peach
+
+ Suppose you are going about your day job hacking away at code for the WOPR when you stumble upon the code:
+
+ cities.each {|city| thermonuclear_war(city)}
+
+ Clearly, the only winning move is to declare war in parallel. With Peach, the new code is:
+
+ require 'peach'
+ cities.peach {|city| thermonuclear_war(city)}
+
+ Requiring peach.rb monkey patches Array into submission. Currently Peach provides peach, pmap, and pdelete_if. Each of these methods takes an optional argument n, the desired number of worker threads. If n is omitted, Peach uses $peach_default_threads when that global is set and otherwise spawns one thread per Array element. The array is split into n contiguous slices that are processed in separate threads, and the results are concatenated in the original order. Unlike delete_if, pdelete_if returns a new array and leaves the receiver untouched. For cheap operations on a large number of elements, you probably want to set n to something reasonably low.
+
+ (0...10000).to_a.pmap(4) {|x| process(x)}
+
+ Constructing the threads and adding a few layers of indirection does add some overhead to the iteration, especially on MRI. Keep this in mind and benchmark when unsure.
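To make the role of n concrete, here is a minimal stand-alone sketch of the slice-per-thread strategy: split the array into at most n contiguous slices, run each slice in its own Thread, and concatenate the per-slice results in order. It runs without the gem; `chunked_pmap` is a hypothetical helper, not Peach's API, and its ceiling-division slice sizes differ slightly from Peach's own `divvy`.

```ruby
# Hypothetical helper illustrating the slice-per-thread idea; not part of Peach.
def chunked_pmap(array, n = nil, &block)
  return [] if array.empty?
  # As in Peach when neither n nor $peach_default_threads is given,
  # default to one thread per element, and never use more threads than elements.
  n ||= array.size
  n = [[n, array.size].min, 1].max
  # Ceiling division, so the array splits into at most n slices.
  slice_size = (array.size + n - 1) / n
  threads = array.each_slice(slice_size).map do |slice|
    Thread.new(slice) { |s| s.map(&block) }
  end
  # Thread#value joins each thread and returns its block's result;
  # collecting in slice order keeps the output in the original order.
  threads.flat_map(&:value)
end

squares = chunked_pmap((1..10).to_a, 4) { |x| x * x }
p squares # prints [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

Using Thread#value instead of writing into a shared results array avoids mutating one Array from several threads at once.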
data/bn/peach_bn.rb ADDED
@@ -0,0 +1,57 @@
+ #Benchmark for Peach <http://peach.rubyforge.org>
+ #Count intrawiki links in wikipedia data
+
+ require 'peach'
+ require 'benchmark'
+ require 'digest/md5'
+
+ puts "PEACH BENCHMARK"
+ puts "Wikipedia Processing"
+ puts
+
+ #Read a small slice of the Wikipedia XML file
+ fn = "peach_bn_data.txt"
+ puts "Reading in dataset #{fn}"
+ puts "Dataset is #{File.size(fn)/1024} kb"
+ dataset = ""
+ puts Benchmark.measure("read dataset") { dataset = File.read(fn) }
+
+ puts "Splitting dataset into articles"
+ articles = []
+ puts Benchmark.measure("split dataset") {
+   articles = dataset.scan(/<text xml:space=\"preserve\">.*?<\/text>/m)
+   articles.delete_if {|x| /#redirect/i.match(x) }
+ }
+ puts "Found #{articles.size} articles"
+ puts
+ puts "BEGIN REAL BENCHMARK"
+ puts
+ puts "map:"
+ links1 = []
+ for i in (1...5)
+   puts Benchmark.measure {
+     links1 = articles.map do |article|
+       article.scan(/\[\[[\w '\-]+?\]\]/m)
+       #.each do |link|
+       #  Digest::MD5.hexdigest(link)
+       #end
+     end
+   }
+ end
+ puts "Found #{links1.flatten.size} links"
+ puts
+ puts "pmap:"
+ links2 = []
+ for i in (1...5)
+   puts Benchmark.measure {
+     links2 = articles.pmap(6) do |article|
+       article.scan(/\[\[[\w '\-]+?\]\]/m)
+       #.each do |link|
+       #  Digest::MD5.hexdigest(link)
+       #end
+     end
+   }
+ end
+ puts "Found #{links2.flatten.size} links"
+ puts "map and pmap results match: #{links1 == links2}"
+ puts "END"
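A character class written as `[\w -']` is easy to misread: the hyphen between a space and an apostrophe makes ` -'` a range (space through apostrophe, which also covers `!"#$%&`) instead of a literal hyphen. The check below needs no dataset and compares that form with one that escapes the hyphen. The sample link texts are made up for illustration.

```ruby
# " -'" inside the class is a range from space (0x20) to apostrophe (0x27).
RANGE_CLASS = /\[\[[\w -']+?\]\]/m
# Escaped hyphen: matched literally, alongside word chars, space and apostrophe.
LITERAL_CLASS = /\[\[[\w '\-]+?\]\]/m

# Made-up wiki markup for illustration only.
sample = "See [[Cold War]], [[Mutual assured destruction]], " \
         "[[Tic-tac-toe]] and [[Oops!]]."

p sample.scan(RANGE_CLASS)
# prints ["[[Cold War]]", "[[Mutual assured destruction]]", "[[Oops!]]"]
p sample.scan(LITERAL_CLASS)
# prints ["[[Cold War]]", "[[Mutual assured destruction]]", "[[Tic-tac-toe]]"]
```

The range form silently drops hyphenated titles and accepts punctuation such as `!`, which skews a link count without raising any error.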
data/bn/peach_test.rb ADDED
@@ -0,0 +1,46 @@
+ require 'peach'
+ require 'benchmark'
+
+ def fac(n)
+   if n == 0
+     return 1
+   else
+     n*fac(n-1)
+   end
+ end
+
+
+ puts "PEACH TEST"
+ puts "each:"
+ def each_test
+   (0...1000).to_a.sort_by{rand}.each do |x|
+     fac(x)
+   end
+ end
+ puts Benchmark.measure { each_test }
+
+ puts "peach:"
+ def peach_test
+   (0...1000).to_a.sort_by{rand}.peach(4) do |x|
+     fac(x)
+   end
+ end
+ puts Benchmark.measure { peach_test }
+
+ puts "map:"
+ def map_test
+   (0...1000).to_a.sort_by{rand}.map do |x|
+     fac(x)
+   end
+ end
+ puts Benchmark.measure { map_test }
+
+
+ puts "pmap:"
+ def pmap_test
+   (0...1000).to_a.sort_by{rand}.pmap(4) do |x|
+     fac(x)
+   end
+ end
+ puts Benchmark.measure { pmap_test }
+
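peach_test.rb prints a `puts` label before each `Benchmark.measure` line. Below is a sketch of the same comparison using `Benchmark.bm`, which prints aligned, labelled rows. Two choices here are assumptions of the sketch, not part of the original script: the `peach`/`pmap` rows are added only when the gem can be loaded, and an iterative factorial (same values as `fac`) replaces the recursive one.

```ruby
require 'benchmark'

# Load Peach if it is installed; the parallel rows are skipped otherwise.
begin
  require 'peach'
rescue LoadError
  # Peach not available: only the sequential rows will run.
end

# Iterative factorial; returns the same values as the recursive fac in
# peach_test.rb without recursing up to 1000 levels deep.
def fac(n)
  (1..n).inject(1) { |acc, k| acc * k }
end

inputs = (0...1000).to_a.shuffle

# Benchmark.bm prints one aligned row per report and returns the
# list of Benchmark::Tms results.
rows = Benchmark.bm(7) do |bm|
  bm.report("each:") { inputs.each { |x| fac(x) } }
  bm.report("map:")  { inputs.map { |x| fac(x) } }
  if inputs.respond_to?(:pmap)
    bm.report("peach:") { inputs.peach(4) { |x| fac(x) } }
    bm.report("pmap:")  { inputs.pmap(4) { |x| fac(x) } }
  end
end
```

Reusing one shuffled `inputs` array for every row also means each method sees the same workload order.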
data/lib/peach.rb ADDED
@@ -0,0 +1,37 @@
+ class Array
+   def peach(n = nil, &b)
+     peachrun(:each, b, n)
+   end
+   def pmap(n = nil, &b)
+     peachrun(:map, b, n)
+   end
+   def pdelete_if(n = nil, &b)
+     peachrun(:delete_if, b, n)
+   end
+
+   protected
+   def peachrun(meth, b, n = nil)
+     # One thread per non-empty slice. Thread#value joins the thread and
+     # returns the block's result, so no shared array is written concurrently.
+     threads = divvy(n).reject {|x| x.empty? }.map do |x|
+       Thread.new(x) {|slice| slice.send(meth, &b) }
+     end
+     threads.inject([]) {|result, t| result + t.value }
+   end
+
+   def divvy(n = nil)
+     n ||= $peach_default_threads || size
+     # Clamp n to 1..size so empty arrays do not divide by zero and
+     # n > size does not leave the whole array in the last slice.
+     n = [[n, size].min, 1].max
+     lists = []
+     div = size / n
+     offset = 0
+     for i in (0...n-1)
+       lists << slice(offset, div)
+       offset += div
+     end
+     lists << slice(offset...size)
+     lists
+   end
+ end
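`divvy` gives every slice `size / n` elements and puts the whole remainder in the last slice, so with 10 elements and n = 4 the slices hold 2, 2, 2 and 4 elements, and the last thread does twice the work. A common alternative, sketched here as a hypothetical `balanced_divvy` rather than the gem's method, spreads the remainder one element at a time so slice sizes differ by at most one.

```ruby
# Hypothetical alternative to Array#divvy: split array into n contiguous
# slices whose sizes differ by at most one element.
def balanced_divvy(array, n)
  n = [[n, array.size].min, 1].max # at least one slice, never more than size
  base, extra = array.size.divmod(n)
  offset = 0
  (0...n).map do |i|
    # The first `extra` slices each take one additional element.
    len = base + (i < extra ? 1 : 0)
    slice = array[offset, len]
    offset += len
    slice
  end
end

p balanced_divvy((1..10).to_a, 4).map(&:size) # prints [3, 3, 2, 2]
```

The slices stay contiguous and in order, so concatenating per-slice results still reproduces the original element order.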
Binary file
data/web/index.html ADDED
@@ -0,0 +1,128 @@
+ <html>
+ <head>
+ <title>Peach - Parallel Each</title>
+ <style>
+ pre {
+   background-color: #f1f1f3;
+   color: #112;
+   padding: 10px;
+   font-size: 1.1em;
+   overflow: auto;
+   margin: 4px 0px;
+   width: 95%;
+ }
+
+
+
+ /* Syntax highlighting */
+ pre .normal {}
+ pre .comment { color: #005; font-style: italic; }
+ pre .keyword { color: #A00; font-weight: bold; }
+ pre .method { color: #077; }
+ pre .class { color: #074; }
+ pre .module { color: #050; }
+ pre .punct { color: #447; font-weight: bold; }
+ pre .symbol { color: #099; }
+ pre .string { color: #944; background: #FFE; }
+ pre .char { color: #F07; }
+ pre .ident { color: #004; }
+ pre .constant { color: #07F; }
+ pre .regex { color: #B66; background: #FEF; }
+ pre .number { color: #F99; }
+ pre .attribute { color: #5bb; }
+ pre .global { color: #7FB; }
+ pre .expr { color: #227; }
+ pre .escape { color: #277; }
+ </style>
+ </head>
+ <body style="background-color: #ecad27;">
+ <center><div style="border: dotted #000 1px;
+ background-color: #fff;
+ width: 745px;
+ text-align: left;">
+ <img src="Peach.sketch.png" alt="Peach">
+ <div style ="padding: 0px 20px 20px 20px;">
+ <h1>Parallel Each <small><small>
+ (for ruby with threads)
+ </small></small></h1>
+ <p>
+ It is pretty common to have iterations over Arrays that can be safely
+ run in parallel. With multicore chips becoming the norm,
+ single-threaded processing is about as cool as Pog. Unfortunately,
+ standard Ruby (MRI) hates real threads pretty hardcore at the present time:
+ its threads never run Ruby code on more than one core at once. However, for some Ruby projects, alternate VMs like
+ <a href="http://jruby.codehaus.org/" title="JRuby: Coolest Thing Ever">
+ JRuby</a> do give multicores some lovin'. <i>Peach</i> exists to
+ make this power simple to use with minimal code changes.
+ </p>
+ <p>Methods like <tt>map</tt>, <tt>each</tt>, and <tt>delete_if</tt>
+ are often used in a functional, side-effect-free style. If the
+ operation in the block is computationally intense, performance can
+ often be gained by multithreading the process. That's where
+ <i>Peach</i> comes in. In the simplest case, you are one letter away
+ (<tt>each</tt> becomes <tt>peach</tt>) from harnessing the power of parallelism and unlocking the secret of
+ a guilt-free tan. At this stage, the goggles are purely optional.
+ </p>
+ <h2>Using Peach</h2>
+ <p>Suppose you are going about your day job hacking away at code for
+ the <a href="http://en.wikipedia.org/wiki/WOPR">WOPR</a> when you
+ stumble upon the code:
+ </p>
+ <pre><span class=ident>cities</span><span class=punct>.</span><span class=ident>each</span> <span class=punct>{|</span><span class=ident>city</span><span class=punct>|</span> <span class=ident>thermonuclear_war</span><span class=punct>(</span><span class=ident>city</span><span class=punct>)}</span>
+ </pre>
+ <p>Clearly, the only winning move is to declare war in parallel. With
+ <i>Peach</i>, the new code is:</p>
+ <pre>
+ <span class="ident">require</span> <span class="punct">'</span><span class="string">peach</span><span class="punct">'</span>
+
+ <span class=ident>cities</span><span class=punct>.</span><span class=ident>peach</span> <span class=punct>{|</span><span class=ident>city</span><span class=punct>|</span> <span class=ident>thermonuclear_war</span><span class=punct>(</span><span class=ident>city</span><span class=punct>)}</span>
+ </pre>
+ <p>
+ Requiring peach.rb monkey patches Array into submission.
+ Currently <i>Peach</i> provides <tt>peach</tt>, <tt>pmap</tt>, and
+ <tt>pdelete_if</tt>. Each of these methods takes an optional
+ argument <i>n</i>, the desired number of worker threads. If <i>n</i> is omitted,
+ <i>Peach</i> uses <tt>$peach_default_threads</tt> when that global is set and otherwise spawns one thread per Array element. The array is split into <i>n</i> contiguous slices that are processed in separate threads, and the results are concatenated in the original order. Unlike <tt>delete_if</tt>, <tt>pdelete_if</tt> returns a new array and leaves the receiver untouched. For
+ cheap operations on a large number of elements, you probably want
+ to set <i>n</i> to something reasonably low.
+ </p>
+ <pre><span class="punct">(</span><span class="number">0</span><span class="punct">...</span><span class="number">10000</span><span class="punct">).</span><span class="ident">to_a</span><span class="punct">.</span><span class="ident">pmap</span><span class="punct">(</span><span class="number">4</span><span class="punct">)</span> <span class="punct">{|</span><span class="ident">x</span><span class="punct">|</span> <span class="ident">process</span><span class="punct">(</span><span class="ident">x</span><span class="punct">)}</span>
+ </pre>
+ <p>
+ Constructing the threads and adding a few layers of indirection does
+ add some overhead to the iteration, especially on MRI. Keep this in
+ mind and benchmark when unsure.</p>
+ <h3>Syntax (without all the words)</h3>
+ <pre><span class="ident">require</span> <span class="punct">'</span><span class="string">peach</span><span class="punct">'</span>
+
+ <span class="punct">[</span><span class="number">1</span><span class="punct">,</span><span class="number">2</span><span class="punct">,</span><span class="number">3</span><span class="punct">,</span><span class="number">4</span><span class="punct">].</span><span class="ident">peach</span><span class="punct">{|</span><span class="ident">x</span><span class="punct">|</span> <span class="ident">f</span><span class="punct">(</span><span class="ident">x</span><span class="punct">)}</span> <span class="comment">#Spawns 4 threads, =&gt; [1,2,3,4]</span>
+ <span class="punct">[</span><span class="number">1</span><span class="punct">,</span><span class="number">2</span><span class="punct">,</span><span class="number">3</span><span class="punct">,</span><span class="number">4</span><span class="punct">].</span><span class="ident">pmap</span><span class="punct">{|</span><span class="ident">x</span><span class="punct">|</span> <span class="ident">f</span><span class="punct">(</span><span class="ident">x</span><span class="punct">)}</span> <span class="comment">#Spawns 4 threads, =&gt; [f(1),f(2),f(3),f(4)]</span>
+ <span class="punct">[</span><span class="number">1</span><span class="punct">,</span><span class="number">2</span><span class="punct">,</span><span class="number">3</span><span class="punct">,</span><span class="number">4</span><span class="punct">].</span><span class="ident">pdelete_if</span><span class="punct">{|</span><span class="ident">x</span><span class="punct">|</span> <span class="ident">x</span> <span class="punct">&gt;</span> <span class="number">2</span><span class="punct">}</span> <span class="comment">#Spawns 4 threads, =&gt; [1,2]</span>
+
+
+ <span class="punct">[</span><span class="number">1</span><span class="punct">,</span><span class="number">2</span><span class="punct">,</span><span class="number">3</span><span class="punct">,</span><span class="number">4</span><span class="punct">].</span><span class="ident">peach</span><span class="punct">(</span><span class="number">2</span><span class="punct">){|</span><span class="ident">x</span><span class="punct">|</span> <span class="ident">f</span><span class="punct">(</span><span class="ident">x</span><span class="punct">)}</span> <span class="comment">#Spawns 2 threads, =&gt; [1,2,3,4]</span>
+ <span class="punct">[</span><span class="number">1</span><span class="punct">,</span><span class="number">2</span><span class="punct">,</span><span class="number">3</span><span class="punct">,</span><span class="number">4</span><span class="punct">].</span><span class="ident">pmap</span><span class="punct">(</span><span class="number">2</span><span class="punct">){|</span><span class="ident">x</span><span class="punct">|</span> <span class="ident">f</span><span class="punct">(</span><span class="ident">x</span><span class="punct">)}</span> <span class="comment">#Spawns 2 threads, =&gt; [f(1),f(2),f(3),f(4)]</span>
+ <span class="punct">[</span><span class="number">1</span><span class="punct">,</span><span class="number">2</span><span class="punct">,</span><span class="number">3</span><span class="punct">,</span><span class="number">4</span><span class="punct">].</span><span class="ident">pdelete_if</span><span class="punct">(</span><span class="number">2</span><span class="punct">){|</span><span class="ident">x</span><span class="punct">|</span> <span class="ident">x</span> <span class="punct">&gt;</span> <span class="number">2</span><span class="punct">}</span> <span class="comment">#Spawns 2 threads, =&gt; [1,2]</span>
+ </pre>
+ <h2>FAQ</h2>
+ <p><b>Q: I use normal Ruby (MRI 1.8 or 1.9). Will Peach confer superpowers and great performance upon my code?</b><br/>
+ A: No. On MRI your code will be slightly slower because of the extra overhead of Thread creation. MRI runs only one thread's Ruby code at a time, so Peach will not make it magically parallel.</p>
+ <p><b>Q: Why should I switch to JRuby to get the benefits of Peach?</b><br/>
+ A: Switching to JRuby for code that needs better performance is a good idea even without Peach, because JRuby is insanely fast. Multithreading, and with it the ability to use this humble utility, is just another feature.</p>
+ <p><b>Q: Benchmarks?</b><br/>
+ A: I am pretty bad at benchmarking code, but I do have a simple test comparing the performance of <tt>map</tt> and <tt>pmap</tt> on MRI and JRuby. Headius helped in the preparation of these materials. <a href="http://pastie.caboo.se/177240">Check it out</a>. JRuby on Java 1.6 is <a href="http://pastie.caboo.se/177263">even faster</a>. If you come up with any benchmarks, do let me know.</p>
+
+ <h2>Eat a Peach</h2>
+ <p>
+ <i>Peach</i> is distributed as a gem from GitHub, so:<br/>
+ <tt>gem install schleyfox-peach --source=http://gems.github.com</tt></p>
+ <ul>
+ <li>Project pages: <a href="http://rubyforge.org/projects/peach/">
+ RubyForge</a>, <a href="http://github.com/schleyfox/peach">GitHub</a></li>
+ </ul>
+
+ </div>
+ </div></center>
+ </body>
+ </html>
+
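A quick way to check the expected results listed under "Syntax" is to run the sequential counterparts, which Peach's methods should agree with. `delete_if` removes the elements for which the block is true, so `x > 2` leaves `[1, 2]`. Because Peach runs `delete_if` on slices copied from the receiver and concatenates what survives, `pdelete_if` does not modify the receiver; the `each_slice` line below imitates that copying and is an illustration, not Peach's code. The lambda `f` is a hypothetical stand-in for the `f` in the examples.

```ruby
f = ->(x) { x * 10 } # hypothetical stand-in for f in the Syntax examples

a = [1, 2, 3, 4]
p a.each { |x| f.(x) }   # each returns the receiver: [1, 2, 3, 4]
p a.map { |x| f.(x) }    # [10, 20, 30, 40]

# delete_if drops elements for which the block is true, in place.
b = [1, 2, 3, 4]
p b.delete_if { |x| x > 2 } # [1, 2]
p b                         # b itself is now [1, 2]

# delete_if on copied slices, then concatenation, leaves the source intact.
c = [1, 2, 3, 4]
kept = c.each_slice(2).flat_map { |s| s.delete_if { |x| x > 2 } }
p kept # [1, 2]
p c    # [1, 2, 3, 4]
```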
metadata ADDED
@@ -0,0 +1,59 @@
+ --- !ruby/object:Gem::Specification
+ name: schleyfox-peach
+ version: !ruby/object:Gem::Version
+   version: 0.1.2
+ platform: ruby
+ authors:
+ - Ben Hughes
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2008-05-06 00:00:00 -07:00
+ default_executable:
+ dependencies: []
+
+ description:
+ email: ben@pixelmachine.org
+ executables: []
+
+ extensions: []
+
+ extra_rdoc_files: []
+
+ files:
+ - README
+ - LICENSE
+ - lib/peach.rb
+ - bn/peach_bn.rb
+ - bn/peach_test.rb
+ - web/index.html
+ - web/Peach.sketch.png
+ has_rdoc: false
+ homepage: http://peach.rubyforge.org
+ post_install_message:
+ rdoc_options: []
+
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ requirements: []
+
+ rubyforge_project:
+ rubygems_version: 1.0.1
+ signing_key:
+ specification_version: 2
+ summary: Parallel Each and other parallel things
+ test_files: []
+
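The YAML above is the serialized form of a `Gem::Specification`. A sketch of an equivalent Ruby `.gemspec` follows, using only standard `Gem::Specification` attributes and the values listed in the metadata; the file itself is hypothetical, since the release ships only the YAML. `has_rdoc` is left out because later RubyGems releases deprecate it.

```ruby
require 'rubygems'

# Hypothetical peach.gemspec reconstructed from the YAML metadata.
spec = Gem::Specification.new do |s|
  s.name          = "schleyfox-peach"
  s.version       = "0.1.2"
  s.platform      = Gem::Platform::RUBY
  s.authors       = ["Ben Hughes"]
  s.email         = "ben@pixelmachine.org"
  s.homepage      = "http://peach.rubyforge.org"
  s.summary       = "Parallel Each and other parallel things"
  s.date          = "2008-05-06"
  s.files         = %w[README LICENSE lib/peach.rb bn/peach_bn.rb
                       bn/peach_test.rb web/index.html web/Peach.sketch.png]
  s.require_paths = ["lib"]
end

puts spec.full_name # prints schleyfox-peach-0.1.2
```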