forkandreturn 0.1.0 → 0.1.1

data/CHANGELOG ADDED
@@ -0,0 +1,16 @@
+ 0.1.1 (19-07-2008)
+
+ * Added example.txt.
+
+ * at_exit blocks defined in the child itself will be executed
+   in the child, whereas at_exit blocks defined in the parent
+   won't be executed in the child. (In the previous version, the
+   at_exit blocks defined in the child weren't executed at
+   all.)
+
+ * Added File#chmod(0600), so the temporary file with the
+   intermediate results can't be read by other people.
+
+ 0.1.0 (13-07-2008)
+
+ * First release.
data/README CHANGED
@@ -3,9 +3,63 @@ running a block of code in a subprocess. The result (Ruby
  object or exception) of the block will be available in the
  parent process.
 
+ The intermediate return value (or exception) will be
+ Marshal'led to disk. This means that it is possible to
+ (concurrently) run thousands of child processes, with a
+ relatively low memory footprint. Just gather the results once
+ all child processes are done. ForkAndReturn will handle the
+ writing, reading and deleting of the temporary file.
+
+ The core of these methods is fork_and_return_core(). It returns
+ some nested lambdas, which are handled by the other methods and
+ by Enumerable#concurrent_collect(). These lambdas handle the
+ WAITing, LOADing and RESULTing (explained in
+ fork_and_return_core()).
+
+ The child process exits with Process.exit!(), so at_exit()
+ blocks defined in the parent are skipped in the child process.
+ However, both $stdout and $stderr will be flushed.
+
+ Only Marshal'lable Ruby objects can be returned.
+
  ForkAndReturn uses Process.fork(), so it only runs on platforms
  where Process.fork() is implemented.
 
- ForkAndReturn implements the low level stuff. Enumerable is
- enriched with some methods which should be used instead of
- ForkAndReturn under normal circumstances.
+ Example: (See example.txt for another example.)
+
+   [1, 2, 3, 4].collect do |object|
+     Thread.fork do
+       ForkAndReturn.fork_and_return do
+         2*object
+       end
+     end
+   end.collect do |thread|
+     thread.value
+   end    # ===> [2, 4, 6, 8]
+
+ This runs each "2*object" in a separate process. Hopefully, the
+ processes are spread over all available CPUs. That's a simple
+ way of parallel processing, although
+ Enumerable#concurrent_collect() is even simpler:
+
+   [1, 2, 3, 4].concurrent_collect do |object|
+     2*object
+   end    # ===> [2, 4, 6, 8]
+
+ Note that the code in the block is run in a separate process,
+ so updating objects and variables in the block won't affect the
+ parent process:
+
+   count = 0
+   [...].concurrent_collect do
+     count += 1
+   end
+   count    # ===> 0
+
+ Enumerable#concurrent_collect() is suitable for handling a
+ couple of very CPU intensive jobs, like parsing large XML files.
+
+ Enumerable#clustered_concurrent_collect() is suitable for
+ handling a lot of not too CPU intensive jobs: situations
+ where the overhead of forking is too expensive, but where you
+ still want to use all available CPUs.
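A minimal sketch of the WAIT/LOAD/RESULT cycle described in the README, in plain Ruby. This is an illustration of the idea only, not the gem's actual implementation; the temporary-file handling is simplified:

  require "tmpdir"

  path = File.join(Dir.tmpdir, "fork_and_return_demo.#{Process.pid}")

  pid = Process.fork do
    result = 2 * 21                                        # stands in for the block's return value
    File.open(path, "wb") { |f| Marshal.dump(result, f) }  # dump the result to disk
    Process.exit!                                          # skip inherited at_exit handlers
  end

  Process.waitpid(pid)                                     # WAIT for the child to finish
  value = File.open(path, "rb") { |f| Marshal.load(f) }    # LOAD the dumped result
  File.delete(path)                                        # clean up the temporary file
  puts value                                               # RESULT ===> 42

Keeping the results on disk until they are loaded, instead of holding them in pipes or in memory, is what keeps the parent's footprint low when thousands of children run.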
data/VERSION CHANGED
@@ -1 +1 @@
- 0.1.0
+ 0.1.1
data/example.txt ADDED
@@ -0,0 +1,158 @@
+ AN EXAMPLE OF MULTICORE-PROGRAMMING WITH FORKANDRETURN
+
+ We've got 42 GZ-files. Compressed, that's 44,093,076 bytes. The
+ question is: How many bytes do we get if we decompress them?
+
+ We could do this, if we're on a Unix machine:
+
+   $ time zcat *.gz | wc -c
+   860187300
+
+   real    0m7.266s
+   user    0m5.810s
+   sys     0m3.341s
+
+ Can we do it without the Unix tools? With pure Ruby? Sure, we
+ can:
+
+   $ cat count.rb
+   require "zlib"
+
+   count = 0
+
+   Dir.glob("*.gz").sort.each do |file|
+     Zlib::GzipReader.open(file) do |io|
+       while block = io.read(4096)
+         count += block.size
+       end
+     end
+   end
+
+   puts count
+
+ Which indeed returns the correct answer:
+
+   $ time ruby count.rb
+   860187300
+
+   real    0m5.687s
+   user    0m5.499s
+   sys     0m0.186s
+
+ But can we take advantage of both CPUs? Yes, we can. The plan
+ is to use ForkAndReturn's Enumerable#concurrent_collect()
+ instead of Enumerable#each(). But let's reconsider our code
+ first. First question: What's the most expensive part of the
+ code? Well, even without profiling, we can say that the
+ iteration over the blocks and the inflating of the compressed
+ archives are the most expensive. Can we concurrently run these
+ blocks of code on several CPUs? Well, in fact, we can't. In
+ the while loop, we update a shared counter. That's not gonna
+ work if we fork into separate processes. So, we first get rid
+ of the use of the shared counter inside the block:
+
+   $ cat count.rb
+   require "zlib"
+
+   count = 0
+
+   Dir.glob("*.gz").sort.collect do |file|
+     c = 0
+
+     Zlib::GzipReader.open(file) do |io|
+       while block = io.read(4096)
+         c += block.size
+       end
+     end
+
+     c
+   end.each do |c|
+     count += c
+   end
+
+   puts count
+
+ Which runs as fast as the previous version:
+
+   $ time ruby count.rb
+   860187300
+
+   real    0m5.703s
+   user    0m5.515s
+   sys     0m0.218s
+
+ We can now run the local counts concurrently, by changing only
+ one word (and requiring the library):
+
+   $ cat count.rb
+   require "zlib"
+   require "forkandreturn"
+
+   count = 0
+
+   Dir.glob("*.gz").sort.concurrent_collect do |file|
+     c = 0
+
+     Zlib::GzipReader.open(file) do |io|
+       while block = io.read(4096)
+         c += block.size
+       end
+     end
+
+     c
+   end.each do |c|
+     count += c
+   end
+
+   puts count
+
+   $ time ruby count.rb
+   860187300
+
+   real    0m3.860s
+   user    0m6.511s
+   sys     0m1.120s
+
+ Yep, it runs faster! 1.5x as fast! Not really doubling the
+ speed, but it's close enough...
+
+ But, since the parsing of all files is done concurrently,
+ aren't we exhausting our memory? And how about parsing
+ thousands of files? Can't we run just a couple of jobs
+ concurrently, instead of all of them? Sure. Just add one
+ parameter:
+
+   $ cat count.rb
+   require "zlib"
+   require "forkandreturn"
+
+   count = 0
+
+   Dir.glob("*.gz").sort.concurrent_collect(2) do |file|
+     c = 0
+
+     Zlib::GzipReader.open(file) do |io|
+       while block = io.read(4096)
+         c += block.size
+       end
+     end
+
+     c
+   end.each do |c|
+     count += c
+   end
+
+   puts count
+
+ Et voilà:
+
+   $ time ruby count.rb
+   860187300
+
+   real    0m3.953s
+   user    0m6.436s
+   sys     0m1.309s
+
+ A bit of overhead, but friendlier to other running applications.
+
+ (BTW, the answer is 860,187,300 bytes.)
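The limit parameter in the last version of count.rb caps how many children are alive at once. Here is a rough sketch of the idea using only Process.fork() and Process.waitpid(); each_limited_fork() is a hypothetical helper for illustration, and ForkAndReturn's actual scheduling may differ:

  # Cap the number of live child processes at `limit`.
  def each_limited_fork(enum, limit)
    pids = []

    enum.each do |obj|
      Process.waitpid(pids.shift) if pids.size >= limit  # throttle: wait for the oldest child
      pids << Process.fork { yield obj }                 # run the block in a new child
    end

    pids.each { |pid| Process.waitpid(pid) }             # drain the remaining children
  end

  each_limited_fork(Dir.glob("*.gz").sort, 2) do |file|
    # inflate the file here, as in count.rb...
  end

This sketch only runs the work; ForkAndReturn additionally collects each child's Marshal'led result, which is what makes the one-word change in count.rb possible.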
data/lib/forkandreturn.rb CHANGED
@@ -72,6 +72,9 @@ module ForkAndReturn
  # If you call RESULT-lambda, the result of the child process will be handled.
  # This means either "return the return value of the block" or "raise the exception"
  #
+ # at_exit blocks defined in the child itself will be executed in the child,
+ # whereas at_exit blocks defined in the parent won't be executed in the child.
+ #
  # <i>*args</i> is passed to the block.
 
  def self.fork_and_return_core(*args, &block)
@@ -80,18 +83,24 @@ module ForkAndReturn
  #begin
    pid =
      Process.fork do
+       at_exit do
+         $stdout.flush
+         $stderr.flush
+
+         Process.exit!  # To avoid the execution of already defined at_exit handlers.
+       end
+
        begin
          ok, res = true, yield(*args)
        rescue
          ok, res = false, $!
        end
 
-       File.open(file, "wb"){|f| Marshal.dump([ok, res], f)}
-
-       $stdout.flush
-       $stderr.flush
+       File.open(file, "wb") do |f|
+         f.chmod(0600)
 
-       Process.exit! # To avoid the execution of at_exit handlers.
+         Marshal.dump([ok, res], f)
+       end
      end
  #rescue Errno::EAGAIN # Resource temporarily unavailable - fork(2)
  # Kernel.sleep 0.1
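The at_exit block installed at the top of the forked child relies on the LIFO order of at_exit handlers: handlers the child defines later run first, then this one flushes and calls Process.exit!, which aborts the chain before the handlers inherited from the parent get a turn. A standalone demonstration of the pattern (not the gem's code):

  at_exit { puts "parent handler" }    # inherited by the child; must not run there

  pid = Process.fork do
    at_exit do                         # installed first, so it runs last in the child (LIFO)
      $stdout.flush
      $stderr.flush

      Process.exit!                    # abort before the inherited handlers run
    end

    at_exit { puts "child handler" }   # defined in the child; runs first
  end

  Process.waitpid(pid)
  # The child prints only "child handler"; "parent handler" appears
  # once, when this script itself exits.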
data/test/test.rb CHANGED
@@ -194,19 +194,26 @@ class ForkAndReturnEnumerableTest < Test::Unit::TestCase
  end
 
  def test_at_exit_handler
-   data = 1..10
-   block = lambda{}
-   file = "/tmp/FORK_AND_RETURN_TEST"
+   file1 = "/tmp/FORK_AND_RETURN_TEST_1"
+   file2 = "/tmp/FORK_AND_RETURN_TEST_2"
 
-   File.delete(file) if File.file?(file)
+   File.delete(file1) if File.file?(file1)
+   File.delete(file2) if File.file?(file2)
 
-   at_exit do
-     File.open(file, "w"){|f| f.write "some data"}
+   at_exit do  # Should not be executed.
+     File.open(file1, "w"){}
    end
 
-   data.concurrent_collect(&block)
+   ForkAndReturn.fork_and_return do
+     at_exit do  # Should be executed.
+       File.open(file2, "w"){}
+     end
+
+     nil
+   end
 
-   assert(! File.file?(file))
+   assert(!File.file?(file1))
+   assert(File.file?(file2))
  end
 
  def test_clustered_concurrent_collect
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: forkandreturn
  version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.1.1
  platform: ruby
  authors:
  - Erik Veenstra
@@ -9,7 +9,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2008-07-12 00:00:00 +02:00
+ date: 2008-07-19 00:00:00 +02:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
@@ -39,6 +39,8 @@ files:
  - README
  - LICENSE
  - VERSION
+ - CHANGELOG
+ - example.txt
  has_rdoc: true
  homepage: http://www.erikveen.dds.nl/forkandreturn/index.html
  post_install_message:
@@ -46,8 +48,10 @@ rdoc_options:
  - README
  - LICENSE
  - VERSION
+ - CHANGELOG
+ - example.txt
  - --title
- - forkandreturn (0.1.0)
+ - forkandreturn (0.1.1)
  - --main
  - README
  require_paths: