forkandreturn 0.1.0 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/CHANGELOG ADDED
@@ -0,0 +1,16 @@
+ 0.1.1 (19-07-2008)
+
+  * Added example.txt.
+
+  * at_exit blocks defined in the child itself will be executed
+    in the child, whereas at_exit blocks defined in the parent
+    won't be executed in the child. (In the previous version, the
+    at_exit blocks defined in the child weren't executed at
+    all.)
+
+  * Added File#chmod(0600), so the temporary file with the
+    intermediate results can't be read by other people.
+
+ 0.1.0 (13-07-2008)
+
+  * First release.
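The File#chmod(0600) change noted above can be shown in isolation with a short, self-contained sketch. The file name and payload here are illustrative, not taken from the gem:

```ruby
require "tmpdir"

# Create a temporary file and restrict it to the owner *before*
# writing the Marshal'led payload, as the 0.1.1 release does.
path = File.join(Dir.tmpdir, "forkandreturn_demo_#{Process.pid}")

File.open(path, "wb") do |f|
  f.chmod(0600)               # readable/writable by the owner only
  Marshal.dump([true, 42], f)
end

mode    = File.stat(path).mode & 0777
payload = File.open(path, "rb") { |f| Marshal.load(f) }
File.delete(path)
```

Because chmod sets the mode explicitly, this leaves the file at 0600 regardless of the umask, so other users can't read the intermediate result.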
data/README CHANGED
@@ -3,9 +3,63 @@ running a block of code in a subprocess. The result (Ruby
  object or exception) of the block will be available in the
  parent process.

+ The intermediate return value (or exception) will be
+ Marshal'led to disk. This means that it is possible to
+ (concurrently) run thousands of child processes, with a
+ relatively low memory footprint. Just gather the results once
+ all child processes are done. ForkAndReturn will handle the
+ writing, reading and deleting of the temporary file.
+
+ The core of these methods is fork_and_return_core(). It returns
+ some nested lambdas, which are handled by the other methods and
+ by Enumerable#concurrent_collect(). These lambdas handle the
+ WAITing, LOADing and RESULTing (explained in
+ fork_and_return_core()).
+
+ The child process exits with Process.exit!(), so at_exit()
+ blocks are skipped in the child process. However, both $stdout
+ and $stderr will be flushed.
+
+ Only Marshal'lable Ruby objects can be returned.
+
  ForkAndReturn uses Process.fork(), so it only runs on platforms
  where Process.fork() is implemented.

- ForkAndReturn implements the low level stuff. Enumerable is
- enriched with some methods which should be used instead of
- ForkAndReturn under normal circumstances.
+ Example: (See example.txt for another example.)
+
+   [1, 2, 3, 4].collect do |object|
+     Thread.fork do
+       ForkAndReturn.fork_and_return do
+         2*object
+       end
+     end
+   end.collect do |thread|
+     thread.value
+   end  # ===> [2, 4, 6, 8]
+
+ This runs each "2*object" in a separate process. Hopefully, the
+ processes are spread over all available CPUs. That's a simple
+ way of parallel processing! Although
+ Enumerable#concurrent_collect() is even simpler:
+
+   [1, 2, 3, 4].concurrent_collect do |object|
+     2*object
+   end  # ===> [2, 4, 6, 8]
+
+ Note that the code in the block is run in a separate process,
+ so updating objects and variables in the block won't affect the
+ parent process:
+
+   count = 0
+   [...].concurrent_collect do
+     count += 1
+   end
+   count  # ==> 0
+
+ Enumerable#concurrent_collect() is suitable for handling a
+ couple of very CPU intensive jobs, like parsing large XML files.
+
+ Enumerable#clustered_concurrent_collect() is suitable for
+ handling a lot of not too CPU intensive jobs: situations
+ where the overhead of forking is too expensive, but where you
+ still want to use all available CPUs.
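The fork-and-Marshal flow the README describes can be sketched in plain Ruby, independent of the gem. This is a simplified illustration of the technique, not the library's actual implementation, and fork_and_return_sketch() is a made-up name:

```ruby
require "tmpdir"

# Run a block in a child process, Marshal the result (or the
# exception) to a temporary file, and load it back in the parent.
def fork_and_return_sketch
  path = File.join(Dir.tmpdir, "far_sketch_#{Process.pid}_#{rand(1_000_000)}")

  pid = Process.fork do
    result =
      begin
        [true, yield]
      rescue => e
        [false, e]
      end
    File.open(path, "wb") { |f| f.chmod(0600); Marshal.dump(result, f) }
    Process.exit!  # Skip at_exit handlers inherited from the parent.
  end

  Process.waitpid(pid)
  ok, res = File.open(path, "rb") { |f| Marshal.load(f) }
  File.delete(path) if File.file?(path)
  ok ? res : raise(res)
end

results = [1, 2, 3, 4].map { |n| fork_and_return_sketch { 2 * n } }
```

Here the children run one after another; the gem combines this with threads so that many children can run, and be waited on, concurrently.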
data/VERSION CHANGED
@@ -1 +1 @@
- 0.1.0
+ 0.1.1
data/example.txt ADDED
@@ -0,0 +1,158 @@
+ AN EXAMPLE OF MULTICORE PROGRAMMING WITH FORKANDRETURN
+
+ We've got 42 gz-files. Compressed, that's 44,093,076 bytes. The
+ question is: how many bytes is it if we decompress them?
+
+ We could do this, if we're on a Unix machine:
+
+   $ time zcat *.gz | wc -c
+   860187300
+
+   real    0m7.266s
+   user    0m5.810s
+   sys     0m3.341s
+
+ Can we do it without the Unix tools? With pure Ruby? Sure, we
+ can:
+
+   $ cat count.rb
+   require "zlib"
+
+   count = 0
+
+   Dir.glob("*.gz").sort.each do |file|
+     Zlib::GzipReader.open(file) do |io|
+       while block = io.read(4096)
+         count += block.size
+       end
+     end
+   end
+
+   puts count
+
+ Which indeed returns the correct answer:
+
+   $ time ruby count.rb
+   860187300
+
+   real    0m5.687s
+   user    0m5.499s
+   sys     0m0.186s
+
+ But can we take advantage of both CPUs? Yes, we can. The plan
+ is to use ForkAndReturn's Enumerable#concurrent_collect()
+ instead of Enumerable#each(). But let's reconsider our code
+ first. First question: what's the most expensive part of the
+ code? Well, even without profiling, we can say that the
+ iteration over the blocks and the inflating of the compressed
+ archives are the most expensive. Can we concurrently run these
+ blocks of code on several CPUs? Well, in fact, we can't: in
+ the while block, we update a shared counter. That's not going
+ to work if we fork into separate processes. So, we get rid of
+ the local use of the shared variable first:
+
+   $ cat count.rb
+   require "zlib"
+
+   count = 0
+
+   Dir.glob("*.gz").sort.collect do |file|
+     c = 0
+
+     Zlib::GzipReader.open(file) do |io|
+       while block = io.read(4096)
+         c += block.size
+       end
+     end
+
+     c
+   end.each do |c|
+     count += c
+   end
+
+   puts count
+
+ Which runs as fast as the previous version:
+
+   $ time ruby count.rb
+   860187300
+
+   real    0m5.703s
+   user    0m5.515s
+   sys     0m0.218s
+
+ We can now run the local counts concurrently, by changing only
+ one word (and requiring the library):
+
+   $ cat count.rb
+   require "zlib"
+   require "forkandreturn"
+
+   count = 0
+
+   Dir.glob("*.gz").sort.concurrent_collect do |file|
+     c = 0
+
+     Zlib::GzipReader.open(file) do |io|
+       while block = io.read(4096)
+         c += block.size
+       end
+     end
+
+     c
+   end.each do |c|
+     count += c
+   end
+
+   puts count
+
+   $ time ruby count.rb
+   860187300
+
+   real    0m3.860s
+   user    0m6.511s
+   sys     0m1.120s
+
+ Yep, it runs faster! 1.5x as fast! Not really doubling the
+ speed, but it's close enough...
+
+ But, since all files are parsed concurrently, aren't we
+ exhausting our memory? And what about parsing thousands of
+ files? Can't we run just a couple of jobs concurrently,
+ instead of all of them? Sure. Just add one parameter:
+
+   $ cat count.rb
+   require "zlib"
+   require "forkandreturn"
+
+   count = 0
+
+   Dir.glob("*.gz").sort.concurrent_collect(2) do |file|
+     c = 0
+
+     Zlib::GzipReader.open(file) do |io|
+       while block = io.read(4096)
+         c += block.size
+       end
+     end
+
+     c
+   end.each do |c|
+     count += c
+   end
+
+   puts count
+
+ Et voilà:
+
+   $ time ruby count.rb
+   860187300
+
+   real    0m3.953s
+   user    0m6.436s
+   sys     0m1.309s
+
+ A bit of overhead, but friendlier to other running applications.
+
+ (BTW, the answer is 860,187,300 bytes.)
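The "couple of jobs at a time" behaviour used above boils down to waiting for a child to exit before forking the next one. A minimal sketch of that scheduling idea, not the gem's code; LIMIT and the trivial child bodies are illustrative:

```ruby
LIMIT   = 2   # at most two children at the same time
running = []
reaped  = []

(1..6).each do |n|
  if running.size >= LIMIT
    pid = Process.wait                  # block until any child exits
    running.delete(pid)
    reaped << pid
  end
  running << Process.fork { exit!(0) }  # stand-in for real work
end

until running.empty?                    # collect the stragglers
  pid = Process.wait
  running.delete(pid)
  reaped << pid
end
```

At no point are more than LIMIT children alive, which caps both memory use and CPU pressure.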
@@ -72,6 +72,9 @@ module ForkAndReturn
   # If you call RESULT-lambda, the result of the child process will be handled.
   # This means either "return the return value of the block" or "raise the exception"
   #
+  # at_exit blocks defined in the child itself will be executed in the child,
+  # whereas at_exit blocks defined in the parent won't be executed in the child.
+  #
   # <i>*args</i> is passed to the block.

   def self.fork_and_return_core(*args, &block)
@@ -80,18 +83,24 @@ module ForkAndReturn
   #begin
       pid =
         Process.fork do
+          at_exit do
+            $stdout.flush
+            $stderr.flush
+
+            Process.exit!  # To avoid the execution of already defined at_exit handlers.
+          end
+
           begin
             ok, res = true, yield(*args)
           rescue
             ok, res = false, $!
           end

-          File.open(file, "wb"){|f| Marshal.dump([ok, res], f)}
-
-          $stdout.flush
-          $stderr.flush
+          File.open(file, "wb") do |f|
+            f.chmod(0600)

-          Process.exit!  # To avoid the execution of at_exit handlers.
+            Marshal.dump([ok, res], f)
+          end
         end
   #rescue Errno::EAGAIN  # Resource temporarily unavailable - fork(2)
   #  Kernel.sleep 0.1
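The hunk above relies on two properties of at_exit: handlers run in reverse order of registration, and Process.exit! inside a handler skips the handlers that have not run yet. Registering the flushing handler as the very first thing in the child therefore lets handlers from the user's block run, while suppressing the ones inherited from the parent. A small self-contained demonstration of that mechanism (the handler names are illustrative):

```ruby
r, w = IO.pipe

pid = Process.fork do
  r.close
  w.sync = true                   # write through, since exit! won't flush

  at_exit { w.puts "inherited" }  # stands in for a pre-fork handler

  at_exit do                      # the library's guard handler
    w.puts "guard"
    Process.exit!                 # skip everything registered earlier
  end

  at_exit { w.puts "child" }      # as if defined in the user's block
end                               # normal exit triggers the handlers

w.close
Process.waitpid(pid)
output = r.read.split
```

The child writes "child" and then "guard"; "inherited" is never reached, which is exactly the parent/child split the CHANGELOG describes.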
data/test/test.rb CHANGED
@@ -194,19 +194,26 @@ class ForkAndReturnEnumerableTest < Test::Unit::TestCase
   end

   def test_at_exit_handler
-    data  = 1..10
-    block = lambda{}
-    file  = "/tmp/FORK_AND_RETURN_TEST"
+    file1 = "/tmp/FORK_AND_RETURN_TEST_1"
+    file2 = "/tmp/FORK_AND_RETURN_TEST_2"

-    File.delete(file) if File.file?(file)
+    File.delete(file1) if File.file?(file1)
+    File.delete(file2) if File.file?(file2)

-    at_exit do
-      File.open(file, "w"){|f| f.write "some data"}
+    at_exit do  # Should not be executed.
+      File.open(file1, "w"){}
     end

-    data.concurrent_collect(&block)
+    ForkAndReturn.fork_and_return do
+      at_exit do  # Should be executed.
+        File.open(file2, "w"){}
+      end
+
+      nil
+    end

-    assert(! File.file?(file))
+    assert(!File.file?(file1))
+    assert(File.file?(file2))
   end

   def test_clustered_concurrent_collect
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: forkandreturn
  version: !ruby/object:Gem::Version
-   version: 0.1.0
+   version: 0.1.1
  platform: ruby
  authors:
  - Erik Veenstra
@@ -9,7 +9,7 @@ autorequire:
  bindir: bin
  cert_chain: []

- date: 2008-07-12 00:00:00 +02:00
+ date: 2008-07-19 00:00:00 +02:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
@@ -39,6 +39,8 @@ files:
  - README
  - LICENSE
  - VERSION
+ - CHANGELOG
+ - example.txt
  has_rdoc: true
  homepage: http://www.erikveen.dds.nl/forkandreturn/index.html
  post_install_message:
@@ -46,8 +48,10 @@ rdoc_options:
  - README
  - LICENSE
  - VERSION
+ - CHANGELOG
+ - example.txt
  - --title
- - forkandreturn (0.1.0)
+ - forkandreturn (0.1.1)
  - --main
  - README
  require_paths: