forkandreturn 0.1.0 → 0.1.1
- data/CHANGELOG +16 -0
- data/README +57 -3
- data/VERSION +1 -1
- data/example.txt +158 -0
- data/lib/forkandreturn/forkandreturn.rb +14 -5
- data/test/test.rb +15 -8
- metadata +7 -3
data/CHANGELOG
ADDED
@@ -0,0 +1,16 @@
+0.1.1 (19-07-2008)
+
+* Added example.txt.
+
+* at_exit blocks defined in the child itself will be executed
+  in the child, whereas at_exit blocks defined in the parent
+  won't be executed in the child. (In the previous version, the
+  at_exit blocks defined in the child weren't executed at
+  all.)
+
+* Added File#chmod(0600), so the temporary file with the
+  intermediate results can't be read by other people.
+
+0.1.0 (13-07-2008)
+
+* First release.
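The at_exit behaviour described in the changelog entry can be demonstrated with plain Process.fork. This is a minimal sketch of the mechanism, not the gem's actual code: at_exit handlers run in LIFO order, and Process.exit! inside a handler aborts the remaining, earlier-registered ones.

```ruby
# The parent registers a handler, then forks. The child installs a guard
# handler first (so it runs last) that calls Process.exit!, skipping the
# parent's handler. A handler the child registers afterwards runs normally.
r, w = IO.pipe

at_exit { w.write("parent-handler") rescue nil }  # must NOT run in the child

pid = Process.fork do
  r.close
  at_exit do                    # registered first => runs last (LIFO)
    w.write("guard")
    w.close
    Process.exit!               # abort the parent's earlier-registered handler
  end
  at_exit { w.write("child-handler ") }  # the child's own handler, runs first
  # ... child body ...
end

w.close
Process.wait(pid)
out = r.read
puts out   # => "child-handler guard"
```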
data/README
CHANGED
@@ -3,9 +3,63 @@ running a block of code in a subprocess. The result (Ruby
 object or exception) of the block will be available in the
 parent process.
 
+The intermediate return value (or exception) will be
+Marshal'led to disk. This means that it is possible to
+(concurrently) run thousands of child processes, with a
+relatively low memory footprint. Just gather the results once
+all child processes are done. ForkAndReturn will handle the
+writing, reading and deleting of the temporary file.
+
+The core of these methods is fork_and_return_core(). It returns
+some nested lambdas, which are handled by the other methods and
+by Enumerable#concurrent_collect(). These lambdas handle the
+WAITing, LOADing and RESULTing (explained in
+fork_and_return_core()).
+
+The child process exits with Process.exit!(), so at_exit()
+blocks are skipped in the child process. However, both $stdout
+and $stderr will be flushed.
+
+Only Marshal'lable Ruby objects can be returned.
+
 ForkAndReturn uses Process.fork(), so it only runs on platforms
 where Process.fork() is implemented.
 
-
-
-
+Example: (See example.txt for another example.)
+
+  [1, 2, 3, 4].collect do |object|
+    Thread.fork do
+      ForkAndReturn.fork_and_return do
+        2*object
+      end
+    end
+  end.collect do |thread|
+    thread.value
+  end # ===> [2, 4, 6, 8]
+
+This runs each "2*object" in a separate process. Hopefully, the
+processes are spread over all available CPUs. That's a simple
+way of parallel processing! Although
+Enumerable#concurrent_collect() is even simpler:
+
+  [1, 2, 3, 4].concurrent_collect do |object|
+    2*object
+  end # ===> [2, 4, 6, 8]
+
+Note that the code in the block is run in a separate process,
+so updating objects and variables in the block won't affect the
+parent process:
+
+  count = 0
+  [...].concurrent_collect do
+    count += 1
+  end
+  count # ==> 0
+
+Enumerable#concurrent_collect() is suitable for handling a
+couple of very CPU intensive jobs, like parsing large XML files.
+
+Enumerable#clustered_concurrent_collect() is suitable for
+handling a lot of not too CPU intensive jobs: situations where
+the overhead of forking is too expensive, but where you still
+want to use all available CPUs.
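A rough sketch of what concurrent_collect() does under the hood, using plain Process.fork and one pipe per child (the gem itself marshals through a temporary file; concurrent_collect_sketch is an illustrative name, not the gem's API):

```ruby
# Fork one child per element; each child marshals its block result back to
# the parent. The second pass collects the results in the original order.
def concurrent_collect_sketch(enum)
  enum.map do |obj|
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close
      Marshal.dump(yield(obj), writer)  # send the result to the parent
      writer.close
      Process.exit!(0)                  # skip the parent's at_exit handlers
    end
    writer.close
    [pid, reader]
  end.map do |pid, reader|              # roughly: WAIT, LOAD, RESULT
    result = Marshal.load(reader)       # blocks until the child has written
    reader.close
    Process.wait(pid)
    result
  end
end

p concurrent_collect_sketch([1, 2, 3, 4]) { |n| 2 * n }   # => [2, 4, 6, 8]
```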
data/VERSION
CHANGED
@@ -1 +1 @@
-0.1.0
+0.1.1
data/example.txt
ADDED
@@ -0,0 +1,158 @@
+AN EXAMPLE OF MULTICORE-PROGRAMMING WITH FORKANDRETURN
+
+We've got 42 GZ-files. Compressed, that's 44,093,076 bytes. The
+question is: How many bytes is it if we decompress it?
+
+We could do this, if we're on a Unix machine:
+
+  $ time zcat *.gz | wc -c
+  860187300
+
+  real    0m7.266s
+  user    0m5.810s
+  sys     0m3.341s
+
+Can we do it without the Unix tools? With pure Ruby? Sure, we
+can:
+
+  $ cat count.rb
+  require "zlib"
+
+  count = 0
+
+  Dir.glob("*.gz").sort.each do |file|
+    Zlib::GzipReader.open(file) do |io|
+      while block = io.read(4096)
+        count += block.size
+      end
+    end
+  end
+
+  puts count
+
+Which indeed returns the correct answer:
+
+  $ time ruby count.rb
+  860187300
+
+  real    0m5.687s
+  user    0m5.499s
+  sys     0m0.186s
+
+But can we take advantage of both CPUs? Yes, we can. The plan
+is to use ForkAndReturn's Enumerable#concurrent_collect()
+instead of Enumerable#each(). But let's reconsider our code
+first. First question: What's the most expensive part of the
+code? Well, even without profiling, we can say that the
+iteration over the blocks and the inflating of the compressed
+archives are the most expensive. Can we concurrently run these
+blocks of code on several CPUs? Well, in fact, we can't. In
+the while block, we update a global counter. That's not gonna
+work if we fork to separate processes. So, we get rid of the
+local use of the global variable first:
+
+  $ cat count.rb
+  require "zlib"
+
+  count = 0
+
+  Dir.glob("*.gz").sort.collect do |file|
+    c = 0
+
+    Zlib::GzipReader.open(file) do |io|
+      while block = io.read(4096)
+        c += block.size
+      end
+    end
+
+    c
+  end.each do |c|
+    count += c
+  end
+
+  puts count
+
+Which runs as fast as the previous version:
+
+  $ time ruby count.rb
+  860187300
+
+  real    0m5.703s
+  user    0m5.515s
+  sys     0m0.218s
+
+We can now run the local counts concurrently, by changing only
+one word (and requiring the library):
+
+  $ cat count.rb
+  require "zlib"
+  require "forkandreturn"
+
+  count = 0
+
+  Dir.glob("*.gz").sort.concurrent_collect do |file|
+    c = 0
+
+    Zlib::GzipReader.open(file) do |io|
+      while block = io.read(4096)
+        c += block.size
+      end
+    end
+
+    c
+  end.each do |c|
+    count += c
+  end
+
+  puts count
+
+  $ time ruby count.rb
+  860187300
+
+  real    0m3.860s
+  user    0m6.511s
+  sys     0m1.120s
+
+Yep, it runs faster! 1.5x as fast! Not really doubling the
+speed, but it's close enough...
+
+But, since all parsing of all files is done concurrently,
+aren't we exhausting our memory? And how about parsing
+thousands of files? Can't we run just a couple of jobs
+concurrently, instead of all of them? Sure. Just add one
+parameter:
+
+  $ cat count.rb
+  require "zlib"
+  require "forkandreturn"
+
+  count = 0
+
+  Dir.glob("*.gz").sort.concurrent_collect(2) do |file|
+    c = 0
+
+    Zlib::GzipReader.open(file) do |io|
+      while block = io.read(4096)
+        c += block.size
+      end
+    end
+
+    c
+  end.each do |c|
+    count += c
+  end
+
+  puts count
+
+Et voila:
+
+  $ time ruby count.rb
+  860187300
+
+  real    0m3.953s
+  user    0m6.436s
+  sys     0m1.309s
+
+A bit of overhead, but friendlier to other running applications.
+
+(BTW, the answer is 860,187,300 bytes.)
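The extra parameter above caps the number of simultaneous children. A minimal sketch of that clustering idea, processing the jobs in slices of k (this mirrors the intent of the parameter and of clustered_concurrent_collect, not the gem's implementation; clustered_collect_sketch is a made-up name):

```ruby
# Fork at most k children at a time: fork a slice, collect its results,
# then move on to the next slice.
def clustered_collect_sketch(enum, k)
  enum.each_slice(k).flat_map do |slice|
    slice.map do |obj|
      reader, writer = IO.pipe
      pid = Process.fork do
        reader.close
        Marshal.dump(yield(obj), writer)  # send the result to the parent
        writer.close
        Process.exit!(0)
      end
      writer.close
      [pid, reader]
    end.map do |pid, reader|
      result = Marshal.load(reader)
      reader.close
      Process.wait(pid)
      result
    end
  end
end

p clustered_collect_sketch([1, 2, 3, 4, 5], 2) { |n| n * n }   # => [1, 4, 9, 16, 25]
```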
data/lib/forkandreturn/forkandreturn.rb
CHANGED
@@ -72,6 +72,9 @@ module ForkAndReturn
   # If you call RESULT-lambda, the result of the child process will be handled.
   # This means either "return the return value of the block" or "raise the exception"
   #
+  # at_exit blocks defined in the child itself will be executed in the child,
+  # whereas at_exit blocks defined in the parent won't be executed in the child.
+  #
   # <i>*args</i> is passed to the block.
 
   def self.fork_and_return_core(*args, &block)
@@ -80,18 +83,24 @@ module ForkAndReturn
   #begin
       pid =
         Process.fork do
+          at_exit do
+            $stdout.flush
+            $stderr.flush
+
+            Process.exit!  # To avoid the execution of already defined at_exit handlers.
+          end
+
           begin
             ok, res = true, yield(*args)
           rescue
             ok, res = false, $!
           end
 
-          File.open(file, "wb"){|f| Marshal.dump([ok, res], f)}
-
-          $stdout.flush
-          $stderr.flush
+          File.open(file, "wb") do |f|
+            f.chmod(0600)
 
-
+            Marshal.dump([ok, res], f)
+          end
         end
   #rescue Errno::EAGAIN # Resource temporarily unavailable - fork(2)
   #  Kernel.sleep 0.1
data/test/test.rb
CHANGED
@@ -194,19 +194,26 @@ class ForkAndReturnEnumerableTest < Test::Unit::TestCase
   end
 
   def test_at_exit_handler
-
-
-    file = "/tmp/FORK_AND_RETURN_TEST"
+    file1 = "/tmp/FORK_AND_RETURN_TEST_1"
+    file2 = "/tmp/FORK_AND_RETURN_TEST_2"
 
-    File.delete(file) if File.file?(file)
+    File.delete(file1) if File.file?(file1)
+    File.delete(file2) if File.file?(file2)
 
-    at_exit do
-      File.open(file, "w"){}
+    at_exit do # Should not be executed.
+      File.open(file1, "w"){}
     end
 
-
+    ForkAndReturn.fork_and_return do
+      at_exit do # Should be executed.
+        File.open(file2, "w"){}
+      end
+
+      nil
+    end
 
-    assert(!File.file?(file))
+    assert(!File.file?(file1))
+    assert(File.file?(file2))
   end
 
   def test_clustered_concurrent_collect
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: forkandreturn
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.1.1
 platform: ruby
 authors:
 - Erik Veenstra
@@ -9,7 +9,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2008-07-13 00:00:00 +02:00
+date: 2008-07-19 00:00:00 +02:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -39,6 +39,8 @@ files:
 - README
 - LICENSE
 - VERSION
+- CHANGELOG
+- example.txt
 has_rdoc: true
 homepage: http://www.erikveen.dds.nl/forkandreturn/index.html
 post_install_message:
@@ -46,8 +48,10 @@ rdoc_options:
 - README
 - LICENSE
 - VERSION
+- CHANGELOG
+- example.txt
 - --title
-- forkandreturn (0.1.0)
+- forkandreturn (0.1.1)
 - --main
 - README
 require_paths: