RubyGems - spool_pool - Versions diffs - 0.2.1 - Mend

spool_pool 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

data/History.rdoc +15 -0
data/LICENSE.txt +27 -0
data/README.rdoc +53 -0
data/TODOs +1 -0
data/lib/spool_pool.rb +53 -0
data/lib/spool_pool/file.rb +161 -0
data/lib/spool_pool/pool.rb +172 -0
data/lib/spool_pool/spool.rb +113 -0
data/scripts/perf_test.rb +16 -0
data/spec/spec_helper.rb +10 -0
data/spec/spool_pool/file_spec.rb +119 -0
data/spec/spool_pool/pool_spec.rb +492 -0
data/spec/spool_pool/spool_spec.rb +245 -0
metadata +68 -0

data/History.rdoc ADDED

@@ -0,0 +1,15 @@
+= 0.2.1
+* Add upfront sanity checking of the pools directory environment and permissions
+= 0.2
+* Add safe behaviour for get: supply operations in a block; the spool file
+  only gets deleted if the block completes without an exception
+* Adapt the Tempfile code from the Ruby stdlib into SpoolPool::File,
+  resulting in a ~5x speed improvement for put operations
+* Change the naming scheme of the spool files
+* Sort files by name, not by ctime
+* Cache the sorted list of spooled files, resulting in a massive speedup for
+  get/flush operations (10000 files took about 14000 seconds, now 4.4 seconds)
+= 0.1
+* First version with a basic implementation of all core features
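
The 0.2 "safe behaviour for get" entry depends on ordering. The file is read first, the data goes to the caller's block, and the file is unlinked only after the block returns. Below is a minimal standalone sketch of that pattern using only the standard library. The method name `consume_spool_file` is hypothetical. This is not the gem's own code; the gem's version is `SpoolPool::File.safe_read` in lib/spool_pool/file.rb.

```ruby
require 'tmpdir'

# Hypothetical helper illustrating "delete only if the block succeeds".
# If the block raises, File.unlink is never reached, so the file stays
# on disk and can be processed again later.
def consume_spool_file(path)
  data = File.read(path)
  yield data if block_given?
  File.unlink(path)
  data
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "job-1")
  File.write(path, "payload")

  # A failing consumer: the exception propagates and the file is kept.
  begin
    consume_spool_file(path) { |_data| raise "processing failed" }
  rescue RuntimeError => e
    puts "first attempt failed (#{e.message}); file kept: #{File.exist?(path)}"
  end

  # A successful consumer: the data is returned and the file is removed.
  result = consume_spool_file(path) { |data| puts "processing #{data}" }
  puts "second attempt returned #{result.inspect}; file kept: #{File.exist?(path)}"
end
```

The method returns the data it read, not the block's value, just as the documented `safe_read` does.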

data/LICENSE.txt ADDED

@@ -0,0 +1,27 @@
+Copyright 2010 Sven Riedel
+All rights reserved.
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+3. Neither the names of the authors nor the names of their contributors
+   may be used to endorse or promote products derived from this software
+   without specific prior written permission.
+THIS SOFTWARE IS PROVIDED BY THE AUTHORS ``AS IS'' AND ANY EXPRESS
+OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
+OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
+OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

data/README.rdoc ADDED

@@ -0,0 +1,53 @@
+= Introduction
+This is a simple implementation of a file spooler. You can think of it as
+a filesystem-based queueing service without a service running behind it,
+like the spools used in Unix for mail servers, print jobs, etc.
+In this module, a Pool instance can contain several different Spool instances,
+each of which can store files. Data is retrieved from the Spool in a
+non-strict order, oldest first.
+Data is serialized and deserialized on storage/retrieval (currently using
+YAML).
+Most users will want to start using this library by instantiating a Pool
+object, pointing it to a directory that will act as the parent directory
+for all subsequent Spools.
+= Note
+This library has currently only been tested with Ruby 1.9.1. It uses Pathname
+extensively, and while it might work with Ruby 1.8.7, it probably will not
+work with Ruby 1.8.6 or earlier.
+= Usage Example
+# instantiate a pool, pointing to a directory with
+# read/write permissions for the effective user of
+# the current process
+require 'spool_pool'
+pool = SpoolPool::Pool.new( "/path/to/my/spool/root" )
+# store data in one spool
+pool.put :my_spool, "some data here"
+# retrieve the data
+pool.get :my_spool
+# -> "some data here"
+# store data in another spool,
+# demonstrating the ordered retrieval
+pool.put :my_other_spool, :foo
+sleep 1
+pool.put :my_other_spool, :bar
+pool.get :my_other_spool # -> :foo
+pool.get :my_other_spool # -> :bar
+= Feedback/Suggestions
+By email to: sr@gimp.org
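
The README says data is serialized with YAML. Its example also spools the Symbols `:foo` and `:bar`. The serialization code lives in lib/spool_pool/spool.rb, which is not reproduced here, so what follows is only a sketch of a YAML round-trip.

It assumes a current Ruby whose Psych (3.1 or later) supports `YAML.safe_load` with `permitted_classes:`. On Ruby 1.9.1, which the gem was tested with, `YAML.load` accepted Symbols directly. Since Ruby 3.1 (Psych 4), `YAML.load` behaves like `safe_load`, so Symbol has to be permitted explicitly. The helper names `serialize_spool_data` and `deserialize_spool_data` are hypothetical.

```ruby
require 'yaml'

# Hypothetical helper: turn a Ruby value into the text stored in a spool file.
def serialize_spool_data(value)
  YAML.dump(value)
end

# Hypothetical helper: turn stored text back into a Ruby value.
# Symbol is whitelisted explicitly; anything beyond basic types and
# Symbol (e.g. arbitrary objects) is still rejected by safe_load.
def deserialize_spool_data(text)
  YAML.safe_load(text, permitted_classes: [Symbol])
end

[:foo, "some data here", { "id" => 1, "tags" => [:a, :b] }].each do |value|
  text = serialize_spool_data(value)
  puts "#{value.inspect} -> #{text.inspect} -> #{deserialize_spool_data(text).inspect}"
end
```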

data/TODOs ADDED

@@ -0,0 +1 @@
+ - clean up specs

data/lib/spool_pool.rb ADDED

@@ -0,0 +1,53 @@
+=begin rdoc
+= Introduction
+This is a simple implementation of a file spooler. You can think of it as
+a filesystem-based queueing service without a service running behind it,
+like the spools used in Unix for mail servers, print jobs, etc.
+In this module, a Pool instance can contain several different Spool instances,
+each of which can store files. Data is retrieved from the spool in a
+non-strict order, oldest first.
+Data is serialized and deserialized on storage/retrieval (currently using
+YAML).
+Most users will want to start using this library by instantiating a Pool
+object, pointing it to a directory that will act as the parent directory
+for all subsequent Spools.
+= Usage Example
+# instantiate a pool, pointing to a directory with read/write permissions
+# for the effective user of the current process
+require 'spool_pool'
+pool = SpoolPool::Pool.new( "/path/to/my/spool/root" )
+# store data in one spool
+pool.put :my_spool, "some data here"
+# retrieve the data
+pool.get :my_spool
+# -> "some data here"
+# store data in another spool, demonstrating the ordered retrieval
+pool.put :my_other_spool, :foo
+sleep 1
+pool.put :my_other_spool, :bar
+pool.get :my_other_spool
+# -> :foo
+pool.get :my_other_spool
+# -> :bar
+=end
+module SpoolPool
+end
+$: << File.expand_path( File.dirname( __FILE__ ) )
+require 'spool_pool/pool'
+require 'spool_pool/spool'
+require 'spool_pool/file'

data/lib/spool_pool/file.rb ADDED

@@ -0,0 +1,161 @@
+require 'tempfile'
+require 'delegate'
+require 'tmpdir'
+require 'thread'
+module SpoolPool
+=begin rdoc
+A class to deal with the writing of spool files. Currently uses Tempfile
+to do most of the heavy lifting.
+Most of this file has been adapted from the Tempfile code in the Ruby 1.9.1
+class library, written by yugui.
+=end
+  class File < DelegateClass( ::File )
+    attr_reader :path
+=begin rdoc
+  Returns the data read from the given +filename+, and deletes the file
+  before returning.
+  Also yields the read data to an optionally given block. If you give a block
+  to process your data and your code raises an exception, the file will not
+  be deleted, so processing the data can be attempted again in the
+  future.
+=end
+    def self.safe_read( filename )
+      data = ::File.read( filename )
+      yield data if block_given?
+      ::File.unlink( filename )
+      data
+    end
+=begin rdoc
+Stores the given +data+ in a unique file in the directory +basepath+.
++basepath+ can be either a file path as a String or a Pathname.
+If the data can't be written to the file (permissions, quota, I/O errors...),
+it attempts to delete the file and then re-raises the exception.
+Returns the path of the file storing the data.
+=end
+    def self.write( basepath, data )
+      file = nil
+      begin
+        file = new( basepath.to_s )
+        file.write data
+      rescue
+        file.unlink if file
+        raise
+      else
+        file.path
+      ensure
+        # file is still nil if new() itself raised
+        file.close if file
+      end
+    end
+    # If no block is given, this is a synonym for new().
+    #
+    # If a block is given, it will be passed the spool file as an argument,
+    # and the spool file will automatically be closed when the block
+    # terminates.  The call returns the value of the block.
+    def self.open(*args)
+      file = new(*args)
+      return file unless block_given?
+      begin
+        yield(file)
+      ensure
+        file.close
+      end
+    end
+    MAX_TRY = 10
+    FILE_PERMISSIONS = 0600
+    @@lock = Mutex.new
+    # Creates a spool file of mode 0600 in the directory +basedir+,
+    # opens it for reading and writing (exclusive create, comparable to
+    # mode "w+"), and returns a SpoolPool::File object which represents
+    # the created spool file.  A SpoolPool::File object can be
+    # treated just like a normal File object.
+    #
+    def initialize( basedir )
+      create_threadsafe_spoolname( basedir ) do |spoolname|
+        @spoolfile = ::File.open( spoolname,
+                                  ::File::RDWR | ::File::CREAT | ::File::EXCL,
+                                  FILE_PERMISSIONS )
+        @path = spoolname
+        super(@spoolfile)
+        # Now we have all the File/IO methods defined, you must not
+        # carelessly put bare puts(), etc. after this.
+      end
+    end
+    # Opens or reopens the file with mode "r+".
+    def open
+      @spoolfile.close if @spoolfile
+      @spoolfile = ::File.open(@path, 'r+')
+      __setobj__(@spoolfile)
+    end
+    # Closes the file.
+    def close
+      @spoolfile.close if @spoolfile
+      @spoolfile = nil
+    end
+    # Unlinks the file.
+    def unlink
+      # keep this order for thread safety
+      begin
+        if ::File.exist?(@path)
+          close unless closed?
+          ::File.unlink(@path)
+        end
+        @path = nil
+      rescue Errno::EACCES
+        # may not be able to unlink on Windows; just ignore
+      end
+    end
+    # Returns the size of the file.  As a side effect, the IO
+    # buffer is flushed before determining the size.
+    def size
+      return 0 unless @spoolfile
+      @spoolfile.flush
+      @spoolfile.stat.size
+    end
+    alias length size
+    private
+    def spoolfilename_for_try(n)
+      "#{Time.now.to_f}-#{$$}-#{n}"
+    end
+    def create_threadsafe_spoolname( basedir )
+      lock = spoolname = nil
+      n = failure = 0
+      @@lock.synchronize {
+        begin
+          begin
+            spoolname = ::File.join( basedir, spoolfilename_for_try(n) )
+            lock = spoolname + '.lock'
+            n += 1
+          end while ::File.exist?(lock) or ::File.exist?(spoolname)
+          Dir.mkdir(lock)
+        rescue
+          failure += 1
+          retry if failure < MAX_TRY
+          raise "cannot generate spool file `%s': #{$!}" % spoolname
+        end
+      }
+      begin
+        yield spoolname
+      ensure
+        # always release the lock directory, even if the block raised
+        Dir.rmdir(lock)
+      end
+    end
+  end
+end
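
`SpoolPool::File#initialize` opens each new spool file with `File::CREAT | File::EXCL`. With those flags, the operating system refuses to open a path that already exists. Within one process, the mutex and the `.lock` directory in `create_threadsafe_spoolname` narrow the race window further.

Below is a reduced sketch of just the exclusive-create part. It assumes a POSIX-style local filesystem and uses hypothetical method names. Unlike the gem's loop, it does not pre-check names with `File.exist?`; it simply retries with the next counter value on `Errno::EEXIST`.

```ruby
require 'tmpdir'

# Hypothetical name builder in the same "<time>-<pid>-<try>" shape as
# SpoolPool::File#spoolfilename_for_try.
def spool_file_name(try)
  "#{Time.now.to_f}-#{Process.pid}-#{try}"
end

# Hypothetical helper: create and open a brand-new file in +dir+ with
# permissions 0600 (before umask). O_EXCL makes the open fail with
# Errno::EEXIST if the name is already taken, so two writers can never
# end up sharing one spool file.
def create_unique_spool_file(dir, max_tries = 10)
  max_tries.times do |try|
    path = File.join(dir, spool_file_name(try))
    begin
      file = File.open(path, File::RDWR | File::CREAT | File::EXCL, 0600)
      return [file, path]
    rescue Errno::EEXIST
      next # name collision: try again with the next counter value
    end
  end
  raise "cannot generate a unique spool file in #{dir}"
end

Dir.mktmpdir do |dir|
  paths = 3.times.map do
    file, path = create_unique_spool_file(dir)
    file.write("data")
    file.close
    path
  end
  puts paths.map { |p| File.basename(p) }
end
```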

data/lib/spool_pool/pool.rb ADDED

@@ -0,0 +1,172 @@
+require 'pathname'
+require 'spool_pool/spool'
+module SpoolPool
+=begin rdoc
+This is a container class used to manage the interaction with the
+individual Spool instances. Spool directories are created on demand by put,
+as subdirectories of the +spool_dir+ passed to the initializer, named after
+the spool given to put.
+= Security Note
+Some naive tests are in place to catch the most blatant directory traversal
+attempts. But for real security you should never blindly pass any
+user-supplied or computed queue name to these methods. Always validate
+user input!
+=end
+  class Pool
+    attr_reader :spool_dir
+    attr_reader :spools
+=begin rdoc
+  Sanity checking of the given pool +directory+ and its children (and parent,
+  if the +directory+ itself doesn't exist yet).
+  Raises Errno::EACCES if anything permission-wise looks fishy.
+=end
+    def self.validate_pool_dir( directory )
+      pool_dir = Pathname.new( directory )
+      begin
+        if !pool_dir.exist?
+          raise Errno::EACCES unless pool_dir.parent.writable? and
+                                     pool_dir.parent.executable?
+          return
+        end
+        raise Errno::EACCES unless pool_dir.readable? and
+                                   pool_dir.writable? and
+                                   pool_dir.executable?
+        return if pool_dir.children.empty?
+        pool_dir.children.select{ |d| d.directory? }.each do |spool_dir|
+          raise Errno::EACCES unless spool_dir.readable? and
+                                     spool_dir.writable? and
+                                     spool_dir.executable?
+          spool_dir.children.select{ |f| f.file? }.each do |spool_file|
+            raise Errno::EACCES unless spool_file.readable?
+          end
+        end
+      rescue Errno::EACCES
+        raise Errno::EACCES.new( "Something doesn't look right permission-wise. Consider running 'chmod -R 0755 #{directory}' or the equivalent. If #{directory} itself doesn't exist, make sure its parent exists and is writable and executable for the current process owner." )
+      end
+    end
+=begin rdoc
+Sets up a spooling pool in the +spool_path+ given.
+If the directory does not exist, it will try to create it for you.
+Raises an exception if it can't create the directory, or if the
+directory exists and is not readable and writable by the effective user id
+of the process.
+=end
+    def initialize( spool_path )
+      @spool_dir = Pathname.new spool_path
+      @spools = {}
+      self.class.validate_pool_dir( spool_path )
+      setup_spooldir unless @spool_dir.exist?
+      assert_readable @spool_dir
+      assert_writeable @spool_dir
+    end
+=begin rdoc
+Serializes and stores the +data+ in the given +spool+. If the +spool+
+doesn't exist yet, it will try to create a new spool and directory.
+Returns the path of the file storing the data.
+This method performs a naive check on the spool name for directory
+traversal attempts. *DO NOT* rely on this for security relevant systems,
+always validate user supplied queue names yourself before handing them
+off to this method!
+=end
+    def put( spool, data )
+      validate_spool_path spool
+      @spools[spool] ||= SpoolPool::Spool.new( @spool_dir + spool.to_s )
+      @spools[spool].put( data )
+    end
+=begin rdoc
+Retrieves and deserializes the oldest data in the given +spool+, yielding it to
+an optional block as well. The spool file is deleted just before the method
+returns. If a block was given, and an exception was raised within the block,
+the spool file is not deleted and another try at processing can be attempted
+in the future.
+Note that while data is retrieved oldest first, the order is non-strict, i.e.
+different data written during the same second to the storage will be
+retrieved in a random order. Or to put it another way: Ordering is exact down
+to the second, but sub-second ordering is random.
+This method performs a naive check on the spool name for directory
+traversal attempts. *DO NOT* rely on this for security relevant systems,
+always validate user supplied queue names yourself before handing them
+off to this method!
+=end
+    def get( spool, &block )
+      validate_spool_path spool
+      missing_spool_on_read_handler( spool ) unless @spools.has_key?( spool )
+      data = nil
+      data = @spools[spool].get( &block ) if @spools[spool]
+      data
+    end
+=begin rdoc
+Retrieves and deserializes all data in the given +spool+, yielding
+each deserialized item to the supplied block. Ordering is oldest data first.
+Note that while data is retrieved oldest first, the order is non-strict, i.e.
+different data written during the same second to the storage will be
+retrieved in a random order. Or to put it another way: Ordering is
+exact down to the second, but sub-second ordering is random.
+This method performs a naive check on the spool name for directory
+traversal attempts. *DO NOT* rely on this for security relevant systems,
+always validate user supplied queue names yourself before handing them
+off to this method!
+=end
+    def flush( spool, &block )
+      validate_spool_path spool
+      missing_spool_on_read_handler( spool ) unless @spools.has_key?( spool )
+      @spools[spool].flush( &block ) if @spools[spool]
+    end
+    private
+    def setup_spooldir
+      raise Errno::EACCES.new("The directory '#{@spool_dir}' does not exist and I don't have enough permissions to create it!") unless @spool_dir.parent.writable?
+      @spool_dir.mkpath
+      @spool_dir.chmod 0755
+    end
+    def create_spool_for_existing_path( pathname )
+      pathname.exist? ? SpoolPool::Spool.new( pathname ) : nil
+    end
+    def missing_spool_on_read_handler( spool )
+      spool_instance = create_spool_for_existing_path( @spool_dir + spool.to_s )
+      @spools[spool] = spool_instance if spool_instance
+    end
+    def assert_readable( pathname )
+      raise Errno::EACCES.new( "I can't read in the directory '#{pathname}'!" ) unless pathname.readable?
+    end
+    def assert_writeable( pathname )
+      raise Errno::EACCES.new( "I can't write to the directory '#{pathname}'!" ) unless pathname.writable?
+    end
+    def validate_spool_path( spool )
+      raise "Directory traversal attempt" if spool =~ %r{/\.\./} ||
+                                             spool =~ %r{\A\.\.\/}
+    end
+  end
+end
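
The security note in pool.rb calls `validate_spool_path` naive. It only matches `/../` inside a name or `../` at its start, so a bare `".."` passes. `@spool_dir + ".."` then resolves to the pool directory's parent.

A stricter approach is a whitelist rather than a blacklist. The sketch below assumes spool names should always be a single path component made of letters, digits, `_`, `.` and `-`. That character set is an assumption; the gem does not define one. The method name `valid_spool_name?` is hypothetical.

```ruby
require 'pathname'

# Hypothetical whitelist: exactly one path component consisting of
# letters, digits, underscore, dot and dash.
SPOOL_NAME_PATTERN = /\A[A-Za-z0-9_.-]+\z/

# Returns true only for names that cannot leave the pool directory.
# Accepts Symbols as well as Strings, as Pool#put/get do.
def valid_spool_name?(spool)
  name = spool.to_s
  return false if name == "." || name == ".."
  SPOOL_NAME_PATTERN.match?(name)
end

pool_dir = Pathname.new("/path/to/my/spool/root")
[:my_spool, "..", "../etc", "a/../../b", "reports-2010"].each do |name|
  verdict = valid_spool_name?(name) ? "ok" : "rejected"
  puts "#{name.inspect}: #{verdict} (would resolve to #{pool_dir + name.to_s})"
end
```

A caller could run such a check before handing a name to `put`, `get` or `flush`. This is in line with the advice in the source to always validate user-supplied queue names yourself.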