RubyGems - ioblockreader - Versions diffs - 1.0.3.20130618 → 1.0.4.20130725 - Mend

ioblockreader 1.0.3.20130618 → 1.0.4.20130725


Files changed (7)

data/AUTHORS +1 -0
data/ChangeLog +6 -0
data/README.md +1 -141
data/ReleaseInfo +1 -1
data/lib/ioblockreader/datablock.rb +11 -1
data/lib/ioblockreader/ioblockreader.rb +26 -11
metadata +2 -2

data/AUTHORS CHANGED

@@ -4,3 +4,4 @@
 * 1.0.1.20130611
 * 1.0.2.20130613
 * 1.0.3.20130618
+* 1.0.4.20130725

data/ChangeLog CHANGED

@@ -1,5 +1,11 @@
 = IOBlockReader Release History
+== 1.0.4.20130725 (Beta)
+* cached_block was not being updated correctly when blocks were invalidated.
+* index with a token of length 1 returned wrong indexes when found using cross-blocks.
+* Added test for cached block
 == 1.0.3.20130618 (Beta)
 * Added get_block_containing_offset interface
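
Note on the length-1 token entry: the cross-block search in ioblockreader.rb builds its search window with `current_block.data[1-token_size..-1] + next_block.data[0..token_size-2]`. The sketch below reproduces only that slicing expression on two made-up strings (the `cross_data` helper and the sample values are hypothetical, not part of the gem) to show what happens when `token_size` is 1.

```ruby
# Same slicing expression as the cross-block search, isolated in a
# hypothetical helper so it can be tried on plain Strings.
def cross_data(current_data, next_data, token_size)
  current_data[1 - token_size..-1] + next_data[0..token_size - 2]
end

# Made-up contents of two consecutive blocks.
current_data = 'abcd'
next_data    = 'efgh'

# token_size = 3: last 2 characters of one block + first 2 of the next.
window_3 = cross_data(current_data, next_data, 3)

# token_size = 1: the ranges become 0..-1 on both sides, so the "window"
# is both blocks in full instead of an empty string.
window_1 = cross_data(current_data, next_data, 1)

puts window_3.inspect
puts window_1.inspect
```

A one-character token cannot straddle a block boundary, so the window should be empty in that case. Because Ruby reads `0..-1` as "the whole string", an index found there is relative to the start of the current block rather than to a small window near its end, which is consistent with the wrong indexes described in the ChangeLog. The `if (token_size > 1)` guard added in this release skips the step entirely.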

data/README.md CHANGED

@@ -3,144 +3,4 @@ IOBlockReader
 Ruby library giving block-buffered and cached read over IO objects with a String-like interface. Ideal to parse big files as Strings, limiting memory consumption.
-## Install
-``` bash
-gem install ioblockreader
-```
-## Usage
-``` ruby
-# Require the library
-require 'ioblockreader'
-# Open an IO
-File.open('my_big_file', 'rb') do |file|
-  # Get an IOBlockReader on it
-  content = IOBlockReader.init(file)
-  # Access it directy
-  puts "Content: " + content[10..20]
-  # Perform a search
-  puts "Search 0123: " + content.index('0123')
-end
-```
-## API
-### IOBlockReader.init(io, options = {})
-Get an IOBlockReader instance on an IO.
-Parameters:
-* **io** ( _IO_ ): The IO object used to give the String interface
-* **options** (<em>map< Symbol, Object ></em>): Additional options:
-  * **:block_size** ( _Fixnum_ ): The block size in bytes used internally. [default = 268435456]
-  * **:blocks_in_memory** ( _Fixnum_ ): Maximal number of blocks in memory. If it is required to load more blocks than this value for a single operation, this value is ignored. [default = 2]
-Result:
-* _IOBlockReader_: The IO Block Reader ready for use
-Example:
-```
-content = IOBlockReader.init(file, :block_size => 32768, :blocks_in_memory => 5)
-```
-### IOBlockReader#\[\](range)
-Access a part of the data in the IO as a String.
-Parameters:
-* **range** ( _Fixnum_ or _Range_ ): Range to extract
-Result:
-* _String_: The resulting data
-Example:
-```
-single_char = content[10]
-substring = content[10..20]
-```
-### IOBlockReader#index(token, offset = 0, max_size_regexp = 32)
-Search for a token or a list of tokens.
-Parameters:
-* **token** ( _String_ , _Regexp_ or <em>list< Object ></em>): Token to be found. Can be a list of tokens.
-* **offset** ( _Fixnum_ ): Offset starting the search [optional = 0]
-* **max_size_regexp** ( _Fixnum_ ): Maximal number of characters the match should take in case of a Regexp token. Ignored if token is a String. [optional = 32]
-Result:
-* _Fixnum_: Index of the token (or the first one found from the given token list), or nil if none found.
-* _Fixnum_: In case token was an Array, return the index of the matching token in the array, or nil if none found.
-Example:
-```
-# Simple string search
-i = content.index('search string')
-# Simple string search from a given offset
-i = content.index('search string', 20)
-# Regexp search: have to specify the maximal token length
-i = content.index(/search \d words/, 0, 14)
-# Regexp search from a given offset
-i = content.index(/search \d words/, 20, 14)
-# Search for multiple strings at once: will stop on the first one encountered
-i, token_index = content.index( [ 'search string', 'another string' ] )
-# Search for multiple tokens at once from a given offset: don't forget token length if using Regexp
-i, token_index = content.index( [ 'search string', /another f.....g string/ ], 20, 22)
-```
-### IOBlockReader#each_block(range = 0)
-Iterate over blocks in the data.
-Parameters:
-* **range** ( _Range_ or _Fixnum_ ): The boundaries of the iteration, or the starting index [default = 0]
-* _Block_ : Code called for each block encountered
-  * Parameters:
-  * **data** ( _String_ ): The data
-Example:
-```
-# Iterate all over the IO
-content.each_block do |data|
-  puts "Got a block of #{data.size} bytes"
-end
-# Iterate on just a part
-content.each_block(10..50) do |data|
-  puts "Got a block of #{data.size} bytes"
-end
-```
-### IOBlockReader#get_block_containing_offset(offset = 0)
-Get the block containing a given offset.
-This method is mainly used to provide some low-level access for processes needing great parsing performance.
-Parameters:
-* **offset** ( _Fixnum_ ): The offset to be accessed [default = 0]
-Return:
-* _String_ : The block of data containing this offset
-* _Fixnum_ : The beginning offset of this data block
-* _Boolean_ : Is this block the last one?
-Example:
-```
-str_data, begin_offset, last_one = content.get_block_containing_offset(20)
-```
-## Contact
-Want to contribute? Have any questions? [Contact Muriel!](muriel@x-aeon.com)
+[See its documentation here.](http://ioblockreader.sourceforge.net)
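
Note on the removed Usage example: the line `puts "Search 0123: " + content.index('0123')` concatenates a String with the value returned by `index`, which the removed API section documents as a Fixnum (or nil). The sketch below uses a plain Ruby String as a stand-in for the IOBlockReader object, on the assumption that `IOBlockReader#index` returns an integer as documented. It shows the failure and one way to print the result.

```ruby
# A plain String standing in for the IOBlockReader object (assumption:
# IOBlockReader#index returns an integer or nil, like String#index).
content = 'abc0123def'

position = content.index('0123')

# String#+ does not convert an Integer implicitly, so this raises.
concat_error = nil
begin
  'Search 0123: ' + position
rescue TypeError => e
  concat_error = e.class
end

# Interpolation works for both an Integer and nil (when inspected).
message = "Search 0123: #{position.inspect}"
puts message
```

Using `inspect` inside the interpolation keeps the output readable when nothing is found, since `nil.inspect` yields `"nil"` rather than an empty string.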

data/ReleaseInfo CHANGED

@@ -2,7 +2,7 @@
 # This file has been generated by RubyPackager during a delivery.
 # More info about RubyPackager: http://rubypackager.sourceforge.net
 {
-  :version => '1.0.3.20130618',
+  :version => '1.0.4.20130725',
   :tags => [ 'Beta' ],
   :dev_status => 'Beta'
 }

data/lib/ioblockreader/datablock.rb CHANGED

@@ -39,9 +39,10 @@ module IOBlockReader
       @offset = offset
       @last_access_time = @@access_time_sequence
       @@access_time_sequence += 1
-      #puts "[IOBlockReader] - Read #{size} @#{@offset}"
+      #puts "[IOBlockReader] - Read #{size} bytes @#{@offset} in datablock ##{self.object_id}"
       @io.seek(@offset)
       @io.read(size, @data)
+      #puts "[IOBlockReader] - Data read: #{@data.inspect}"
       @last_block = @io.eof?
     end
@@ -59,6 +60,15 @@ module IOBlockReader
       @@access_time_sequence += 1
     end
+    # Get a string representation of this block.
+    # This is mainly used for debugging purposes.
+    #
+    # Result::
+    # * _String_: String representation
+    def to_s
+      return "[##{self.object_id}: @#{@offset} (last access: #{@last_access_time})#{@last_block ? ' (last block)' : ''}]"
+    end
   end
 end

data/lib/ioblockreader/ioblockreader.rb CHANGED

@@ -35,6 +35,7 @@ module IOBlockReader
     # * _String_: The resulting data
     def [](range)
       #puts "[IOBlockReader] - [](#{range.inspect})"
+      #display_current_blocks
       if (range.is_a?(Fixnum))
         # Use the cache if possible
         return @cached_block.data[range - @cached_block.offset] if ((@cached_block != nil) and (range >= @cached_block.offset) and (range < @cached_block_end_offset))
@@ -94,7 +95,7 @@ module IOBlockReader
     # Warning: The token(s) to be found have to be smaller than the block size given to the constructor, otherwise they won't be found (you've been warned!). If you really need to search for tokens bigger than block size, extract the data using [] operator first, and then use index on it ; it will however make a complete copy of the data in memory prior to searching tokens.
     #
     # Parameters::
-    # * *token* (_String_, _Regexp_ or <em>list<Object></em>): Token to be found. Can be a list of tokens.
+    # * *token* (_String_, _Regexp_ or <em>list<Object></em>): Token to be found. Can be a list of tokens. Please note than using a list of tokens is slower than using a single Regexp.
     # * *offset* (_Fixnum_): Offset starting the search [optional = 0]
     # * *max_size_regexp* (_Fixnum_): Maximal number of characters the match should take in case of a Regexp token. Ignored if token is a String. [optional = 32]
     # Result::
@@ -143,6 +144,7 @@ module IOBlockReader
         # Loop on subsequent blocks to search for token
         result = nil
         while ((result == nil) and (!current_block.last_block?))
+          #puts "[IOBlockReader] - index(#{token.inspect}, #{offset}, #{max_size_regexp}) - No find in last block #{current_block}. Continuing..."
           # Check that next block is loaded
           if ((next_block = @blocks[current_block_index+1]) == nil)
             read_needed_blocks([current_block_index+1], current_block_index+1, current_block_index+1)
@@ -150,20 +152,24 @@ module IOBlockReader
           else
             next_block.touch
           end
-          # Get data across the 2 blocks: enough to search for token_size data only
-          cross_data = current_block.data[1-token_size..-1] + next_block.data[0..token_size-2]
-          if token_is_array
-            token.each_with_index do |token2, idx|
-              index_token2_in_block = cross_data.index(token2)
-              if (index_token2_in_block != nil) and ((index_in_block == nil) or (index_token2_in_block < index_in_block))
-                index_in_block = index_token2_in_block
-                index_matching_token = idx
+          # Get data across the 2 blocks if needed: enough to search for token_size data only
+          if (token_size > 1)
+            cross_data = current_block.data[1-token_size..-1] + next_block.data[0..token_size-2]
+            #puts "[IOBlockReader] - index(#{token.inspect}, #{offset}, #{max_size_regexp}) - Find token in cross data: #{cross_data.inspect}..."
+            if token_is_array
+              token.each_with_index do |token2, idx|
+                index_token2_in_block = cross_data.index(token2)
+                if (index_token2_in_block != nil) and ((index_in_block == nil) or (index_token2_in_block < index_in_block))
+                  index_in_block = index_token2_in_block
+                  index_matching_token = idx
+                end
               end
+            else
+              index_in_block = cross_data.index(token)
             end
-          else
-            index_in_block = cross_data.index(token)
           end
           if (index_in_block == nil)
+            #puts "[IOBlockReader] - index(#{token.inspect}, #{offset}, #{max_size_regexp}) - No find in cross blocks #{current_block} / #{next_block}. Continuing..." if (token_size > 1)
             # Search in the next block
             if token_is_array
               token.each_with_index do |token2, idx|
@@ -302,6 +308,7 @@ module IOBlockReader
     # Parameters::
     # * *block* (_DataBlock_): Block to be cached
     def set_cache_block(block)
+      #puts "[IOBlockReader] - Set cached block to offset #{block.offset}"
       @cached_block = block
       @cached_block_end_offset = block.offset + @block_size
     end
@@ -344,10 +351,18 @@ module IOBlockReader
         block_to_fill = removed_blocks.pop
         block_to_fill = DataBlock.new(@io) if (block_to_fill == nil)
         block_to_fill.fill(block_index * @block_size, @block_size)
+        # Update the cached block end offset if it was modified
+        @cached_block_end_offset = block_to_fill.offset + @block_size if (block_to_fill == @cached_block)
         @blocks[block_index] = block_to_fill
       end
     end
+    # Display current blocks
+    def display_current_blocks
+      puts "[IOBlockReader] - #{@blocks.size} blocks: #{@blocks.map { |block| (block == nil) ? '[nil]' : block }.join(' ')}"
+      puts "[IOBlockReader] - Cached block: #{(@cached_block == nil) ? '[nil]' : @cached_block } - End: #{@cached_block_end_offset}"
+    end
   end
 end
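
Note on the cached block fix: the last hunk recomputes `@cached_block_end_offset` when the block being refilled is the cached one. The sketch below is a simplified model of that situation, not the gem's code: `FakeBlock`, `BLOCK_SIZE` and `cache_hit` are hypothetical stand-ins, and the hit test mirrors the condition used in `[]` (`range >= @cached_block.offset` and `range < @cached_block_end_offset`).

```ruby
# Hypothetical stand-in for DataBlock: it only remembers the offset it
# was last filled at.
class FakeBlock
  attr_reader :offset

  def fill(offset)
    @offset = offset
  end
end

BLOCK_SIZE = 4

# Same bounds test as the cache shortcut in [].
cache_hit = lambda do |position, block, end_offset|
  position >= block.offset && position < end_offset
end

# The block first holds bytes 8..11 and becomes the cached block.
block = FakeBlock.new
block.fill(2 * BLOCK_SIZE)
cached_block = block
cached_block_end_offset = block.offset + BLOCK_SIZE

# The same object is then reused to hold bytes 0..3.
block.fill(0)

# Stale end offset: position 5 looks like a cache hit although the
# block now only covers 0..3.
stale_hit = cache_hit.call(5, cached_block, cached_block_end_offset)

# The fix from this release: refresh the end offset when the refilled
# block is the cached one.
cached_block_end_offset = block.offset + BLOCK_SIZE if block == cached_block
fixed_hit = cache_hit.call(5, cached_block, cached_block_end_offset)

puts "stale: #{stale_hit}, fixed: #{fixed_hit}"
```

In this model the stale bounds claim a range three blocks wide, so a read inside it would be served from the wrong block. After the refresh the cached range again matches what the block actually holds.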

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: ioblockreader
 version: !ruby/object:Gem::Version
-  version: 1.0.3.20130618
+  version: 1.0.4.20130725
   prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-06-18 00:00:00.000000000 Z
+date: 2013-07-25 00:00:00.000000000 Z
 dependencies: []
 description: Ruby library giving block-buffered and cached read over IO objects with
   a String-like interface. Ideal to parse big files as Strings, limiting memory consumption.