ioblockreader 1.0.0.20130611

data/AUTHORS ADDED
@@ -0,0 +1,3 @@
1
+ = Muriel Salvan (muriel@x-aeon.com)
2
+
3
+ * 1.0.0.20130611
data/ChangeLog ADDED
@@ -0,0 +1,5 @@
1
+ = IOBlockReader Release History
2
+
3
+ == 1.0.0.20130611 (Beta)
4
+
5
+ * Initial public release.
data/Credits ADDED
@@ -0,0 +1,6 @@
1
+ = Projects used by IO Block Reader
2
+
3
+ == Ruby
4
+ * Yukihiro « matz » Matsumoto (http://www.rubyist.net/~matz/)
5
+ * http://www.ruby-lang.org/
6
+ * Thanks a lot Matz for this truly wonderful language !
data/LICENSE ADDED
@@ -0,0 +1,31 @@
1
+
2
+ The license stated herein is a copy of the BSD License (modified on July 1999).
3
+ The AUTHOR mentioned below refers to the list of people involved in the
4
+ creation and modification of any file included in the delivered package.
5
+ This list is found in the file named AUTHORS.
6
+ The AUTHORS and LICENSE files have to be included in any release of software
7
+ embedding source code of this package, or using it as a derivative software.
8
+
9
+ Copyright (c) 2010 - 2013 Muriel Salvan (muriel@x-aeon.com)
10
+
11
+ Redistribution and use in source and binary forms, with or without
12
+ modification, are permitted provided that the following conditions are met:
13
+
14
+ 1. Redistributions of source code must retain the above copyright notice,
15
+ this list of conditions and the following disclaimer.
16
+ 2. Redistributions in binary form must reproduce the above copyright notice,
17
+ this list of conditions and the following disclaimer in the documentation
18
+ and/or other materials provided with the distribution.
19
+ 3. The name of the author may not be used to endorse or promote products
20
+ derived from this software without specific prior written permission.
21
+
22
+ THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
23
+ WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
24
+ MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
25
+ EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
26
+ EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
27
+ OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
28
+ INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
29
+ CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
30
+ IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
31
+ OF SUCH DAMAGE.
data/README ADDED
@@ -0,0 +1,15 @@
1
+ = IO Block Reader
2
+
3
+ Ruby library giving block-buffered and cached read over IO objects with a String-like interface. Ideal to parse big files as Strings, limiting memory consumption.
4
+
5
+ == Where is the documentation ?
6
+
7
+ Check the website at https://github.com/Muriel-Salvan/ioblockreader
8
+
9
+ == Who wrote it ?
10
+
11
+ Check the AUTHORS[link:AUTHORS.html] file.
12
+
13
+ == What is the license ?
14
+
15
+ You can find out in the LICENSE[link:LICENSE.html] file.
data/README.md ADDED
@@ -0,0 +1,99 @@
1
+ IOBlockReader
2
+ =============
3
+
4
+ Ruby library giving block-buffered and cached read over IO objects with a String-like interface. Ideal to parse big files as Strings, limiting memory consumption.
5
+
6
+ ## Install
7
+
8
+ ``` bash
9
+ gem install ioblockreader
10
+ ```
11
+
12
+ ## Usage
13
+
14
+ ``` ruby
15
+ # Require the library
16
+ require 'ioblockreader'
17
+
18
+ # Open an IO
19
+ File.open('my_big_file', 'rb') do |file|
20
+
21
+ # Get an IOBlockReader on it
22
+ content = IOBlockReader.init(file)
23
+
24
+ # Access it directly
25
+ puts "Content: " + content[10..20]
26
+
27
+ # Perform a search
28
+ puts "Search 0123: #{content.index('0123')}"
29
+
30
+ end
31
+ ```
32
+
33
+ ## API
34
+
35
+ ### IOBlockReader.init(io, options = {})
36
+
37
+ Parameters:
38
+ * **io** ( _IO_ ): The IO object used to give the String interface
39
+ * **options** (<em>map< Symbol, Object ></em>): Additional options:
40
+ * **:block_size** ( _Fixnum_ ): The block size in bytes used internally. [default = 268435456]
41
+ * **:blocks_in_memory** ( _Fixnum_ ): Maximal number of blocks in memory. If it is required to load more blocks than this value for a single operation, this value is ignored. [default = 2]
42
+
43
+ Result:
44
+ * _IOBlockReader_: The IO Block Reader ready for use
45
+
46
+ Example:
47
+ ``` ruby
48
+ content = IOBlockReader.init(file, :block_size => 32768, :blocks_in_memory => 5)
49
+ ```
50
+
51
+ ### IOBlockReader#\[\](range)
52
+
53
+ Parameters:
54
+ * **range** ( _Fixnum_ or _Range_ ): Range to extract
55
+
56
+ Result:
57
+ * _String_: The resulting data
58
+
59
+ Example:
60
+ ``` ruby
61
+ single_char = content[10]
62
+ substring = content[10..20]
63
+ ```
64
+
65
+ ### IOBlockReader#index(token, offset = 0, max_size_regexp = 32)
66
+
67
+ Parameters:
68
+ * **token** ( _String_ , _Regexp_ or <em>list< Object ></em>): Token to be found. Can be a list of tokens.
69
+ * **offset** ( _Fixnum_ ): Offset starting the search [optional = 0]
70
+ * **max_size_regexp** ( _Fixnum_ ): Maximal number of characters the match should take in case of a Regexp token. Ignored if token is a String. [optional = 32]
71
+
72
+ Result:
73
+ * _Fixnum_: Index of the token (or the first one found from the given token list), or nil if none found.
74
+ * _Fixnum_: In case token was an Array, return the index of the matching token in the array, or nil if none found.
75
+
76
+ Example:
77
+ ``` ruby
78
+ # Simple string search
79
+ i = content.index('search string')
80
+
81
+ # Simple string search from a given offset
82
+ i = content.index('search string', 20)
83
+
84
+ # Regexp search: have to specify the maximal token length
85
+ i = content.index(/search \d words/, 0, 14)
86
+
87
+ # Regexp search from a given offset
88
+ i = content.index(/search \d words/, 20, 14)
89
+
90
+ # Search for multiple strings at once: will stop on the first one encountered
91
+ i, token_index = content.index( [ 'search string', 'another string' ] )
92
+
93
+ # Search for multiple tokens at once from a given offset: don't forget token length if using Regexp
94
+ i, token_index = content.index( [ 'search string', /another f.....g string/ ], 20, 22)
95
+ ```
96
+
97
+ ## Contact
98
+
99
+ Want to contribute? Have any questions? [Contact Muriel!](mailto:muriel@x-aeon.com)
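Reviewer note: the `[]` behaviour documented in the README can be illustrated without installing the gem. The sketch below is a hypothetical, simplified stand-in (`TinyBlockReader` and its `loads` counter are illustration names, not part of ioblockreader). It assumes non-negative inclusive ranges and performs no eviction; it only shows how `divmod` maps an absolute offset to a block index and an offset inside that block, with blocks loaded lazily from the IO.

```ruby
require 'stringio'

# Hypothetical, simplified stand-in for illustration; not the gem's class.
class TinyBlockReader
  attr_reader :loads

  def initialize(io, block_size)
    @io = io
    @block_size = block_size
    @blocks = {}  # block index => String
    @loads = 0    # number of reads actually issued to the IO
  end

  # Return the bytes in the inclusive range (non-negative offsets only).
  def [](range)
    first_block, first_off = range.first.divmod(@block_size)
    last_block, last_off = range.last.divmod(@block_size)
    return block(first_block)[first_off..last_off] if first_block == last_block

    # Stitch the result together across several blocks
    result = block(first_block)[first_off..-1].dup
    ((first_block + 1)...last_block).each { |i| result << block(i) }
    result << block(last_block)[0..last_off]
  end

  private

  # Load a block on first use, then serve it from memory
  def block(index)
    @blocks[index] ||= begin
      @loads += 1
      @io.seek(index * @block_size)
      @io.read(@block_size) || ''.b
    end
  end
end

io = StringIO.new('0123456789abcdefghij')
reader = TinyBlockReader.new(io, 8)
puts reader[2..5]   # served from block 0 only
puts reader[6..17]  # spans blocks 0, 1 and 2
```

With an 8-byte block size, the second access touches three blocks but each block is read from the IO only once.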
data/Rakefile ADDED
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/env rake
2
+
3
+ task :test do |t|
4
+ load("#{File.dirname(__FILE__)}/test/run.rb")
5
+ end
6
+
7
+ task :default => :test
data/ReleaseInfo ADDED
@@ -0,0 +1,8 @@
1
+
2
+ # This file has been generated by RubyPackager during a delivery.
3
+ # More info about RubyPackager: http://rubypackager.sourceforge.net
4
+ {
5
+ :version => '1.0.0.20130611',
6
+ :tags => [ 'Beta' ],
7
+ :dev_status => 'Beta'
8
+ }
data/lib/ioblockreader/datablock.rb ADDED
@@ -0,0 +1,60 @@
1
+ module IOBlockReader
2
+
3
+ # Class defining a data block
4
+ class DataBlock
5
+
6
+ # Use a Fixnum sequence instead of real Time values for last_access_time for performance reasons
7
+ @@access_time_sequence = 0
8
+
9
+ # Offset of this block, in bytes
10
+ # _Fixnum_
11
+ attr_reader :offset
12
+
13
+ # Timestamp indicating when this block has been touched (created or done with touch method)
14
+ # _Fixnum_
15
+ attr_reader :last_access_time
16
+
17
+ # Data contained in this block
18
+ # _String_
19
+ attr_reader :data
20
+
21
+ # Constructor
22
+ def initialize
23
+ @offset = nil
24
+ @last_access_time = nil
25
+ @data = ''.force_encoding('ASCII-8BIT')
26
+ end
27
+
28
+ # Fill the data block for a given IO
29
+ #
30
+ # Parameters::
31
+ # * *io* (_IO_): IO to read from
32
+ # * *offset* (_Fixnum_): Offset of this block in the IO
33
+ # * *size* (_Fixnum_): Size of the block to be read
34
+ def fill(io, offset, size)
35
+ @offset = offset
36
+ @last_access_time = @@access_time_sequence
37
+ @@access_time_sequence += 1
38
+ #puts "[IOBlockReader] - Read #{size} @#{@offset}"
39
+ io.seek(@offset)
40
+ io.read(size, @data)
41
+ @last_block = io.eof?
42
+ end
43
+
44
+ # Is this block the last of its IO stream?
45
+ #
46
+ # Result:
47
+ # * _Boolean_: Is this block the last of its IO stream?
48
+ def last_block?
49
+ return @last_block
50
+ end
51
+
52
+ # Update the last access time
53
+ def touch
54
+ @last_access_time = @@access_time_sequence
55
+ @@access_time_sequence += 1
56
+ end
57
+
58
+ end
59
+
60
+ end
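Reviewer note: `DataBlock#fill` relies on two standard IO behaviours, namely `read(size, buffer)` overwriting an existing String instead of allocating a new one, and `eof?` reporting whether the block just read was the last one. A minimal sketch of both, using a `StringIO` as an assumed stand-in for a real file (variable names are illustrative only):

```ruby
require 'stringio'

io = StringIO.new('abcdefghij')
buffer = ''.b  # binary buffer reused across reads, like DataBlock's @data

# Read a 4-byte block starting at offset 4
io.seek(4)
io.read(4, buffer)
first_read = buffer.dup
middle_is_last = io.eof?

# Refill the same buffer with the final, shorter block
io.seek(8)
io.read(4, buffer)
last_read = buffer.dup
tail_is_last = io.eof?

p [first_read, middle_is_last, last_read, tail_is_last]
```

Reusing the buffer is what allows an evicted block object to be recycled for the next load without allocating a fresh String of `block_size` bytes.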
data/lib/ioblockreader/ioblockreader.rb ADDED
@@ -0,0 +1,257 @@
1
+ require 'ioblockreader/datablock'
2
+
3
+ module IOBlockReader
4
+
5
+ # Class giving a String-like interface over an IO, reading it by blocks.
6
+ # Very useful to access a big file's content as if it were a String containing the whole file.
7
+ class IOBlockReader
8
+
9
+ # Constructor
10
+ #
11
+ # Parameters::
12
+ # * *io* (_IO_): The IO object used to give the String interface
13
+ # * *options* (<em>map<Symbol,Object></em>): Additional options:
14
+ # ** *:block_size* (_Fixnum_): The block size in bytes used internally. [default = 268435456]
15
+ # ** *:blocks_in_memory* (_Fixnum_): Maximal number of blocks in memory. If it is required to load more blocks than this value for a single operation, this value is ignored. [default = 2]
16
+ def initialize(io, options = {})
17
+ # The underlying IO
18
+ @io = io
19
+ # Parse options
20
+ @block_size = options[:block_size] || 268435456
21
+ @blocks_in_memory = options[:blocks_in_memory] || 2
22
+ # The blocks
23
+ @blocks = []
24
+ # The last accessed block, used as a cache for quick [] access
25
+ @cached_block = nil
26
+ @cached_block_end_offset = nil
27
+ end
28
+
29
+ # Get a subset of the data.
30
+ # DO NOT USE NEGATIVE INDEXES.
31
+ #
32
+ # Parameters::
33
+ # * *range* (_Fixnum_ or _Range_): Range to extract
34
+ # Result::
35
+ # * _String_: The resulting data
36
+ def [](range)
37
+ #puts "[IOBlockReader] - [](#{range.inspect})"
38
+ if (range.is_a?(Integer))
39
+ # Use the cache if possible
40
+ return @cached_block.data[range - @cached_block.offset] if ((@cached_block != nil) and (range >= @cached_block.offset) and (range < @cached_block_end_offset))
41
+ #puts "[IOBlockReader] - [](#{range.inspect}) - Cache miss"
42
+ # Separate this case for performance
43
+ single_block_index, offset_in_block = range.divmod(@block_size)
44
+ # First check if all blocks are already loaded
45
+ if ((block = @blocks[single_block_index]) == nil)
46
+ read_needed_blocks([single_block_index], single_block_index, single_block_index)
47
+ block = @blocks[single_block_index]
48
+ else
49
+ block.touch
50
+ end
51
+ set_cache_block(block)
52
+ return block.data[offset_in_block]
53
+ else
54
+ # Use the cache if possible
55
+ return @cached_block.data[range.first - @cached_block.offset..range.last - @cached_block.offset] if ((@cached_block != nil) and (range.first >= @cached_block.offset) and (range.last < @cached_block_end_offset))
56
+ #puts "[IOBlockReader] - [](#{range.inspect}) - Cache miss"
57
+ first_block_index, first_offset_in_block = range.first.divmod(@block_size)
58
+ last_block_index, last_offset_in_block = range.last.divmod(@block_size)
59
+ # First check if all blocks are already loaded
60
+ if (first_block_index == last_block_index)
61
+ if ((block = @blocks[first_block_index]) == nil)
62
+ read_needed_blocks([first_block_index], first_block_index, last_block_index)
63
+ block = @blocks[first_block_index]
64
+ else
65
+ block.touch
66
+ end
67
+ set_cache_block(block)
68
+ return block.data[first_offset_in_block..last_offset_in_block]
69
+ else
70
+ # Get all indexes to be loaded
71
+ indexes_needing_loading = []
72
+ (first_block_index..last_block_index).each do |block_index|
73
+ if ((block = @blocks[block_index]) == nil)
74
+ indexes_needing_loading << block_index
75
+ else
76
+ block.touch
77
+ end
78
+ end
79
+ read_needed_blocks(indexes_needing_loading, first_block_index, last_block_index) if (!indexes_needing_loading.empty?)
80
+ # Now read across the blocks
81
+ result = @blocks[first_block_index].data[first_offset_in_block..-1].dup
82
+ (first_block_index+1..last_block_index-1).each do |block_index|
83
+ result.concat(@blocks[block_index].data)
84
+ end
85
+ result.concat(@blocks[last_block_index].data[0..last_offset_in_block])
86
+ # There are more chances that the last block will be accessed again. Cache this one.
87
+ set_cache_block(@blocks[last_block_index])
88
+ return result
89
+ end
90
+ end
91
+ end
92
+
93
+ # Perform a search of a token (or a list of tokens) in the IO.
94
+ # Warning: The token(s) to be found have to be smaller than the block size given to the constructor, otherwise they won't be found (you've been warned!). If you really need to search for tokens bigger than the block size, extract the data using the [] operator first, and then use index on it; this will however make a complete copy of the data in memory prior to searching tokens.
95
+ #
96
+ # Parameters::
97
+ # * *token* (_String_, _Regexp_ or <em>list<Object></em>): Token to be found. Can be a list of tokens.
98
+ # * *offset* (_Fixnum_): Offset starting the search [optional = 0]
99
+ # * *max_size_regexp* (_Fixnum_): Maximal number of characters the match should take in case of a Regexp token. Ignored if token is a String. [optional = 32]
100
+ # Result::
101
+ # * _Fixnum_: Index of the token (or the first one found from the given token list), or nil if none found.
102
+ # * _Fixnum_: In case token was an Array, return the index of the matching token in the array, or nil if none found.
103
+ def index(token, offset = 0, max_size_regexp = 32)
104
+ #puts "[IOBlockReader] - index(#{token.inspect}, #{offset}, #{max_size_regexp})"
105
+ # Separate the trivial algo for performance reasons
106
+ current_block_index, offset_in_current_block = offset.divmod(@block_size)
107
+ if ((current_block = @blocks[current_block_index]) == nil)
108
+ read_needed_blocks([current_block_index], current_block_index, current_block_index)
109
+ current_block = @blocks[current_block_index]
110
+ else
111
+ current_block.touch
112
+ end
113
+ index_in_block = nil
114
+ index_matching_token = nil
115
+ if (token_is_array = token.is_a?(Array))
116
+ token.each_with_index do |token2, idx|
117
+ index_token2_in_block = current_block.data.index(token2, offset_in_current_block)
118
+ if (index_token2_in_block != nil) and ((index_in_block == nil) or (index_token2_in_block < index_in_block))
119
+ index_in_block = index_token2_in_block
120
+ index_matching_token = idx
121
+ end
122
+ end
123
+ else
124
+ index_in_block = current_block.data.index(token, offset_in_current_block)
125
+ end
126
+ if (index_in_block == nil)
127
+ # We have to search further: across blocks
128
+ # Compute the size of the token to be searched
129
+ token_size = 0
130
+ if token_is_array
131
+ token.each do |token2|
132
+ if (token2.is_a?(String))
133
+ token_size = token2.size if (token2.size > token_size)
134
+ else
135
+ token_size = max_size_regexp if (max_size_regexp > token_size)
136
+ end
137
+ end
138
+ elsif (token.is_a?(String))
139
+ token_size = token.size
140
+ else
141
+ token_size = max_size_regexp
142
+ end
143
+ # Loop on subsequent blocks to search for token
144
+ result = nil
145
+ while ((result == nil) and (!current_block.last_block?))
146
+ # Check that next block is loaded
147
+ if ((next_block = @blocks[current_block_index+1]) == nil)
148
+ read_needed_blocks([current_block_index+1], current_block_index+1, current_block_index+1)
149
+ next_block = @blocks[current_block_index+1]
150
+ else
151
+ next_block.touch
152
+ end
153
+ # Get data across the 2 blocks: enough to search for token_size data only
154
+ cross_data = current_block.data[1-token_size..-1] + next_block.data[0..token_size-2]
155
+ if token_is_array
156
+ token.each_with_index do |token2, idx|
157
+ index_token2_in_block = cross_data.index(token2)
158
+ if (index_token2_in_block != nil) and ((index_in_block == nil) or (index_token2_in_block < index_in_block))
159
+ index_in_block = index_token2_in_block
160
+ index_matching_token = idx
161
+ end
162
+ end
163
+ else
164
+ index_in_block = cross_data.index(token)
165
+ end
166
+ if (index_in_block == nil)
167
+ # Search in the next block
168
+ if token_is_array
169
+ token.each_with_index do |token2, idx|
170
+ index_token2_in_block = next_block.data.index(token2)
171
+ if (index_token2_in_block != nil) and ((index_in_block == nil) or (index_token2_in_block < index_in_block))
172
+ index_in_block = index_token2_in_block
173
+ index_matching_token = idx
174
+ end
175
+ end
176
+ else
177
+ index_in_block = next_block.data.index(token)
178
+ end
179
+ if (index_in_block == nil)
180
+ # Loop on the next block
181
+ current_block_index += 1
182
+ current_block = next_block
183
+ else
184
+ result = next_block.offset + index_in_block
185
+ end
186
+ else
187
+ result = next_block.offset - token_size + 1 + index_in_block
188
+ end
189
+ end
190
+ if token_is_array
191
+ return result, index_matching_token
192
+ else
193
+ return result
194
+ end
195
+ elsif token_is_array
196
+ return current_block.offset + index_in_block, index_matching_token
197
+ else
198
+ return current_block.offset + index_in_block
199
+ end
200
+ end
201
+
202
+ private
203
+
204
+ # Set the new cache block
205
+ #
206
+ # Parameters::
207
+ # * *block* (_DataBlock_): Block to be cached
208
+ def set_cache_block(block)
209
+ @cached_block = block
210
+ @cached_block_end_offset = block.offset + @block_size
211
+ end
212
+
213
+ # Read blocks from the IO
214
+ #
215
+ # Parameters::
216
+ # * *indexes_needing_loading* (<em>list<Fixnum></em>): List of indexes to be read
217
+ # * *first_block_index* (_Fixnum_): First block that has to be loaded
218
+ # * *last_block_index* (_Fixnum_): Last block that has to be loaded
219
+ def read_needed_blocks(indexes_needing_loading, first_block_index, last_block_index)
220
+ # We need to read from the IO
221
+ # First check if we need to remove some blocks prior
222
+ removed_blocks = []
223
+ nbr_freeable_blocks = 0
224
+ other_blocks = @blocks[0...first_block_index]
225
+ other_blocks.concat(@blocks[last_block_index+1..-1]) if (last_block_index+1 < @blocks.size)
226
+ other_blocks.each do |block|
227
+ nbr_freeable_blocks += 1 if (block != nil)
228
+ end
229
+ nbr_blocks_to_be_loaded = last_block_index - first_block_index + 1
230
+ if ((nbr_freeable_blocks > 0) and
231
+ (nbr_blocks_to_be_loaded + nbr_freeable_blocks > @blocks_in_memory))
232
+ # Need to make some space
233
+ nbr_blocks_to_free = [ nbr_blocks_to_be_loaded + nbr_freeable_blocks - @blocks_in_memory, nbr_freeable_blocks ].min
234
+ # Get the blocks that we remove for future reuse
235
+ other_blocks.
236
+ select { |block| block != nil }.
237
+ sort { |block1, block2| block1.last_access_time <=> block2.last_access_time }.each do |block|
238
+ #puts "[IOBlockReader] - Remove block #{block.offset}"
239
+ removed_blocks << block
240
+ break if (removed_blocks.size == nbr_blocks_to_free)
241
+ end
242
+ # Remove them for real
243
+ @blocks.map! { |block| removed_blocks.include?(block) ? nil : block }
244
+ end
245
+ # Now read the blocks, reusing the ones in removed_blocks if possible
246
+ indexes_needing_loading.each do |block_index|
247
+ # Have to load this block
248
+ block_to_fill = removed_blocks.pop
249
+ block_to_fill = DataBlock.new if (block_to_fill == nil)
250
+ block_to_fill.fill(@io, block_index * @block_size, @block_size)
251
+ @blocks[block_index] = block_to_fill
252
+ end
253
+ end
254
+
255
+ end
256
+
257
+ end
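Reviewer note: the least obvious part of `index` is the search across a block boundary. The idea can be sketched on plain Strings: a match that straddles two blocks must lie within the last `token.size - 1` bytes of one block plus the first `token.size - 1` bytes of the next, so only that small window needs to be concatenated. The helper below (`index_across_blocks`, a hypothetical name) is a simplified illustration for String tokens only. It assumes equally sized blocks except possibly the last, and a token between 2 bytes and one block in length.

```ruby
# Find +token+ in the concatenation of +blocks+ without joining them.
# Returns the absolute offset of the first match, or nil.
# Hypothetical helper for illustration; not part of the gem.
def index_across_blocks(blocks, token)
  block_size = blocks.first.size
  overlap = token.size - 1
  blocks.each_with_index do |block, i|
    # 1. Match located entirely inside the current block
    pos = block.index(token)
    return i * block_size + pos if pos

    nxt = blocks[i + 1]
    break if nxt.nil?

    # 2. Match straddling the boundary: tail of this block + head of the next
    window = block[-overlap..-1] + nxt[0, overlap]
    pos = window.index(token)
    return (i + 1) * block_size - overlap + pos if pos
  end
  nil
end

blocks = %w[abcd efgh ij]
p index_across_blocks(blocks, 'de')  # straddles blocks 0 and 1
p index_across_blocks(blocks, 'gh')  # inside block 1
p index_across_blocks(blocks, 'zz')  # not present
```

For a Regexp token the match length is not known in advance, which is presumably why the method asks for `max_size_regexp` to size the window.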
data/lib/ioblockreader.rb ADDED
@@ -0,0 +1,17 @@
1
+ require 'ioblockreader/ioblockreader'
2
+
3
+ module IOBlockReader
4
+
5
+ # Init an IOBlockReader from an existing io.
6
+ # The io object has to be readable and seekable.
7
+ #
8
+ # Parameters::
9
+ # * *io* (_IO_): The IO object
10
+ # * *options* (<em>map<Symbol,Object></em>): Options (see IOBlockReader::IOBlockReader documentation) [default = {}]
11
+ # Result::
12
+ # * <em>IOBlockReader::IOBlockReader</em>: Resulting interface on the IO
13
+ def self.init(io, options = {})
14
+ ::IOBlockReader::IOBlockReader.new(io, options)
15
+ end
16
+
17
+ end
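Reviewer note: the `:blocks_in_memory` option forwarded by `init` bounds how many blocks stay cached, and the library picks eviction victims by comparing integer access stamps rather than real timestamps. The sketch below shows that least-recently-used policy in isolation. `TinyLru` and `LruEntry` are hypothetical names, and the gem's actual bookkeeping (which also protects the blocks needed by the current operation) is more involved.

```ruby
# Hypothetical cache entry: a block index plus a logical access stamp.
LruEntry = Struct.new(:index, :stamp)

# Minimal least-recently-used cache of block indexes, for illustration only.
class TinyLru
  def initialize(capacity)
    @capacity = capacity
    @entries = {}  # block index => LruEntry
    @clock = 0     # monotonically increasing integer, cheaper than Time.now
  end

  # Record an access to +index+; returns the evicted block index, or nil.
  def access(index)
    @clock += 1
    if (entry = @entries[index])
      entry.stamp = @clock  # equivalent of DataBlock#touch
      return nil
    end
    evicted = nil
    if @entries.size >= @capacity
      # The entry with the smallest stamp is the least recently used
      victim = @entries.values.min_by(&:stamp)
      @entries.delete(victim.index)
      evicted = victim.index
    end
    @entries[index] = LruEntry.new(index, @clock)
    evicted
  end

  def cached
    @entries.keys.sort
  end
end

lru = TinyLru.new(2)
lru.access(0)
lru.access(1)
lru.access(0)            # refresh block 0
evicted = lru.access(2)  # cache is full: block 1 is now the oldest
p evicted
p lru.cached
```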
metadata ADDED
@@ -0,0 +1,57 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: ioblockreader
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0.20130611
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Muriel Salvan
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2013-06-11 00:00:00.000000000 Z
13
+ dependencies: []
14
+ description: Ruby library giving block-buffered and cached read over IO objects with
15
+ a String-like interface. Ideal to parse big files as Strings, limiting memory consumption.
16
+ email: muriel@x-aeon.com
17
+ executables: []
18
+ extensions: []
19
+ extra_rdoc_files: []
20
+ files:
21
+ - AUTHORS
22
+ - ChangeLog
23
+ - Credits
24
+ - lib/ioblockreader/datablock.rb
25
+ - lib/ioblockreader/ioblockreader.rb
26
+ - lib/ioblockreader.rb
27
+ - LICENSE
28
+ - Rakefile
29
+ - README
30
+ - README.md
31
+ - ReleaseInfo
32
+ homepage: https://github.com/Muriel-Salvan/ioblockreader
33
+ licenses: []
34
+ post_install_message:
35
+ rdoc_options: []
36
+ require_paths:
37
+ - lib
38
+ required_ruby_version: !ruby/object:Gem::Requirement
39
+ none: false
40
+ requirements:
41
+ - - ! '>='
42
+ - !ruby/object:Gem::Version
43
+ version: '0'
44
+ required_rubygems_version: !ruby/object:Gem::Requirement
45
+ none: false
46
+ requirements:
47
+ - - ! '>='
48
+ - !ruby/object:Gem::Version
49
+ version: '0'
50
+ requirements: []
51
+ rubyforge_project: ioblockreader
52
+ rubygems_version: 1.8.24
53
+ signing_key:
54
+ specification_version: 3
55
+ summary: Ruby library giving block-buffered and cached read over IO objects with a
56
+ String-like interface. Ideal to parse big files as Strings, limiting memory consumption.
57
+ test_files: []