external 0.1.0 → 0.3.0

Files changed (49)
  1. data/History +7 -0
  2. data/MIT-LICENSE +1 -3
  3. data/README +162 -127
  4. data/lib/external.rb +2 -3
  5. data/lib/external/base.rb +174 -47
  6. data/lib/external/chunkable.rb +131 -105
  7. data/lib/external/enumerable.rb +78 -33
  8. data/lib/external/io.rb +163 -398
  9. data/lib/external/patches/ruby_1_8_io.rb +31 -0
  10. data/lib/external/patches/windows_io.rb +53 -0
  11. data/lib/external/patches/windows_utils.rb +27 -0
  12. data/lib/external/utils.rb +148 -0
  13. data/lib/external_archive.rb +840 -0
  14. data/lib/external_array.rb +57 -0
  15. data/lib/external_index.rb +1053 -0
  16. metadata +42 -58
  17. data/lib/ext_arc.rb +0 -108
  18. data/lib/ext_arr.rb +0 -727
  19. data/lib/ext_ind.rb +0 -1120
  20. data/test/benchmarks/benchmarks_20070918.txt +0 -45
  21. data/test/benchmarks/benchmarks_20070921.txt +0 -91
  22. data/test/benchmarks/benchmarks_20071006.txt +0 -147
  23. data/test/benchmarks/test_copy_file.rb +0 -80
  24. data/test/benchmarks/test_pos_speed.rb +0 -47
  25. data/test/benchmarks/test_read_time.rb +0 -55
  26. data/test/cached_ext_ind_test.rb +0 -219
  27. data/test/check/benchmark_check.rb +0 -441
  28. data/test/check/namespace_conflicts_check.rb +0 -23
  29. data/test/check/pack_check.rb +0 -90
  30. data/test/ext_arc_test.rb +0 -286
  31. data/test/ext_arr/alt_sep.txt +0 -3
  32. data/test/ext_arr/cr_lf_input.txt +0 -3
  33. data/test/ext_arr/input.index +0 -0
  34. data/test/ext_arr/input.txt +0 -1
  35. data/test/ext_arr/inputb.index +0 -0
  36. data/test/ext_arr/inputb.txt +0 -1
  37. data/test/ext_arr/lf_input.txt +0 -3
  38. data/test/ext_arr/lines.txt +0 -19
  39. data/test/ext_arr/without_index.txt +0 -1
  40. data/test/ext_arr_test.rb +0 -534
  41. data/test/ext_ind_test.rb +0 -1472
  42. data/test/external/base_test.rb +0 -74
  43. data/test/external/chunkable_test.rb +0 -182
  44. data/test/external/index/input.index +0 -0
  45. data/test/external/index/inputb.index +0 -0
  46. data/test/external/io_test.rb +0 -414
  47. data/test/external_test_helper.rb +0 -31
  48. data/test/external_test_suite.rb +0 -4
  49. data/test/test_array.rb +0 -1192
data/History CHANGED
@@ -1,3 +1,10 @@
1
+ == 0.3.0 / 2008-10-27
2
+
3
+ Major update with refactoring (e.g. ExtArr is now ExternalArray)
4
+ and greatly expanded testing. The [] and []= methods of all Externals
5
+ now comply with the Array specification in RubySpec[rubyspec.org].
6
+ Implementation of other methods is under way.
7
+
1
8
  == 0.1.0 / 2007-12-10 revision 23
2
9
 
3
10
  Initial release with working [] and []= methods
data/MIT-LICENSE CHANGED
@@ -1,6 +1,4 @@
1
- Copyright (c) 2006-2007, Regents of the University of Colorado.
2
- Developer:: Simon Chiang, Biomolecular Structure Program, Hansen Lab
3
- Support:: CU Denver School of Medicine Deans Academic Enrichment Fund
1
+ Copyright (c) 2006-2008, Regents of the University of Colorado.
4
2
 
5
3
  Permission is hereby granted, free of charge, to any person obtaining a copy of this
6
4
  software and associated documentation files (the "Software"), to deal in the Software
data/README CHANGED
@@ -4,165 +4,200 @@ Indexing and array-like access to data stored on disk rather than in memory.
4
4
 
5
5
  == Description
6
6
 
7
- External provides an easy way to index files such that array-like calls can store and
8
- retrieve entries directly from the file without loading it into memory. The indexes can
9
- be cached for performance or stored on disk alongside the data file, in essence giving you
10
- arbitrarily large arrays.
7
+ External provides a way to index and access array data directly from a file
8
+ without loading it into memory. Indexes may be cached in memory or stored
9
+ on disk with the data file, in essence giving you arbitrarily large arrays.
10
+ Externals automatically chunk and buffer methods like <tt>each</tt> so that
11
+ the memory footprint remains low even during enumeration.
11
12
 
12
- The main classes of external provide array-like access to the following:
13
- * ExtInd (External Index) -- formatted binary data
14
- * ExtArr (External Array) -- externally stored ruby objects
15
- * ExtArc (External Archive) -- externally stored string data
13
+ The main External classes are:
16
14
 
17
- ExtArc is a subclass of ExtArr specialized for string archival files, formats like FASTA
18
- where entries are strings delimited by '>':
15
+ * ExternalIndex -- for formatted binary data
16
+ * ExternalArchive -- for string data
17
+ * ExternalArray -- for objects (serialized as YAML)
19
18
 
20
- >Q9BXQ0|Q9BXQ0_HUMAN Tissue transglutaminase (Fragment) - Homo sapiens (Human).
21
- LEPFSGKALCSWSIC
22
- >P02452|CO1A1_HUMAN Collagen alpha-1(I) chain - Homo sapiens (Human).
23
- MFSFVDLRLLLLLAATALLTHGQEEGQVEGQDEDIPPITCVQNGLRYHDRDVWKPEPCRI
24
- CVCDNGKVLCDDVICDETKNCPGAEVPEGECCPVCPDGSESPTDQETTGVEGPKGDTGPR
25
- GPRGPAGPPGRDGIPGQPGLPGPPGPPGPPGPPGLGGNFAPQLSYGYDEKSTGGISVPGP
26
- ...
19
+ The array-like behavior of these classes is developed using modified versions
20
+ of the RubySpec[http://rubyspec.org] specification for Array. The idea is to
21
+ eventually duck-type all Array methods, including sort and collect, with
22
+ acceptable performance.
27
23
 
28
- The array-like behavior of these classes is developed against modified versions of the
29
- Array tests themselves, and often uses the exact same tests. The idea is to eventually
30
- duck-type all Array methods, including sort and collect, with acceptable performance.
24
+ * Rubyforge[http://rubyforge.org/projects/external]
25
+ * Lighthouse[http://bahuvrihi.lighthouseapp.com/projects/10590-external]
26
+ * Github[http://github.com/bahuvrihi/external/tree/master]
31
27
 
32
- === Bugs/Known Issues
28
+ ==== Bugs/Known Issues
33
29
 
34
30
  * only a limited set of array methods are currently supported
35
- * reindexing of ExtArr does not work for arrays containing yaml strings
36
- * yaml serialization/deserialization of some strings do not reproduce identical input
37
- and so will not be faithfully store in ExtArr. Carriage return string are notable:
38
- "\r", "\r\n", "string_with_\r\n_internal", as are chains of newlines: "\n", "\n\n"
39
- * documentation is poor at the moment
31
+ * currently only [] and []= are fully tested vs RubySpec
32
+ * documentation is patchy
40
33
 
41
- --
42
- == Performance
43
- ++
34
+ Note also that YAML dump/load of some objects doesn't work or doesn't
35
+ reproduce the object; such objects will not be properly stored in an
36
+ ExternalArray. Problematic objects include:
44
37
 
45
- == Info
38
+ Proc and Class:
46
39
 
47
- Copyright (c) 2006-2007, Regents of the University of Colorado.
48
- Developer:: {Simon Chiang}[http://bahuvrihi.wordpress.com], {Biomolecular Structure Program}[http://biomol.uchsc.edu/], {Hansen Lab}[http://hsc-proteomics.uchsc.edu/hansenlab/]
49
- Support:: CU Denver School of Medicine Deans Academic Enrichment Fund
50
- Licence:: MIT-Style
40
+ block = lambda {}
41
+ YAML.load(YAML.dump(block)) # !> TypeError: allocator undefined for Proc
42
+ YAML.dump(Object) # !> TypeError: can't dump anonymous class Class
51
43
 
52
- == Installation
44
+ Carriage returns ("\r"):
53
45
 
54
- External is available from RubyForge[http://rubyforge.org/projects/external]. Use:
46
+ YAML.load(YAML.dump("\r")) # => nil
47
+ YAML.load(YAML.dump("\r\n")) # => ""
48
+ YAML.load(YAML.dump("string with \r\n inside")) # => "string with \n inside"
55
49
 
56
- % gem install external
50
+ Chains of newlines ("\n"):
57
51
 
58
- == Usage
52
+ YAML.load(YAML.dump("\n")) # => ""
53
+ YAML.load(YAML.dump("\n\n")) # => ""
54
+
55
+ DateTime is loaded as Time:
59
56
 
60
- === ExtArr
57
+ YAML.load(YAML.dump(DateTime.now)).class # => Time
58
+
59
+ == Usage
61
60
 
62
- ExtArr can be initialized from data using the [] operator and used as an array.
61
+ === ExternalArray
63
62
 
64
- ea = ExtArr[1, 2.2, "cat", {:key => 'value'}]
65
- ea[2] # => "cat"
66
- ea.last # => {:key => 'value'}
67
- ea << [:a, :b]
68
- ea.to_a # => [1, 2.2, "cat", {:key => 'value'}, [:a, :b]]
63
+ ExternalArray can be initialized from data using the [] operator and used like
64
+ an array.
69
65
 
70
- Behind the scenes, ExtArr serializes and stores entries on a data source (io) and builds an
71
- ExtInd that tracks where each entry begins and ends.
66
+ a = ExternalArray['str', {'key' => 'value'}]
67
+ a[0] # => 'str'
68
+ a.last # => {'key' => 'value'}
69
+ a << [1,2]; a.to_a # => ['str', {'key' => 'value'}, [1,2]]
72
70
 
73
- ea.io.class # => Tempfile
74
- ea.io.rewind
75
- ea.io.read # => "--- 1\n--- 2.2\n--- cat\n--- \n:key: value\n--- \n- :a\n- :b\n"
71
+ ExternalArray serializes and stores entries to an io while building an io_index
72
+ that tracks the start and length of each entry. By default ExternalArray
73
+ will serialize to a Tempfile and use an Array as the io_index:
76
74
 
77
- ea.index.class # => ExtInd
78
- ea.index.to_a # => [[0, 6], [6, 8], [14, 8], [22, 17], [39, 15]]
75
+ a.io.class # => Tempfile
76
+ a.io.rewind; a.io.read # => "--- str\n--- \nkey: value\n--- \n- 1\n- 2\n"
77
+ a.io_index.class # => Array
78
+ a.io_index.to_a # => [[0, 8], [8, 16], [24, 13]]
79
79
 
80
- By default External supports File, Tempfile, and StringIO data sources. If no data source is
81
- given (as above), the external array is initialized to a Tempfile so that it will be cleaned
82
- up on exit.
80
+ To save this data more permanently, provide a path to <tt>close</tt>; the tempfile
81
+ is moved to the path and a binary index file will be created:
83
82
 
84
- ExtArr can be initialized from existing data sources. In this case, ExtArr tries to find and
85
- load an existing index; if the index doesn't exist, then you have to reindex the data manually.
83
+ a.close('example.yml')
84
+ File.read('example.yml') # => "--- str\n--- \nkey: value\n--- \n- 1\n- 2\n"
85
+
86
+ index = File.read('example.index')
87
+ index.unpack('I*') # => [0, 8, 8, 16, 24, 13]
86
88
 
87
- File.open('path/to/file.txt', "w+") do |file|
88
- file << "--- 1\n--- 2.2\n--- cat\n--- \n:key: value\n--- \n- :a\n- :b\n"
89
- file.flush
89
+ ExternalArray provides <tt>open</tt> to create ExternalArrays from an existing
90
+ file; the instance will use an index file if it exists and automatically
91
+ reindex the data if it does not. Manual calls to reindex may be necessary when
92
+ you initialize an ExternalArray with <tt>new</tt> instead of <tt>open</tt>:
90
93
 
91
- index_filepath = ExtArr.default_index_filepath(file.path)
92
- File.exists?(index_filepath) # => false
93
-
94
- ea = ExtArr.new(file)
95
- ea.to_a # => []
96
- ea.reindex
97
- ea.to_a # => [1, 2.2, "cat", {:key => 'value'}, [:a, :b]]
94
+ # use of an existing index file
95
+ ExternalArray.open('example.yml') do |b|
96
+ File.basename(b.io_index.io.path) # => 'example.index'
97
+ b.to_a # => ['str', {'key' => 'value'}, [1,2]]
98
98
  end
99
99
 
100
- ExtArr provides an open method for easy access to file data:
101
-
102
- ExtArr.open('path/to/file.txt') do |ea|
103
- # ...
100
+ # automatic reindexing
101
+ FileUtils.rm('example.index')
102
+ ExternalArray.open('example.yml') do |b|
103
+ b.to_a # => ['str', {'key' => 'value'}, [1,2]]
104
104
  end
105
-
106
- === ExtArc
107
-
108
- ExtArc is a subclass of ExtArr designed for string archival files. Rather than serialize and
109
- load ruby objects to and from the data file, ExtArc simply read and writes strings. In
110
- addition, ExtArc provides additional reindexing methods designed to make reindexing easy.
111
-
112
- arc = ExtArc[">swift", ">brown", ">fox"]
113
- arc[2] # => ">fox"
114
- arc.to_a # => [">swift", ">brown", ">fox"]
115
-
116
- arc.io.class # => Tempfile
117
- arc.io.rewind
118
- arc.io.read # => ">swift>brown>fox"
119
-
120
- File.open('path/to/file.txt', "w+") do |file|
121
- file << ">swift>brown>fox"
122
- file.flush
123
-
124
- # Reindex by a separation string
125
- arc = ExtArc.new(file)
126
- arc.to_a # => []
127
- arc.reindex_by_sep(:sep_string => ">", :entry_follows_sep => true)
128
- arc.to_a # => [">swift", ">brown", ">fox"]
129
-
130
- # Reindex by scanning an entry
131
- arc = ExtArc.new(file)
132
- arc.to_a # => []
133
- arc.reindex_by_scan(/>\w*/)
134
- arc.to_a # => [">swift", ">brown", ">fox"]
105
+
106
+ # manual reindexing
107
+ file = File.open('example.yml')
108
+ c = ExternalArray.new(file)
109
+
110
+ c.to_a # => []
111
+ c.reindex
112
+ c.to_a # => ['str', {'key' => 'value'}, [1,2]]
113
+
114
+ === ExternalArchive
115
+
116
+ ExternalArchive is exactly like ExternalArray except that it only stores
117
+ strings (ExternalArray is actually a subclass of ExternalArchive which
118
+ dumps/loads strings).
119
+
120
+ arc = ExternalArchive["swift", "brown", "fox"]
121
+ arc[2] # => "fox"
122
+ arc.to_a # => ["swift", "brown", "fox"]
123
+ arc.io.rewind; arc.io.read # => "swiftbrownfox"
124
+
125
+ ExternalArchive is useful as a base for classes to access archival data.
126
+ Here is a simple parser for FASTA[http://en.wikipedia.org/wiki/Fasta_format]
127
+ data:
128
+
129
+ # A sample FASTA entry
130
+ # >gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
131
+ # LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
132
+ # EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
133
+ # LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
134
+ # GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
135
+ # IENY
136
+
137
+ class FastaEntry
138
+ attr_reader :header, :body
139
+
140
+ def initialize(str)
141
+ @body = str.split(/\r?\n/)
142
+ @header = body.shift
143
+ end
135
144
  end
136
-
137
- === ExtInd
138
-
139
- ExtInd provides array-like access to formatted binary data. The index of ExtArr is an
140
- ExtInd constructed to access data formatted as 'II'; two integers corresponding to the
141
- start position and length of entries in the ExtArr data source. For simple, repetitive
142
- formats like 'II', processing is optimized to use a general format and frame.
143
-
144
- ea = ExtArr.new
145
- ea.index.class # => ExtInd
146
- index = ea.index
147
-
148
- index.format # => 'I*'
149
- index.frame # => 2
150
- index << [1,2]
151
- index << [3,4]
152
- index.to_a # => [[1,2],[3,4]]
153
-
154
- ExtInd handles arbitrary packing formats, opening many possibilites:
155
-
156
- File.open('path/to/file', "w+") do |file|
145
+
146
+ class FastaArchive < ExternalArchive
147
+ def str_to_entry(str); FastaEntry.new(str); end
148
+ def entry_to_str(entry); ([entry.header] + entry.body).join("\n"); end
149
+
150
+ def reindex
151
+ reindex_by_sep('>', :entry_follows_sep => true)
152
+ end
153
+ end
154
+
155
+ require 'open-uri'
156
+ fasta = FastaArchive.new open('http://external.rubyforge.org/doc/tiny_fasta.txt')
157
+ fasta.reindex
158
+
159
+ fasta.length # => 5
160
+ fasta[0].body # => ["MEVNILAFIATTLFVLVPTAFLLIIYVKTVSQSD"]
161
+
162
+ The non-redundant {NCBI protein database}[ftp://ftp.ncbi.nih.gov/blast/db/FASTA/]
163
+ contains more than 7 million FASTA entries in a 3.56 GB file; ExternalArchive
164
+ is targeted at files of that size, where lazy loading of data and a small memory
165
+ footprint are critical.
166
+
167
+ === ExternalIndex
168
+
169
+ ExternalIndex provides array-like access to formatted binary data. The index of an
170
+ uncached ExternalArray is an ExternalIndex configured for binary data like 'II'; two
171
+ integers corresponding to the start position and length of an entry.
172
+
173
+ index = ExternalIndex[1, 2, 3, 4, 5, 6, {:format => 'II'}]
174
+ index.format # => 'I*'
175
+ index.frame # => 2
176
+ index[1] # => [3,4]
177
+ index.to_a # => [[1,2], [3,4], [5,6]]
178
+
179
+ ExternalIndex handles arbitrary packing formats, opening many possibilities:
180
+
181
+ Tempfile.new('sample.txt') do |file|
157
182
  file << [1,2,3].pack("IQS")
158
183
  file << [4,5,6].pack("IQS")
159
184
  file << [7,8,9].pack("IQS")
160
185
  file.flush
161
186
 
162
- index = ExtInd.new(file, :format => "IQS")
163
- index[1] # => [4,5,6]
164
- index.to_a # => [[1,2,3],[4,5,6],[7,8,9]]
187
+ index = ExternalIndex.new(file, :format => "IQS")
188
+ index[1] # => [4,5,6]
189
+ index.to_a # => [[1,2,3], [4,5,6], [7,8,9]]
165
190
  end
166
191
 
167
- Note: at the moment formats must be specified longhand, ie 'III' cannot be written as 'I3',
168
- and the native size directives for sSiIlL are not supported.
192
+ == Installation
193
+
194
+ External is available from RubyForge[http://rubyforge.org/projects/external]. Use:
195
+
196
+ % gem install external
197
+
198
+ == Info
199
+
200
+ Copyright (c) 2006-2008, Regents of the University of Colorado.
201
+ Developer:: {Simon Chiang}[http://bahuvrihi.wordpress.com], {Biomolecular Structure Program}[http://biomol.uchsc.edu/], {Hansen Lab}[http://hsc-proteomics.uchsc.edu/hansenlab/]
202
+ Support:: CU Denver School of Medicine Deans Academic Enrichment Fund
203
+ Licence:: {MIT-Style}[link:files/MIT-LICENSE.html]
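The README changes above describe ExternalArray as serializing entries to an io while keeping an index of start positions and lengths, which can be written out as packed integers. The idea can be illustrated without the gem. The following is a minimal sketch in plain Ruby that assumes nothing about External's internals; `append_entries` and `fetch_entry` are hypothetical helpers, not part of the External API, and a StringIO stands in for the Tempfile.

```ruby
require 'stringio'
require 'yaml'

# Hypothetical helper (not part of the External gem): appends the YAML
# dump of each object to io and records a [start, length] pair per entry.
def append_entries(io, index, objects)
  objects.each do |obj|
    str = YAML.dump(obj)
    io.seek(0, IO::SEEK_END)   # always write at the end of the data
    start = io.pos
    io.write(str)
    index << [start, str.bytesize]
  end
  index
end

# Hypothetical helper: reads entry n back using only the index, so the
# rest of the data never has to be loaded.
def fetch_entry(io, index, n)
  start, length = index[n]
  io.seek(start)
  YAML.load(io.read(length).force_encoding(Encoding::UTF_8))
end

io    = StringIO.new(String.new)
index = append_entries(io, [], ['str', { 'key' => 'value' }, [1, 2]])

# The index can be flattened and packed as native unsigned integers ('I*'),
# then unpacked and regrouped into frames of two, as in the README's
# index.unpack('I*') example.
packed   = index.flatten.pack('I*')
restored = packed.unpack('I*').each_slice(2).to_a

fetch_entry(io, restored, 0)   # => "str"
fetch_entry(io, restored, 2)   # => [1, 2]
```

The exact byte offsets depend on the YAML emitter in use, so they may differ from the values quoted in the README, which were produced by the YAML library of that era.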
data/lib/external.rb CHANGED
@@ -1,3 +1,2 @@
1
- require 'ext_ind'
2
- require 'ext_arr'
3
- require 'ext_arc'
1
+ $:.unshift File.expand_path(File.dirname(__FILE__))
2
+ require 'external_array'
data/lib/external/base.rb CHANGED
@@ -1,66 +1,65 @@
1
- require 'external/io'
2
- require 'external/chunkable'
3
- require 'external/enumerable'
1
+ # For some inexplicable reason yaml MUST be required before
2
+ # tempfile in order for ExtArrTest::test_LSHIFT to pass.
3
+ # Otherwise it fails with 'TypeError: allocator undefined for Proc'
4
+
5
+ require 'yaml'
4
6
  require 'tempfile'
5
7
 
8
+ require 'external/enumerable'
9
+ require 'external/io'
10
+
6
11
  module External
7
12
 
8
- #--
9
- # Base provides the basic array functionality shared by ExtArr and Index,
10
- # essentially wrapping the IO functions required to access and utilized external
11
- # array data with the standard array functions. Bases can be opened with
12
- # in any of the IO modes; the capabilities of Base will be reduced accordingly
13
- # (ie read-only Bases cannot write values using []=, for instance).
14
- #
15
- # It is VERY IMPORTANT to realize that the underlying IO will be opened using the
16
- # given mode. The 'w' mode will overwrite all existing data; 'r+' is a safer mode
17
- # for full read-write functionality. Note that since Base actively scans over
18
- # the IO, append modes essentially behaves like write, but does not overwrite existing
19
- # data.
20
- #
21
- # To work properly, Base must be subclassed with methods:
22
- # * length
23
- # * io_fetch
24
- #++
25
- #
26
- #
13
+ # Base provides shared IO and Array-like methods used by ExternalArchive,
14
+ # ExternalArray, and ExternalIndex.
27
15
  class Base
28
16
  class << self
29
- def open(fd=nil, mode="r", options={})
30
- fd = File.open(fd, mode) unless fd == nil
31
- ab = self.new(fd, options)
17
+
18
+ # Initializes an instance of self with File.open(path, mode) as an io.
19
+ # As with File.open, the instance will be passed to the block and
20
+ # closed when the block returns. If no block is given, open returns
21
+ # the new instance.
22
+ #
23
+ # Nil may be provided as the path, in which case a Tempfile is used
24
+ # and mode is ignored (Tempfiles always open
25
+ # in 'r+' mode).
26
+ def open(path=nil, mode="rb", *argv)
27
+ path = File.open(path, mode) unless path == nil
28
+ base = new(path, *argv)
32
29
 
33
30
  if block_given?
34
31
  begin
35
- yield(ab)
32
+ yield(base)
36
33
  ensure
37
- ab.close
34
+ base.close
38
35
  end
39
36
  else
40
- ab
37
+ base
41
38
  end
42
39
  end
43
40
  end
44
41
 
45
42
  include External::Enumerable
46
43
  include External::Chunkable
47
-
44
+
45
+ # The underlying io for self.
48
46
  attr_reader :io
49
47
 
50
- # Initializes a new Base given the file descriptor, mode and options.
51
- # (see open_io for details on what io is opened for a given file descriptor)
52
- #
53
- # If mode contains an 's', then the Base will be initialized in strio
54
- # mode where the underlying IO will be a StringIO. In this case the fd
55
- # will be used as the string to initialize the StringIO.
56
- #
57
- # Standard options for Base include:
58
- # nil_value:: the value written to file for nils, and converted to nil on read
59
- # (default ' ')
60
- # max_gap:: the maximum gap size used by Offset (default 10000)
61
- # max_chunk_size:: the chunk size used by Offset (default 1M)
48
+ # The default tempfile basename for Base instances
49
+ # initialized without an io.
50
+ TEMPFILE_BASENAME = "external_base"
51
+
52
+ # Creates a new instance of self with the specified io. A
53
+ # nil io causes initialization with a Tempfile; a string
54
+ # io will be converted into a StringIO.
62
55
  def initialize(io=nil)
63
- self.io = (io.nil? ? Tempfile.new("array_base") : io)
56
+ self.io = case io
57
+ when nil then Tempfile.new(TEMPFILE_BASENAME)
58
+ when String then StringIO.new(io)
59
+ else io
60
+ end
61
+
62
+ @enumerate_to_a = true
64
63
  end
65
64
 
66
65
  # True if io is closed.
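The hunk above changes two things in Base: `open` now forwards extra arguments and closes the instance when a block returns, and `initialize` picks the io by type (nil gives a Tempfile, a String gives a StringIO). A minimal sketch of that pattern follows. `SketchBase` is a hypothetical stand-in written for illustration; it omits the Io extension, the Enumerable and Chunkable mixins, and the path argument that the new `close` accepts.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical stand-in for External::Base, reduced to open/initialize/close.
class SketchBase
  attr_reader :io

  # Mirrors the block form of File.open: yields the instance and
  # guarantees it is closed afterwards, even if the block raises.
  def self.open(path = nil, mode = 'rb')
    io = path.nil? ? nil : File.open(path, mode)
    base = new(io)
    return base unless block_given?

    begin
      yield(base)
    ensure
      base.close
    end
  end

  # nil    -> Tempfile
  # String -> StringIO wrapping the string
  # other  -> used as given
  def initialize(io = nil)
    @io = case io
          when nil    then Tempfile.new('sketch_base')
          when String then StringIO.new(io)
          else io
          end
  end

  def closed?
    io.closed?
  end

  # Returns true if the io was open before the call, false otherwise.
  def close
    was_open = !io.closed?
    io.close unless io.closed?
    was_open
  end
end

captured = nil
result = SketchBase.open do |base|
  captured = base
  base.io.class
end

result             # => Tempfile
captured.closed?   # => true
```

Returning the block's value from `open` matches how `File.open` behaves with a block, which is why `result` holds the class rather than the instance.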
@@ -68,18 +67,146 @@ module External
68
67
  io.closed?
69
68
  end
70
69
 
71
- # Closes io.
72
- def close
70
+ # Closes io. If a path is specified, io will be dumped to it. If
71
+ # io is a File or Tempfile, the existing file is moved (not dumped)
72
+ # to path. Raises an error if path already exists and overwrite is
73
+ # not specified.
74
+ def close(path=nil, overwrite=false)
75
+ result = !io.closed?
76
+
77
+ if path
78
+ if File.exists?(path) && !overwrite
79
+ raise ArgumentError, "already exists: #{path}"
80
+ end
81
+
82
+ case io
83
+ when File, Tempfile
84
+ io.close unless io.closed?
85
+ FileUtils.move(io.path, path)
86
+ else
87
+ io.flush
88
+ io.rewind
89
+ File.open(path, "w") do |file|
90
+ file << io.read(io.default_blksize) while !io.eof?
91
+ end
92
+ end
93
+ end
94
+
73
95
  io.close unless io.closed?
96
+ result
97
+ end
98
+
99
+ # Flushes the io and resets the io length. Returns self
100
+ def flush
101
+ io.flush
102
+ io.reset_length
103
+ self
104
+ end
105
+
106
+ # Returns a duplicate of self. This can be a slow operation
107
+ # as it may involve copying the full contents of one large
108
+ # file to another.
109
+ def dup
110
+ flush
111
+ another.concat(self)
112
+ end
113
+
114
+ # Returns another instance of self. Must be
115
+ # implemented in a subclass.
116
+ def another
117
+ raise NotImplementedError
118
+ end
119
+
120
+ ###########################
121
+ # Array methods
122
+ ###########################
123
+
124
+ # Returns true if _self_ contains no elements
125
+ def empty?
126
+ length == 0
127
+ end
128
+
129
+ def eql?(another)
130
+ self == another
131
+ end
132
+
133
+ # Returns the first n entries (default 1)
134
+ def first(n=nil)
135
+ n.nil? ? self[0] : self[0,n]
136
+ end
137
+
138
+ # Alias for []
139
+ def slice(one, two = nil)
140
+ self[one, two]
141
+ end
142
+
143
+ # Returns self.
144
+ #--
145
+ # Warning -- errors show up when this doesn't return
146
+ # an Array... however to return an array with to_ary
147
+ # may mean converting a Base into an Array for
148
+ # insertions... see/modify convert_to_ary
149
+ def to_ary
150
+ self
151
+ end
152
+
153
+ #
154
+ def inspect
155
+ "#<#{self.class}:#{object_id} #{ellipse_inspect(self)}>"
74
156
  end
75
157
 
76
158
  protected
77
159
 
78
- # Sets io and extends the input io with External::Position.
79
- def io=(io)
80
- io.extend External::IO unless io.kind_of?(External::IO)
160
+ # Sets io and extends the input io with Io.
161
+ def io=(io) # :nodoc:
162
+ io.extend Io unless io.kind_of?(Io)
81
163
  @io = io
82
164
  end
165
+
166
+ # converts obj to an int using the <tt>to_int</tt>
167
+ # method, if the object responds to <tt>to_int</tt>
168
+ def convert_to_int(obj) # :nodoc:
169
+ obj.respond_to?(:to_int) ? obj.to_int : obj
170
+ end
83
171
 
172
+ # converts obj to an array using the <tt>to_ary</tt>
173
+ # method, if the object responds to <tt>to_ary</tt>
174
+ def convert_to_ary(obj) # :nodoc:
175
+ obj == nil ? [] : obj.respond_to?(:to_ary) ? obj.to_ary : [obj]
176
+ end
177
+
178
+ # a more array-compliant version of Chunkable#split_range
179
+ def split_range(range, total=length) # :nodoc:
180
+ # split the range
181
+ start = convert_to_int(range.begin)
182
+ raise TypeError, "can't convert #{range.begin.class} into Integer" unless start.kind_of?(Integer)
183
+ start += total if start < 0
184
+
185
+ finish = convert_to_int(range.end)
186
+ raise TypeError, "can't convert #{range.end.class} into Integer" unless finish.kind_of?(Integer)
187
+ finish += total if finish < 0
188
+
189
+ length = finish - start
190
+ length -= 1 if range.exclude_end?
191
+
192
+ [start, length]
193
+ end
194
+
195
+ # helper to inspect large arrays
196
+ def ellipse_inspect(array) # :nodoc:
197
+ if array.length > 10
198
+ "[#{collect_join(array[0,5])} ... #{collect_join(array[-5,5])}] (length = #{array.length})"
199
+ else
200
+ "[#{collect_join(array.to_a)}]"
201
+ end
202
+ end
203
+
204
+ # another helper to inspect large arrays
205
+ def collect_join(array) # :nodoc:
206
+ array.collect do |obj|
207
+ obj.inspect
208
+ end.join(', ')
209
+ end
210
+
84
211
  end
85
212
  end
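The `split_range` helper added in this diff turns a Range into a `[start, length]` pair, resolving negative bounds against the total length. The logic can be tried in isolation with the sketch below, which copies the arithmetic shown in the diff into standalone methods. The `total` argument is made mandatory here because there is no surrounding object to supply `length`; that change is an assumption of the sketch, not of the library.

```ruby
# Standalone copy of the conversion helper shown in the diff.
def convert_to_int(obj)
  obj.respond_to?(:to_int) ? obj.to_int : obj
end

# Standalone copy of the split_range arithmetic shown in the diff.
def split_range(range, total)
  start = convert_to_int(range.begin)
  raise TypeError, "can't convert #{range.begin.class} into Integer" unless start.kind_of?(Integer)
  start += total if start < 0      # negative bounds count from the end

  finish = convert_to_int(range.end)
  raise TypeError, "can't convert #{range.end.class} into Integer" unless finish.kind_of?(Integer)
  finish += total if finish < 0

  length = finish - start
  length -= 1 if range.exclude_end?

  [start, length]
end

split_range(0..2, 10)      # => [0, 2]
split_range(-3..-1, 10)    # => [7, 2]
split_range(1...4, 10)     # => [1, 2]
split_range(1.0..3.0, 10)  # => [1, 2]
```

Note that the returned length is `finish - start`, which is one less than the number of elements an inclusive range spans. How callers account for that difference is not visible in this diff.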