sequence 0.1.0

data/README.txt ADDED
@@ -0,0 +1,320 @@
1
+ Sequence provides a unified api for access to sequential data types, like
2
+ Strings, Arrays, Files, IOs, and Enumerations. Each sequence encapsulates
3
+ some data and a current position within it. Some operations apply to data
4
+ at (or relative to) the position, others are independent of position. The
5
+ api contains operations for moving the position, reading and writing data
6
+ (with or without moving the position) forward or backward from the current
7
+ position or anywhere, scanning for patterns (like StringScanner, but it
8
+ works in Files too, among others), and saving a position that will remain
9
+ valid even after data is deleted or inserted elsewhere within the
10
+ sequence.
11
+
12
+ There are also some utility classes for making sequences reversed or
13
+ circular, turning one-way sequences into two-way, buffering, and making
14
+ sequences that are subsets or aggregations of existing sequences.
15
+
16
+ Sequence is based on Eric Mahurin's Cursor library. I'd like to thank Eric
17
+ for Cursor, without which Sequence would not have existed; my design is
18
+ very much a derivative of his.
19
+
20
+
21
+ Sequences always fall into one of two broad categories: string-like and
22
+ array-like. String-like sequences contain only character data, whereas
23
+ array-like sequences contain objects.
24
+
25
+
26
+
27
+
28
+ Stuff that isn't working at the moment is marked with **
29
+
30
+ Reading a single element:
31
+ When any of these operations fail (at beginning/end), nil is returned.
32
+ (Note: nil can also be found inside array-like sequences, so a nil result
33
+ doesn't necessarily mean you're at eof.) A name with -back means
34
+ backwards, -ahead/-behind means lookahead/lookbehind (pos not moved).
35
+
36
+ #read1 #get next element and advance position
37
+ #readback1 #get previous element and move position back
38
+ #readahead1 #get next element, leaving position alone
39
+ #readbehind1 #get previous element, leaving position alone
40
+
41
+
42
+ Read methods:
43
+ These all come in forward and backward forms, and forms that can hold the
44
+ position in place. The element sequences are passed/returned in
45
+ Array- or String-like objects.
46
+
47
+ A note on backwards operations: some methods move the position backward
48
+ instead of forward. They operate on data immediately before the position
49
+ instead of immediately after. Data is still read or processed in normal
50
+ order. To get data in backwards order, use Sequence::Reversed.
51
+
52
+ #read(len) #read +len+ elements, leaving position after the data read
53
+ #readback(len) #read data behind current position,
54
+ #leaving position before the data read.
55
+
56
+ #readahead(len) #like read, but position is left alone.
57
+ #readbehind(len) #like readback, but position is left alone.
58
+ #read!(reverse=false) # read the remaining elements.
59
+
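The four read forms above can be illustrated with a minimal sketch. `TinySeq` is a hypothetical stand-in written for this example, not the gem's implementation; it only assumes the semantics described in the README text (backward reads return data in normal order, and the -ahead/-behind forms leave the position alone).

```ruby
# Hypothetical stand-in for a string-like sequence; not the gem's code.
class TinySeq
  attr_reader :pos

  def initialize(data)
    @data = data
    @pos = 0
  end

  # Read up to len elements after the position and advance past them.
  def read(len)
    result = readahead(len)
    @pos += result.size
    result
  end

  # Read up to len elements before the position and move back over them.
  # The result is in normal (not reversed) order.
  def readback(len)
    result = readbehind(len)
    @pos -= result.size
    result
  end

  # Like read, but the position is left alone.
  def readahead(len)
    @data[@pos, len]
  end

  # Like readback, but the position is left alone.
  def readbehind(len)
    start = [@pos - len, 0].max
    @data[start, @pos - start]
  end
end

seq = TinySeq.new("hello world")
seq.read(5)       # "hello", position is now 5
seq.readahead(6)  # " world", position stays at 5
seq.readback(3)   # "llo", position is now 2
```

Note that `readback(3)` yields "llo" rather than "oll": only the direction of movement is reversed, not the order of the data.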
60
+ Numeric position methods:
61
+ The methods below deal with numeric positions (a pos) to represent the
62
+ sequence location. A non-negative number represents the number of elements
63
+ from the beginning. A negative number represents the location relative to
64
+ the end.
65
+
66
+ #pos # number of elements from the beginning (0 is at the beginning).
67
+ #pos?(p) # this checks to see if p is a valid numeric position.
68
+
69
+ #pos=(p) # Set #pos to be +p+.
70
+ #goto p #go to an absolute position; identical to #pos=
71
+
72
+ #move(len) # move len elements, relative to the current position
73
+ #and return distance moved
74
+
75
+ #move!(reverse=false) # move to end of the remaining elements
76
+ # and return distance moved
77
+ #begin! #go to beginning
78
+ #end! #go to end
79
+
80
+ #rest_size #number of data items remaining
81
+ #eof? #are we at or past the end of the data,
82
+ #with no more data ever to arrive?
83
+
84
+ Sequence::Position methods:
85
+
86
+ The position methods below use a Sequence::Position to hold the position
87
+ rather than simply a numeric position. These position objects hold a
88
+ #pos, which does not change when the parent sequence's position changes.
89
+ (Also, the #pos in these objects adjusts based on insertions and
90
+ deletions.)
91
+
92
+ #position(_pos=pos) #returns a Sequence::Position to
93
+ #represent the current location.
94
+ #position=(p) # Set the position to a Position +p+ (from #position)
95
+ #position?(p) # this queries whether a particular #position +p+
96
+ #is valid (is a child or self).
97
+
98
+ #-(other) #return new Position decreased by a length or
99
+ #distance between 2 positions.
100
+ #+(len) #Returns a new #position increased by +len+
101
+ #(positive or negative).
102
+
103
+ #succ # Return a new #position for next location
104
+ # or +nil+ if we are at the end.
105
+ #pred # Return a new #position for previous location
106
+ # or +nil+ if we are at the beginning.
107
+
108
+ #begin # Return a new #position for the beginning.
109
+ #end # Return a new #position for the end.
110
+
111
+ #<=>(other) # Compare +other+ (a #position) to the current position.
112
+
113
+
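The idea of a saved position that stays valid across insertions and deletions can be sketched as follows. `MarkedBuffer` and `Marker` are hypothetical names for this illustration only; how the real Sequence::Position tracks edits is not shown in the README, and the choice to shift a marker sitting exactly at the insertion point is an assumption.

```ruby
# Hypothetical sketch: a buffer that tells its saved markers about edits.
class MarkedBuffer
  # A saved location; index is adjusted when earlier data changes.
  Marker = Struct.new(:index)

  attr_reader :data

  def initialize(data)
    @data = data.dup
    @markers = []
  end

  # Save a location and register it for later adjustment.
  def mark(index)
    marker = Marker.new(index)
    @markers << marker
    marker
  end

  # Insert str before index at; markers at or after it shift right.
  def insert(at, str)
    @data.insert(at, str)
    @markers.each { |m| m.index += str.size if m.index >= at }
  end

  # Delete len elements starting at index at.
  def delete(at, len)
    @data.slice!(at, len)
    @markers.each do |m|
      if m.index >= at + len
        m.index -= len   # marker was after the deleted span
      elsif m.index > at
        m.index = at     # marker was inside the span; clamp to its start
      end
    end
  end
end

buf = MarkedBuffer.new("abcdef")
m = buf.mark(4)        # points just before "e"
buf.insert(1, "XYZ")   # data is now "aXYZbcdef"
buf.data[m.index]      # still "e"
```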
114
+ Access the entire collection:
115
+ #data #return the data underlying the sequence.
116
+ #the type of the result depends on the sequence type
117
+
118
+ #data_class #return Array if this sequence can contain any object,
119
+ #String if it contains only characters
120
+ #new_data # Return an empty String or Array,
121
+ #depending on what #data_class is
122
+
123
+
124
+ #all_data #return a String or Array
125
+ #containing all the data of the sequence
126
+
127
+ #size/length # Returns the number of elements.
128
+
129
+ #empty? # is there any data in the sequence?
130
+
131
+ #each # Performs each just to make this class Enumerable.
132
+
133
+
134
+ Random access:
135
+ #<<(elem) #append to the end
136
+
137
+ #slice/[](index) # random access to sequence data like in Array/String
138
+ #slice/[](index,len) # random access to sequence data like in Array/String
139
+ #slice/[](range) # random access to sequence data like in Array/String
140
+
141
+ Modifying data:
142
+ #slice! # slice and delete data
143
+ #[]=/modify(sliceargs,newdata) #replace an arbitrary subsequence
144
+ #with a different one
145
+
146
+ #modify has a number of special subcases:
147
+ #insert -- len is 0 (all existing elements are retained)
148
+ #delete -- newdata.size is 0
149
+ #append -- insert data after end
150
+ #prepend -- insert data before start
151
+ #push/pop -- insert/delete element(s) at end
152
+ #shift/unshift -- delete/insert element(s) at start
153
+ #overwrite -- newdata.size == len, no shifting needed
154
+
155
+ #subcases that use the position:
156
+ #(over)write -- overwrite after current position and move ahead
157
+ #(over)writeback -- overwrite before current position and move back
158
+ #(over)writeahead/behind -- overwrite near location without moving it
159
+ #deletebehind/#deleteahead -- delete before or after the location **
160
+ #insertbehind/#insertahead -- insert before or after the location **
161
+
162
+
163
+
164
+
165
+ Taken from/inspired by StringScanner:
166
+ See the StringScanner documentation (ri StringScanner) for a description
167
+ of these methods. I have extended StringScanner's interface to take
168
+ Strings and character literals (Integers) as well as Regexps.
169
+
170
+ About anchors:
171
+
172
+ When going forward:
173
+ \A ^ match current position
174
+ \Z $ match end of data (String, File, whatever)
175
+ When going backward:
176
+ \A ^ match beginning of data
177
+ \Z $ match current position
178
+
179
+ ^ and $ also match at line edges, as usual
180
+
181
+
182
+ My strategy is to rewrite the anchors in the regexp to make them conform
183
+ to the desired definition. For instance, \Z is replaced with (?!), unless
184
+ the last byte of the file is within the buffer to be compared against, in
185
+ which case it is left alone.
186
+
187
+ To counter the speed problem, there's a cache so the same regexp doesn't
188
+ have to be rewritten more than once.
189
+
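The rewrite-and-cache strategy described above might look roughly like this. It is a naive sketch, not the gem's code: `AnchorRewriter` and `without_end_anchor` are hypothetical names, and only the `\Z` case mentioned in the text is handled.

```ruby
# Naive sketch of anchor rewriting with a cache; names are hypothetical.
class AnchorRewriter
  def initialize
    @cache = {}   # original Regexp => rewritten Regexp
  end

  # Return a variant of pattern in which \Z can never match, for use when
  # the end of the data is not inside the buffer being compared against.
  # (?!) is an empty negative lookahead, which always fails.
  def without_end_anchor(pattern)
    @cache[pattern] ||=
      Regexp.new(pattern.source.gsub('\Z', '(?!)'), pattern.options)
  end
end

rewriter = AnchorRewriter.new
strict = rewriter.without_end_anchor(/foo\Z/)
strict.match("foo")        # nil: the rewritten anchor never matches
/foo\Z/.match("foo")       # still matches with the original pattern
```

A plain text substitution like this would also rewrite an escaped backslash followed by Z, so a real implementation would need to parse escapes more carefully. Regexps with equal source and options hash alike, so the second lookup of an equal pattern returns the cached object.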
190
+ about matchdata:
191
+ #pre_match/#post_match may not be what you expect; they are Sequences.
192
+ #offset contains numeric positions from the very beginning of the Sequence.
193
+ #anchored, forwards
194
+ #scan(pat) #if pat is right after current position,
195
+ #advance position and return what pat matched, else nil
196
+ #skip(pat) #like scan, but returns length instead of match data
197
+ #check(pat) #like scan, but doesn't move position
198
+ #match?(pat)#like scan, but returns length and doesn't move position
199
+
200
+ #unanchored, forwards
201
+ #scan_until(pat) #scan for pat somewhere after position
202
+ #(not necessarily right after)
203
+ #skip_until(pat) #skip til pattern somewhere after position
204
+ #check_until(pat) #check til pattern somewhere after position
205
+ #exist?(pat) #does pat exist somewhere after position? if so where?
206
+
207
+ #anchored, backwards
208
+ #scanback(pat) #scan for pat right before pos
209
+ #skipback(pat) #skip pat before pos
210
+ #checkback(pat) #check pat before pos
211
+ #matchback?(pat) #match pat before pos
212
+
213
+ #unanchored, backwards
214
+ #scanback_until(pat) #scan for pat somewhere before pos
215
+ #skipback_until(pat) #skip til pat somewhere before pos
216
+ #checkback_until(pat) #check for pat somewhere before pos
217
+ #existback?(pat) #does pat exist somewhere before pos? if so where?
218
+
219
+ #skip_literal
220
+ #skip_literals
221
+
222
+ #skip_until_literal **
223
+ #skip_until_literals **
224
+
225
+ #last_match
226
+
227
+ #maxmatchlen #query scan buffer size
228
+ #maxmatchlen= len #set scan buffer size
229
+
230
+ #split **
231
+
232
+
233
+ #index/#rindex
234
+
235
+ #stride **
236
+
237
+ #holding #hold current position while executing a block.
238
+ #The current pos is passed in.
239
+ #holding? #like #holding, but position is reset only if
240
+ #block returns false or nil
241
+ #holding! &block #like #holding, but block is instance_eval'd
242
+ #in the sequence.
243
+
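The difference between #holding and #holding? can be sketched with a small cursor class. `HoldingCursor` is a hypothetical illustration that assumes only the behavior stated in the comments above; #holding! is left out.

```ruby
# Hypothetical sketch of position-holding blocks; not the gem's code.
class HoldingCursor
  attr_reader :pos

  def initialize(data)
    @data = data
    @pos = 0
  end

  # Get the next element and advance, or nil at the end.
  def read1
    return nil if @pos >= @data.size
    elem = @data[@pos]
    @pos += 1
    elem
  end

  # Run the block, then always restore the position.
  # The current pos is passed to the block.
  def holding
    saved = @pos
    yield saved
  ensure
    @pos = saved
  end

  # Run the block; restore the position only if it returns false or nil.
  def holding?
    saved = @pos
    result = yield saved
    @pos = saved unless result
    result
  end
end

cur = HoldingCursor.new(%w[a b c])
cur.holding { cur.read1 }             # "a", position restored to 0
cur.holding? { cur.read1 == "a" }     # true, position stays at 1
cur.holding? { cur.read1 == "zzz" }   # false, position reset to 1
```

This is the pattern the anchored scan methods rely on: try to consume something, and back out if it turns out not to match.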
244
+
245
+ #subseq(*args) #make a new seq out of a subrange of current seq data.
246
+
247
+ #reverse #make a new seq that reverses the order of data.
248
+
249
+ #close # Close the seq. This will also close/invalidate
250
+ #every child #position and derived sequence attached to
251
+ #this one.
252
+ #closed? # Is the seq closed?
253
+
254
+
255
+
256
+ #nearbegin(len) #is this seq within len elements of the beginning?
257
+ #nearend(len) #is this seq within len elements of the end?
258
+
259
+ #more_data? #is there any more data in the seq?
260
+ #was_data? #has any data been seen so far, or are
261
+ #we still at the beginning?
262
+
263
+
264
+ #first #return first element of data
265
+ #last #return last element of data
266
+
267
+
268
+
269
+ sequence classes:
270
+ base sequences:
271
+ Sequence #ancestor class
272
+ Indexed
273
+ OfArray #over data in an Array
274
+ OfString #over data in a String
275
+ File #over data in a File (no insert/delete)
276
+ IO (R/O) #over data in an IO (pipe/socket/tty/whatever)
277
+ Enum (R/O) #over data in an Enumeration
278
+ SingleItem #over a single scalar item
279
+ OfHash (**)
280
+ OfObjectIvars (**)
281
+ OfObjectMethods (**)
282
+
283
+ derived sequences:
284
+ Shifting #saves a copy of the base sequence data
285
+ #(in another sequence) as it is read
286
+ Buffered (**) #makes unidirectional sequences bidirectional
287
+ #(up to the buffer size)
288
+ Reversed #reverses the order of its base sequence
289
+ Circular #loops base sequence data over and over
290
+ SubSeq #extracts a contiguous subset of base sequence
291
+ #data into a new seq
292
+ List (**) #logically concatenates its base sequences
293
+ Overlay (**) #intercepts writes to the base seq
294
+ Position #an independent position within the base seq
295
+
296
+ Future:
297
+ Scanning (**)
298
+ Splitting (**)
299
+ Striding (**)
300
+ Transforming (**)
301
+ Branching (**)
302
+ LinkedList (**)
303
+ DoubleLinkedList (**)
304
+ BinaryTree (**)
305
+
306
+ Pipe (**) #a bidirectional communication channel
307
+ #with sequence api
308
+ IndexedList (**)
309
+
310
+
311
+ known problems:
312
+ Some unit tests fail
313
+ Buffered does not work at all
314
+ Shifting's modify methods don't work reliably
315
+ Some write operations fail for List
316
+ No unit tests at all for array-like sequences
317
+ (tho Reg's unit test does test OfArray at least somewhat...)
318
+
319
+ Copyright (C) 2006 Caleb Clausen
320
+ Distributed under the terms of Ruby's license.
data/Rakefile ADDED
@@ -0,0 +1,32 @@
1
+ # Copyright (C) 2006 Caleb Clausen
2
+ # Distributed under the terms of Ruby's license.
3
+ require 'rubygems'
4
+ require 'hoe'
5
+ require 'lib/sequence/version.rb'
6
+
7
+
8
+
9
+
10
+ Hoe.new("sequence", Sequence::VERSION) do |_|
11
+
12
+ _.author = "Caleb Clausen"
13
+ _.email = "sequence-owner @at@ inforadical .dot. net"
14
+ _.url = "http://sequence.rubyforge.org/"
15
+ _.summary = "A single api for reading and writing sequential data types."
16
+ _.description = <<-END
17
+ A unified wrapper api for accessing data in Strings, Arrays, Files, IOs,
18
+ and Enumerations. Each sequence encapsulates some data and a current position
19
+ within it. There are methods for moving the position, reading and writing data
20
+ (with or without moving the position) forward or backward from the current
21
+ position (or anywhere at all), scanning for patterns (like StringScanner, but
22
+ it works in Files too, among others), and saving a position that will remain
23
+ valid even after data is deleted or inserted elsewhere within the sequence.
24
+
25
+ There are also some utility classes for making sequences reversed or
26
+ circular, turning one-way sequences into two-way, buffering, and making
27
+ sequences that are subsets or aggregations of existing sequences.
28
+ END
29
+ end
30
+
31
+ # add other tasks here
32
+
data/lib/assert.rb ADDED
@@ -0,0 +1,16 @@
1
+ # Copyright (C) 2006 Caleb Clausen
2
+ # Distributed under the terms of Ruby's license.
3
+
4
+ module Kernel
5
+ def assert(expr,msg="assertion failed")
6
+ defined? $Debug and $Debug and (expr or raise msg)
7
+ end
8
+
9
+ @@printed={}
10
+ def fixme(s)
11
+ unless @@printed[s]
12
+ @@printed[s]=1
13
+ defined? $Debug and $Debug and $stderr.print "FIXME: #{s}\n"
14
+ end
15
+ end
16
+ end
@@ -0,0 +1,57 @@
1
+ # Copyright (C) 2006 Caleb Clausen
2
+ # Distributed under the terms of Ruby's license.
3
+ class Sequence
4
+ module ArrayLike
5
+ def data_class; Array end
6
+ def like; ArrayLike end
7
+
8
+ def scan(pat)
9
+ elem=nil
10
+ more_data? and holding?{pat===(elem=read1)} and return [elem]
11
+ end
12
+
13
+ def scan_until pat
14
+ i=index(pat,pos) or return
15
+ read(i-pos)+scan(pat)
16
+ end
17
+
18
+ def scanback pat
19
+ elem=nil
20
+ was_data? and holding?{pat===(elem=readback1)} and return [elem]
21
+ end
22
+
23
+ def scanback_until pat
24
+ i=rindex(pat,pos) or return
25
+ readback(pos-i)
26
+ end
27
+
28
+ #I ought to have #match and #matchback like in StringLike too
29
+
30
+ def push(*arr)
31
+ append arr
32
+ end
33
+
34
+ def unshift(*arr)
35
+ prepend arr
36
+ end
37
+
38
+ def index pat,pos=0
39
+ pos=_normalize_pos(pos)
40
+ begin
41
+ pat===(slice pos) and return pos
42
+ pos+=1
43
+ end until pos>=size
44
+ nil
45
+ end
46
+
47
+ def rindex pat,pos=-1
48
+ pos=_normalize_pos(pos)
49
+ begin
50
+ pat===(slice pos) and return pos
51
+ pos-=1
52
+ end until pos<0
53
+ nil
54
+ end
55
+
56
+ end
57
+ end
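The array-like #index and #scan above match elements with `===`, so a pattern can be a class, a range, a regexp, or a plain value. A standalone sketch of that search over an ordinary Array follows; `index_matching` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper: first index at or after pos whose element
# satisfies pat === element, or nil if there is none.
def index_matching(items, pat, pos = 0)
  pos += items.size if pos < 0   # negative positions count from the end
  pos.upto(items.size - 1) { |i| return i if pat === items[i] }
  nil
end

items = [1, "two", :three, 4.0, 15]
index_matching(items, String)      # 1
index_matching(items, 10..20)      # 4
index_matching(items, Symbol, 3)   # nil, :three is before position 3
```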
@@ -0,0 +1,188 @@
1
+ # $Id$
2
+ # Copyright (C) 2006 Caleb Clausen
3
+ # Distributed under the terms of Ruby's license.
4
+
5
+ require 'sequence'
6
+ #require 'sequence/split'
7
+
8
+ class Sequence
9
+ # This class gives unidirectional sequences (i.e. IO pipes) some
10
+ # bidirectional capabilities. An input sequence (or input IO) and/or an output
11
+ # sequence (or output IO)
12
+ # can be specified. The #position, #position?, and #position! methods are
13
+ # used to control buffering. Full sequence capability (limited by the size of the buffer
14
+ # sequence) is accessible starting from the first #position. When the end of
15
+ # the buffer is reached, more data is read from the input sequence (if not nil). When no
16
+ # #position is outstanding, everything before the buffer sequence is written
17
+ # to the output sequence (if not nil). If the position is moved to
18
+ # before the buffer, the output sequence is read in reverse (which
19
+ # the output sequence may not like).
20
+ #how much of that should remain true?
21
+ class Buffered < Sequence
22
+ def initialize(input,buffer_size=1024,buffer=nil)
23
+ @input = input
24
+ huh #@input used incorrectly... it should be used kinda like a read-once data store
25
+ #and Buffered should have an independent position
26
+ @buffer_size=buffer_size
27
+ @buffer = buffer||@input.new_data
28
+ # @output_pos = output_pos
29
+ @buffer_pos=@pos=@input.pos
30
+
31
+ @input.on_change_notify self
32
+ end
33
+
34
+ def change_notification
35
+ huh #invalidate (part of) @buffer if it overlaps the changed area
36
+ huh #adjust @buffer_pos as necessary
37
+ end
38
+
39
+ attr_accessor :buffer_size
40
+ attr :pos
41
+
42
+ # :stopdoc:
43
+ def new_data
44
+ @input.new_data
45
+ end
46
+ =begin
47
+ protected
48
+ def _delete1after?
49
+ v0 = @buffer.delete1after?
50
+ v0.nil? && @input && (v0 = @input.read(1)) && (v0 = v0[0])
51
+ v0
52
+ end
53
+ def _delete1before?
54
+ v0 = @buffer.delete1before?
55
+ v0.nil? && @output_pos>0 && (@output_pos -= 1;v0 = @output.read(-1)) && (v0 = v0[0])
56
+ v0
57
+ end
58
+ def _insert1before(v)
59
+ if not position?
60
+ len = @buffer.move!(true)
61
+ if @output
62
+ value = @buffer.read(len,nil)
63
+ value << v
64
+ @output.write(value)
65
+ else
66
+ @buffer.read(len,nil) if len
67
+ end
68
+ @output_pos += (len||0)+1
69
+ else
70
+ @buffer.insert1before(v)
71
+ end
72
+ true
73
+ end
74
+ def _insert1after(v)
75
+ @buffer.insert1after(v)
76
+ end
77
+ public
78
+ def close
79
+ if @output
80
+ @buffer.move!(true)
81
+ value = @buffer.read!(false) and @output.write(value)
82
+ end
83
+ super
84
+ end
85
+ =end
86
+ def _default_maxmatchlen; @buffer_size/2 end
87
+
88
+ attr :pos
89
+
90
+ def _pos=(pos)
91
+ if pos<@buffer_pos
92
+ @pos=@input.pos=pos #could raise exception, if @input doesn't support #pos=
93
+ elsif pos<=@buffer_pos+@buffer.size
94
+ @pos=pos
95
+ else #pos > buffer_end_pos
96
+ assert @buffer_pos+@buffer.size==@input.pos
97
+ @buffer<<@input.read(pos-@input.pos)
98
+ buffer_begin_ageout!
99
+ end
100
+ end
101
+
102
+ def history_mode?(pos=@pos)
103
+ pos<@buffer_pos+@buffer.size
104
+ end
105
+
106
+ def_delegators :@input, :size
107
+
108
+ def crude_read(len)
109
+ assert @buffer_pos+@buffer.size==@input.pos
110
+ result=@input.read(len)
111
+ @buffer<<result
112
+ buffer_begin_ageout!
113
+ result
114
+ end
115
+
116
+ def crude_read_before(len)
117
+ assert @buffer_pos+@buffer.size==@input.pos
118
+ result=@input.read(len)
119
+ @buffer.insert(0,*result)
120
+ buffer_end_ageout!
121
+ result
122
+ end
123
+
124
+ def read(len)
125
+ if @pos<@buffer_pos
126
+ if @buffer_pos-@pos >= @buffer_size
127
+ @buffer_pos=@pos
128
+ @buffer=new_data
129
+ return crude_read(len)
130
+ else
131
+ crude_read_before(@buffer_pos-@pos)
132
+ self._pos=@buffer_pos
133
+
134
+ #fall thru
135
+ end
136
+ end
137
+
138
+ if history_mode?
139
+ if history_mode?(pos+len-1)
140
+ result=@buffer[@pos-@buffer_pos,len]
141
+ @pos+=len
142
+ else
143
+ result=@buffer[@pos-@buffer_pos..-1]
144
+ result<<crude_read(len-result.size)
145
+ end
146
+
147
+ result
148
+ else
149
+ crude_read len
150
+ end
151
+ end
152
+
153
+ def buffer_begin_ageout!
154
+ diff=@buffer.size-@buffer_size
155
+ if diff>0
156
+ @buffer.slice!(0,diff)
157
+ @buffer_pos+=diff
158
+ end
159
+ end
160
+
161
+ def buffer_end_ageout!
162
+ diff=@buffer.size-@buffer_size
163
+ if diff>0
164
+ @buffer.slice!(-diff..-1)
165
+ end
166
+ end
167
+
168
+ def modify(*args)
169
+ huh "what does it mean to write to a Buffered?"
170
+ repldata=args.pop
171
+ first,len,only1=_parse_slice_args(*args)
172
+ first<pos-@buffer.size and huh
173
+ result=new_data
174
+ if first<pos
175
+ result=@buffer[huh]
176
+ end
177
+ if first+len>pos
178
+ huh
179
+ end
180
+ huh
181
+ end
182
+
183
+
184
+ # :startdoc:
185
+ end
186
+ end
187
+
188
+
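The buffer_begin_ageout! step above keeps the buffer within its size limit by dropping the oldest data and advancing the buffer's starting position. A self-contained sketch of that bookkeeping, using a hypothetical `SlidingWindow` class rather than the (non-working) Buffered class itself:

```ruby
# Hypothetical sketch of a bounded history buffer; not the gem's code.
class SlidingWindow
  attr_reader :buffer, :buffer_pos

  def initialize(limit)
    @limit = limit
    @buffer = String.new
    @buffer_pos = 0   # absolute stream position of buffer[0]
  end

  # Append newly read data, then age out the oldest data if the
  # buffer has grown past its limit.
  def append(chunk)
    @buffer << chunk
    excess = @buffer.size - @limit
    if excess > 0
      @buffer.slice!(0, excess)
      @buffer_pos += excess
    end
    self
  end
end

win = SlidingWindow.new(4)
win.append("abc").append("def")
win.buffer       # "cdef"
win.buffer_pos   # 2, the stream position of "c"
```

With this invariant, the stream position just past the buffered data is always `buffer_pos + buffer.size`, which is what the asserts in crude_read check against the input's position.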