sequence 0.1.0

data/README.txt ADDED
@@ -0,0 +1,320 @@
1
+ Sequence provides a unified api for access to sequential data types, like
2
+ Strings, Arrays, Files, IOs, and Enumerations. Each sequence encapsulates
3
+ some data and a current position within it. Some operations apply to data
4
+ at (or relative to) the position, others are independent of position. The
5
+ api contains operations for moving the position, reading and writing data
6
+ (with or without moving the position) forward or backward from the current
7
+ position or anywhere, scanning for patterns (like StringScanner, but it
8
+ works in Files too, among others), and saving a position that will remain
9
+ valid even after data is deleted or inserted elsewhere within the
10
+ sequence.
11
+
12
+ There are also some utility classes for making sequences reversed or
13
+ circular, turning one-way sequences into two-way, buffering, and making
14
+ sequences that are subsets or aggregations of existing sequences.
15
+
16
+ Sequence is based on Eric Mahurin's Cursor library. I'd like to thank Eric
17
+ for Cursor, without which Sequence would not have existed; my design is
18
+ very much a derivative of his.
19
+
20
+
21
+ Sequences always fall into one of two broad categories: string-like and
22
+ array-like. String-like sequences contain only character data, whereas
23
+ array-like sequences can contain arbitrary objects.
24
+
25
+
26
+
27
+
28
+ Features that do not work yet are marked with **
29
+
30
+ Reading a single element:
31
+ When any of these operations fail (at beginning/end), nil is returned.
32
+ (Note: nil can also be found inside array-like cursors, so a nil result
33
+ doesn't necessarily mean you're at eof.) A name with -back means
34
+ backwards, -ahead/-behind means lookahead/lookbehind (pos not moved).
35
+
36
+ #read1 #get next element and advance position
37
+ #readback1 #get previous element and move position back
38
+ #readahead1 #get next element, leaving position alone
39
+ #readbehind1 #get previous element, leaving position alone
40
+
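The four single-element reads above can be sketched without the library. TinyCursor below is a hypothetical stand-in written only for illustration, not the gem's implementation; it assumes the data is a plain Array and that the position sits between elements.

```ruby
# Hypothetical stand-in for a sequence over an Array; not the gem's code.
class TinyCursor
  attr_reader :pos

  def initialize(data)
    @data = data
    @pos = 0
  end

  # Get the next element and advance; nil when already at the end.
  def read1
    return nil if @pos >= @data.size
    elem = @data[@pos]
    @pos += 1
    elem
  end

  # Get the previous element and move back; nil when at the beginning.
  def readback1
    return nil if @pos <= 0
    @pos -= 1
    @data[@pos]
  end

  # Look at the next element without moving.
  def readahead1
    @pos < @data.size ? @data[@pos] : nil
  end

  # Look at the previous element without moving.
  def readbehind1
    @pos > 0 ? @data[@pos - 1] : nil
  end
end

c = TinyCursor.new([:a, nil, :c])
c.read1       # => :a, pos is now 1
c.read1       # => nil, but pos is now 2: this nil is data, not end of data
c.readahead1  # => :c, pos stays at 2
```

The second read1 shows why a nil result alone cannot signal the end of an array-like sequence; the position has to be checked as well.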
41
+
42
+ Read methods:
43
+ These all come in forward and backward forms, and forms that can hold the
44
+ position in place. The element sequences are passed/returned in
45
+ Array/String like things.
46
+
47
+ A note on backwards operations: some methods move the position backward
48
+ instead of forward. They operate on data immediately before the position
49
+ instead of immediately after. Data is still read or processed in normal
50
+ order. To get data in backwards order, use Sequence::Reversed.
51
+
52
+ #read(len) #read +len+ elements, leaving position after the data read
53
+ #readback(len) #read data behind current position,
54
+ #leaving position before the data read.
55
+
56
+ #readahead(len) #like read, but position is left alone.
57
+ #readbehind(len) #like readback, but position is left alone.
58
+ #read!(reverse=false) # read the remaining elements.
59
+
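As a small illustration of the note on backwards operations: a backward read returns the data just before the position in normal order and leaves the position before that data. The read_back helper below is hypothetical (it is not part of the library) and works on a plain String with an explicit position; clamping at the beginning is an assumption.

```ruby
# Hypothetical helper: returns [chunk, new_pos] for a backward read.
# Clamping at the beginning of the data is an assumption of this sketch.
def read_back(data, pos, len)
  len = [len, pos].min            # cannot read past the beginning
  [data[pos - len, len], pos - len]
end

chunk, pos = read_back("abcdef", 4, 2)
# chunk == "cd" (normal order, not "dc"); pos == 2
```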
60
+ Numeric position methods:
61
+ The methods below deal with numeric positions (a pos) to represent the
62
+ sequence location. A non-negative number represents the number of elements
63
+ from the beginning. A negative number represents the location relative to
64
+ the end.
65
+
66
+ #pos # number of elements from the beginning (0 is at the beginning).
67
+ #pos?(p) # this checks to see if p is a valid numeric position.
68
+
69
+ #pos=(p) # Set #pos to be +p+.
70
+ #goto p #go to an absolute position; identical to #pos=
71
+
72
+ #move(len) # move len elements, relative to the current position
73
+ #and return distance moved
74
+
75
+ #move!(reverse=false) # move to end of the remaining elements
76
+ # and return distance moved
77
+ #begin! #go to beginning
78
+ #end! #go to end
79
+
80
+ #rest_size #number of data items remaining
81
+ #eof? #are we at or past the end of the data,
82
+ #with no more data ever to arrive?
83
+
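A rough sketch of how numeric positions and relative moves could behave. Both helpers are hypothetical; the rule used for negative values (-1 maps to size - 1, as in Array and String indexing) and the clamping in move are assumptions, since the text above only says that negative numbers are relative to the end and that #move returns the distance moved.

```ruby
# Hypothetical helpers, not the gem's code.
# Assumption: negative positions wrap the way Array/String indexes do.
def normalize_pos(pos, size)
  pos < 0 ? pos + size : pos
end

# Move by len, clamped to 0..size; returns [new_pos, distance_moved].
# Assumption: a move that would overshoot stops at the nearest end.
def move(pos, len, size)
  target = [[pos + len, 0].max, size].min
  [target, target - pos]
end

normalize_pos(-1, 5)  # => 4
move(3, 10, 5)        # => [5, 2]
```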
84
+ Sequence::Position methods:
85
+
86
+ The position methods below use a Sequence::Position to hold the position
87
+ rather than simply a numeric position. These position objects hold a
88
+ #pos, which does not change when the parent sequence's position changes.
89
+ (Also, the #pos in these objects adjusts based on insertions and
90
+ deletions.)
91
+
92
+ #position(_pos=pos) #returns a Sequence::Position to
93
+ #represent the current location.
94
+ #position=(p) # Set the position to a Position +p+ (from #position)
95
+ #position?(p) # this queries whether a particular #position +p+
96
+ #is valid (is a child or self).
97
+
98
+ #-(other) #return new Position decreased by a length or
99
+ #distance between 2 positions.
100
+ #+(len) #Returns a new #position increased by +len+
101
+ #(positive or negative).
102
+
103
+ #succ # Return a new #position for next location
104
+ # or +nil+ if we are at the end.
105
+ #pred # Return a new #position for previous location
106
+ # or +nil+ if we are at the beginning.
107
+
108
+ #begin # Return a new #position for the beginning.
109
+ #end # Return a new #position for the end.
110
+
111
+ #<=>(other) # Compare +other+ (a #position) to the current position.
112
+
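The idea of a saved position that stays valid across insertions and deletions can be sketched as follows. TrackedText and Mark are hypothetical names used only here; how the real Sequence::Position treats an edit exactly at the saved position is not stated above, so the choices in the comments are assumptions.

```ruby
# Hypothetical sketch of positions that adjust to edits; not the gem's code.
class TrackedText
  Mark = Struct.new(:pos)

  attr_reader :str

  def initialize(str)
    @str = str.dup
    @marks = []
  end

  # Save a position that will be kept up to date by insert/delete.
  def mark(pos)
    m = Mark.new(pos)
    @marks << m
    m
  end

  # Assumption: a mark exactly at the insertion point moves with the text.
  def insert(at, text)
    @str.insert(at, text)
    @marks.each { |m| m.pos += text.size if m.pos >= at }
  end

  # Assumption: a mark inside the deleted range collapses to its start.
  def delete(at, len)
    @str.slice!(at, len)
    @marks.each do |m|
      if m.pos >= at + len
        m.pos -= len
      elsif m.pos > at
        m.pos = at
      end
    end
  end
end

t = TrackedText.new("hello world")
m = t.mark(6)          # just before "world"
t.insert(0, ">> ")
t.str[m.pos, 5]        # => "world" (m.pos is now 9)
```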
113
+
114
+ Access the entire collection:
115
+ #data #return the data underlying the sequence.
116
+ #the type of the result depends on the sequence type
117
+
118
+ #data_class #return Array if this sequence can contain any object,
119
+ #String if it contains only characters
120
+ #new_data # Return an empty String or Array,
121
+ #depending on what #data_class is
122
+
123
+
124
+ #all_data #return a String or Array
125
+ #containing all the data of the sequence
126
+
127
+ #size/length # Returns the number of elements.
128
+
129
+ #empty? # is the sequence free of data?
130
+
131
+ #each # Performs each just to make this class Enumerable.
132
+
133
+
134
+ Random access:
135
+ #<<(elem) #append to the end
136
+
137
+ #slice/[](index) # random access to sequence data like in Array/String
138
+ #slice/[](index,len) # random access to sequence data like in Array/String
139
+ #slice/[](range) # random access to sequence data like in Array/String
140
+
141
+ Modifying data:
142
+ #slice! # slice and delete data
143
+ #[]=/modify(sliceargs,newdata) #replace an arbitrary subsequence
144
+ #with a different one
145
+
146
+ #modify has a number of special subcases:
147
+ #insert -- len is 0 (all existing elements are retained)
148
+ #delete -- newdata.size is 0
149
+ #append -- insert data after end
150
+ #prepend -- insert data before start
151
+ #push/pop -- insert/delete element(s) at end
152
+ #shift/unshift -- delete/insert elements at start
153
+ #overwrite -- newdata.size == len, no shifting needed
154
+
155
+ #subcases that use the position:
156
+ #(over)write -- overwrite after current position and move ahead
157
+ #(over)writeback -- overwrite before current position and move back
158
+ #(over)writeahead/behind -- overwrite near location without moving it
159
+ #deletebehind/#deleteahead -- delete before or after the location **
160
+ #insertbehind/#insertahead -- insert before or after the location **
161
+
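The special cases listed for #modify all reduce to replacing a slice with new data. The sketch below shows that reduction on a plain String using Ruby's own String#[]=; the free-standing modify function is a hypothetical illustration and not the library's method.

```ruby
# Hypothetical illustration: every subcase is one slice replacement.
def modify(data, first, len, newdata)
  data[first, len] = newdata
  data
end

s = "hello".dup
modify(s, 5, 0, "!")    # append:    len is 0, first is the end
modify(s, 0, 0, ">")    # prepend:   len is 0, first is the start
modify(s, 1, 1, "J")    # overwrite: newdata.size == len
modify(s, 2, 3, "")     # delete:    newdata.size is 0
# s == ">Jo!"
```

The same calls work on an Array when newdata is itself an Array, since Array#[]= accepts a start and a length in the same way.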
162
+
163
+
164
+
165
+ Taken from/inspired by StringScanner:
166
+ See the StringScanner documentation (ri StringScanner) for a description
167
+ of these methods. I have extended StringScanner's interface to take
168
+ Strings and character literals (Integers) as well as Regexps.
169
+
170
+ About anchors:
171
+
172
+ When going forward:
173
+ \A ^ match current position
174
+ \Z $ match end of data (String, File, whatever)
175
+ When going backward:
176
+ \A ^ match beginning of data
177
+ \Z $ match current position
178
+
179
+ ^ and $ also match at line edges, as usual
180
+
181
+
182
+ My strategy is to rewrite the anchors in the regexp to make them conform
183
+ to the desired definition. For instance, \Z is replaced with (?!), unless
184
+ the last byte of the file is within the buffer to be compared against, in
185
+ which case it is left alone.
186
+
187
+ To counter the speed problem, there's a cache so the same regexp doesn't
188
+ have to be rewritten more than once.
189
+
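A naive sketch of the anchor-rewriting strategy with a cache. The function name, the cache constant, and the last_byte_in_buffer flag are hypothetical; the substitution is a plain text replacement on the regexp source, so it assumes the pattern never contains an escaped backslash directly before a literal Z.

```ruby
# Hypothetical sketch, not the gem's code.
# Cache so the same regexp is rewritten only once.
REWRITE_CACHE = {}

# Replace \Z with (?!), which can never match, unless the buffer being
# searched already contains the last byte of the data.
# Naive: /\\Z/ (escaped backslash followed by Z) would be mangled.
def rewrite_end_anchor(regexp, last_byte_in_buffer)
  return regexp if last_byte_in_buffer
  REWRITE_CACHE[regexp] ||= Regexp.new(
    regexp.source.gsub('\\Z') { '(?!)' },
    regexp.options
  )
end

partial = rewrite_end_anchor(/foo\Z/, false)
partial.match("foo")                           # => nil, end of data not in view
rewrite_end_anchor(/foo\Z/, true).match("foo") # matches, anchor left alone
```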
190
+ about matchdata:
191
+ #pre_match/#post_match may not be what you expect; they are Sequences.
192
+ #offset contains numeric positions from the very beginning of the Sequence.
193
+ #anchored, forwards
194
+ #scan(pat) #if pat is right after current position,
195
+ #advance position and return what pat matched, else nil
196
+ #skip(pat) #like scan, but returns length instead of match data
197
+ #check(pat) #like scan, but doesn't move position
198
+ #match?(pat)#like scan, but returns length and doesn't move position
199
+
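Since these four methods are modeled on StringScanner, their differences can be seen with Ruby's standard strscan library. This example uses StringScanner itself, not a Sequence, so it only illustrates the intended return values and position behavior.

```ruby
require 'strscan'

s = StringScanner.new("key: value")

s.check(/\w+/)   # => "key"  position unchanged (still 0)
s.match?(/\w+/)  # => 3      length only, position unchanged
s.scan(/\w+/)    # => "key"  position advances to 3
s.skip(/:\s*/)   # => 2      length only, position advances to 5
s.scan(/\d+/)    # => nil    no match right here, position stays at 5
```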
200
+ #unanchored, forwards
201
+ #scan_until(pat) #scan for pat somewhere after position
202
+ #(not necessarily right after)
203
+ #skip_until(pat) #skip til pattern somewhere after position
204
+ #check_until(pat) #check til pattern somewhere after position
205
+ #exist?(pat) #does pat exist somewhere after position? if so where?
206
+
207
+ #anchored, backwards
208
+ #scanback(pat) #scan for pat right before pos
209
+ #skipback(pat) #skip pat before pos
210
+ #checkback(pat) #check pat before pos
211
+ #matchback?(pat) #match pat before pos
212
+
213
+ #unanchored, backwards
214
+ #scanback_until(pat) #scan for pat somewhere before pos
215
+ #skipback_until(pat) #skip til pat somewhere before pos
216
+ #checkback_until(pat) #check for pat somewhere before pos
217
+ #existback?(pat) #does pat exist somewhere before pos? if so where?
218
+
219
+ #skip_literal
220
+ #skip_literals
221
+
222
+ #skip_until_literal **
223
+ #skip_until_literals **
224
+
225
+ #last_match
226
+
227
+ #maxmatchlen #query scan buffer size
228
+ #maxmatchlen= len #set scan buffer size
229
+
230
+ #split **
231
+
232
+
233
+ #index/#rindex
234
+
235
+ #stride **
236
+
237
+ #holding #hold current position while executing a block.
238
+ #The current pos is passed in.
239
+ #holding? #like #holding, but position is reset only if
240
+ #block returns false or nil
241
+ #holding! &block #like #holding, but block is instance_eval'd
242
+ #in the sequence.
243
+
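The difference between #holding and #holding? can be sketched with two small helpers. They are hypothetical and written against StringScanner from the standard library rather than a Sequence; whether the real methods also restore the position when the block raises is not covered here.

```ruby
require 'strscan'

# Hypothetical helper: always restore the position after the block.
def holding(scanner)
  saved = scanner.pos
  result = yield saved
  scanner.pos = saved
  result
end

# Hypothetical helper: restore the position only if the block
# returns false or nil.
def holding?(scanner)
  saved = scanner.pos
  result = yield saved
  scanner.pos = saved unless result
  result
end

s = StringScanner.new("abc def")
holding(s)  { s.scan(/\w+/) }  # => "abc", position back at 0
holding?(s) { s.scan(/\d+/) }  # => nil,   position stays at 0
holding?(s) { s.scan(/\w+/) }  # => "abc", position kept at 3
```

This is the pattern the array-like #scan relies on: try to consume something, and back out automatically if it did not match.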
244
+
245
+ #subseq(*args) #make a new seq out of a subrange of current seq data.
246
+
247
+ #reverse #make a new seq that reverses the order of data.
248
+
249
+ #close # Close the seq. This will also close/invalidate
250
+ #every child #position and derived sequence attached to
251
+ #this one.
252
+ #closed? # Is the seq closed?
253
+
254
+
255
+
256
+ #nearbegin(len) #is this seq within len elements of the beginning?
257
+ #nearend(len) #is this seq within len elements of the end?
258
+
259
+ #more_data? #is there any more data in the seq?
260
+ #was_data? #has any data been seen so far, or are
261
+ #we still at the beginning?
262
+
263
+
264
+ #first #return first element of data
265
+ #last #return last element of data
266
+
267
+
268
+
269
+ sequence classes:
270
+ base sequences:
271
+ Sequence #ancestor class
272
+ Indexed
273
+ OfArray #over data in an Array
274
+ OfString #over data in a String
275
+ File #over data in a File (no insert/delete)
276
+ IO (R/O) #over data in an IO (pipe/socket/tty/whatever)
277
+ Enum (R/O) #over data in an Enumeration
278
+ SingleItem #over a single scalar item
279
+ OfHash (**)
280
+ OfObjectIvars (**)
281
+ OfObjectMethods (**)
282
+
283
+ derived sequences:
284
+ Shifting #saves a copy of the base sequence data
285
+ #(in another sequence) as it is read
286
+ Buffered (**) #makes unidirectional sequences bidirectional
287
+ #(up to the buffer size)
288
+ Reversed #reverses the order of its base sequence
289
+ Circular #loops base sequence data over and over
290
+ SubSeq #extracts a contiguous subset of base sequence
291
+ #data into a new seq
292
+ List (**) #logically concatenates its base sequences
293
+ Overlay (**) #intercepts writes to the base seq
294
+ Position #an independent position within the base seq
295
+
296
+ Future:
297
+ Scanning (**)
298
+ Splitting (**)
299
+ Striding (**)
300
+ Transforming (**)
301
+ Branching (**)
302
+ LinkedList (**)
303
+ DoubleLinkedList (**)
304
+ BinaryTree (**)
305
+
306
+ Pipe (**) #a bidirectional communication channel
307
+ #with sequence api
308
+ IndexedList (**)
309
+
310
+
311
+ known problems:
312
+ Some unit tests fail
313
+ Buffered does not work at all
314
+ Shifting's modify methods don't work reliably
315
+ Some write operations fail for List
316
+ No unit tests at all for array-like sequences
317
+ (though Reg's unit test does test OfArray at least somewhat...)
318
+
319
+ Copyright (C) 2006 Caleb Clausen
320
+ Distributed under the terms of Ruby's license.
data/Rakefile ADDED
@@ -0,0 +1,32 @@
1
+ # Copyright (C) 2006 Caleb Clausen
2
+ # Distributed under the terms of Ruby's license.
3
+ require 'rubygems'
4
+ require 'hoe'
5
+ require 'lib/sequence/version.rb'
6
+
7
+
8
+
9
+
10
+ Hoe.new("sequence", Sequence::VERSION) do |_|
11
+
12
+ _.author = "Caleb Clausen"
13
+ _.email = "sequence-owner @at@ inforadical .dot. net"
14
+ _.url = "http://sequence.rubyforge.org/"
15
+ _.summary = "A single api for reading and writing sequential data types."
16
+ _.description = <<-END
17
+ A unified wrapper api for accessing data in Strings, Arrays, Files, IOs,
18
+ and Enumerations. Each sequence encapsulates some data and a current position
19
+ within it. There are methods for moving the position, reading and writing data
20
+ (with or without moving the position) forward or backward from the current
21
+ position (or anywhere at all), scanning for patterns (like StringScanner, but
22
+ it works in Files too, among others), and saving a position that will remain
23
+ valid even after data is deleted or inserted elsewhere within the sequence.
24
+
25
+ There are also some utility classes for making sequences reversed or
26
+ circular, turning one-way sequences into two-way, buffering, and making
27
+ sequences that are subsets or aggregations of existing sequences.
28
+ END
29
+ end
30
+
31
+ # add other tasks here
32
+
data/lib/assert.rb ADDED
@@ -0,0 +1,16 @@
1
+ # Copyright (C) 2006 Caleb Clausen
2
+ # Distributed under the terms of Ruby's license.
3
+
4
+ module Kernel
5
+ def assert(expr,msg="assertion failed")
6
+ defined? $Debug and $Debug and (expr or raise msg)
7
+ end
8
+
9
+ @@printed={}
10
+ def fixme(s)
11
+ unless @@printed[s]
12
+ @@printed[s]=1
13
+ defined? $Debug and $Debug and $stderr.print "FIXME: #{s}\n"
14
+ end
15
+ end
16
+ end
@@ -0,0 +1,57 @@
1
+ # Copyright (C) 2006 Caleb Clausen
2
+ # Distributed under the terms of Ruby's license.
3
+ class Sequence
4
+ module ArrayLike
5
+ def data_class; Array end
6
+ def like; ArrayLike end
7
+
8
+ def scan(pat)
9
+ elem=nil
10
+ more_data? and holding?{pat===(elem=read1)} and return [elem]
11
+ end
12
+
13
+ def scan_until pat
14
+ i=index(pat,pos) or return
15
+ read(i-pos)+scan(pat)
16
+ end
17
+
18
+ def scanback pat
19
+ elem=nil
20
+ was_data? and holding?{pat===(elem=readback1)} and return [elem]
21
+ end
22
+
23
+ def scanback_until pat
24
+ i=rindex(pat,pos) or return
25
+ readback(pos-i)
26
+ end
27
+
28
+ #I ought to have #match and #matchback like in StringLike too
29
+
30
+ def push(*arr)
31
+ append arr
32
+ end
33
+
34
+ def unshift(*arr)
35
+ prepend arr
36
+ end
37
+
38
+ def index pat,pos=0
39
+ pos=_normalize_pos(pos)
40
+ begin
41
+ pat===(slice pos) and return pos
42
+ pos+=1
43
+ end until pos>=size
44
+ nil
45
+ end
46
+
47
+ def rindex pat,pos=-1
48
+ pos=_normalize_pos(pos)
49
+ begin
50
+ pat===(slice pos) and return pos
51
+ pos-=1
52
+ end until pos<0
53
+ nil
54
+ end
55
+
56
+ end
57
+ end
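Because #index and #rindex above compare with ===, a pattern for an array-like sequence can be a class, a range, a regexp, or a plain value. The standalone index_matching function below is a hypothetical illustration of that matching rule over a plain Array; it does not use the library.

```ruby
# Hypothetical illustration of ===-based searching over an Array.
# Returns the index of the first element at or after pos that pat matches.
def index_matching(data, pat, pos = 0)
  pos.upto(data.size - 1) { |i| return i if pat === data[i] }
  nil
end

items = [1, "two", 3.0, :four]
index_matching(items, String)           # => 1  (Class#=== tests is_a?)
index_matching(items, Float)            # => 2
index_matching([5, 12, 7], (10..20))    # => 1  (Range#=== tests inclusion)
index_matching(%w[a b], /z/)            # => nil
```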
@@ -0,0 +1,188 @@
1
+ # $Id$
2
+ # Copyright (C) 2006 Caleb Clausen
3
+ # Distributed under the terms of Ruby's license.
4
+
5
+ require 'sequence'
6
+ #require 'sequence/split'
7
+
8
+ class Sequence
9
+ # This class gives unidirectional sequences (i.e. IO pipes) some
10
+ # bidirectional capabilities. An input sequence (or input IO) and/or an output
11
+ # sequence (or output IO)
12
+ # can be specified. The #position, #position?, and #position! methods are
13
+ # used to control buffering. Full sequence capability (limited by the size of the buffer
14
+ # sequence) is accessible starting from the first #position. When the end of
15
+ # the buffer is reached more data is read from the input sequence (if not nil). When no
16
+ # #position is outstanding, everything before the buffer sequence is written
17
+ # to the output sequence (if not nil). If an attempt is made to move
18
+ # the position before the buffer, the output sequence is read in reverse (which
19
+ # the output sequence may not like).
20
+ #how much of that should remain true?
21
+ class Buffered < Sequence
22
+ def initialize(input,buffer_size=1024,buffer=nil)
23
+ @input = input
24
+ huh #@input used incorrectly... it should be used kinda like a read-once data store
25
+ #and Buffered should have an independent position
26
+ @buffer_size=buffer_size
27
+ @buffer = buffer||@input.new_data
28
+ # @output_pos = output_pos
29
+ @buffer_pos=@pos=@input.pos
30
+
31
+ @input.on_change_notify self
32
+ end
33
+
34
+ def change_notification
35
+ huh #invalidate (part of) @buffer if it overlaps the changed area
36
+ huh #adjust @buffer_pos as necessary
37
+ end
38
+
39
+ attr_accessor :buffer_size
40
+ attr :pos
41
+
42
+ # :stopdoc:
43
+ def new_data
44
+ @input.new_data
45
+ end
46
+ =begin
47
+ protected
48
+ def _delete1after?
49
+ v0 = @buffer.delete1after?
50
+ v0.nil? && @input && (v0 = @input.read(1)) && (v0 = v0[0])
51
+ v0
52
+ end
53
+ def _delete1before?
54
+ v0 = @buffer.delete1before?
55
+ v0.nil? && @output_pos>0 && (@output_pos -= 1;v0 = @output.read(-1)) && (v0 = v0[0])
56
+ v0
57
+ end
58
+ def _insert1before(v)
59
+ if not position?
60
+ len = @buffer.move!(true)
61
+ if @output
62
+ value = @buffer.read(len,nil)
63
+ value << v
64
+ @output.write(value)
65
+ else
66
+ @buffer.read(len,nil) if len
67
+ end
68
+ @output_pos += (len||0)+1
69
+ else
70
+ @buffer.insert1before(v)
71
+ end
72
+ true
73
+ end
74
+ def _insert1after(v)
75
+ @buffer.insert1after(v)
76
+ end
77
+ public
78
+ def close
79
+ if @output
80
+ @buffer.move!(true)
81
+ value = @buffer.read!(false) and @output.write(value)
82
+ end
83
+ super
84
+ end
85
+ =end
86
+ def _default_maxmatchlen; @buffer_size/2 end
87
+
88
+ attr :pos
89
+
90
+ def _pos=(pos)
91
+ if pos<@buffer_pos
92
+ @pos=@input.pos=pos #could raise exception, if @input doesn't support #pos=
93
+ elsif pos<=@buffer_pos+@buffer.size
94
+ @pos=pos
95
+ else #@pos > buffer_end_pos
96
+ assert @buffer_pos+@buffer.size==@input.pos
97
+ @buffer<<@input.read(pos-@input.pos)
98
+ buffer_begin_ageout!
99
+ end
100
+ end
101
+
102
+ def history_mode?(pos=@pos)
103
+ pos<@buffer_pos+@buffer.size
104
+ end
105
+
106
+ def_delegators :@input, :size
107
+
108
+ def crude_read(len)
109
+ assert @buffer_pos+@buffer.size==@input.pos
110
+ result=@input.read(len)
111
+ @buffer<<result
112
+ buffer_begin_ageout!
113
+ result
114
+ end
115
+
116
+ def crude_read_before(len)
117
+ assert @buffer_pos+@buffer.size==@input.pos
118
+ result=@input.read(len)
119
+ @buffer.insert(0,*result)
120
+ buffer_end_ageout!
121
+ result
122
+ end
123
+
124
+ def read(len)
125
+ if @pos<@buffer_pos
126
+ if @buffer_pos-@pos >= @buffer_size
127
+ @buffer_pos=@pos
128
+ @buffer=new_data
129
+ return crude_read(len)
130
+ else
131
+ crude_read_before(@buffer_pos-@pos)
132
+ self._pos=@buffer_pos
133
+
134
+ #fall thru
135
+ end
136
+ end
137
+
138
+ if history_mode?
139
+ if history_mode?(pos+len-1)
140
+ result=@buffer[@pos-@buffer_pos,len]
141
+ @pos+=len
142
+ else
143
+ result=@buffer[@pos-@buffer_pos..-1] #rest of buffer, from @pos to its end
144
+ result<<crude_read(len-result.size)
145
+ end
146
+
147
+ result
148
+ else
149
+ crude_read len
150
+ end
151
+ end
152
+
153
+ def buffer_begin_ageout!
154
+ diff=@buffer.size-@buffer_size
155
+ if diff>0
156
+ @buffer.slice!(0,diff)
157
+ @buffer_pos+=diff
158
+ end
159
+ end
160
+
161
+ def buffer_end_ageout!
162
+ diff=@buffer.size-@buffer_size
163
+ if diff>0
164
+ @buffer.slice!(-diff..-1)
165
+ end
166
+ end
167
+
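The ageout step in buffer_begin_ageout! amounts to a sliding window: once the buffer exceeds its size limit, the oldest data is dropped from the front and the buffer's starting position advances by the same amount. SlidingWindow below is a hypothetical, self-contained sketch of that idea over a String, not a working replacement for Buffered.

```ruby
# Hypothetical sliding window mirroring the begin-ageout logic above.
class SlidingWindow
  attr_reader :buffer, :buffer_pos

  def initialize(limit)
    @limit = limit
    @buffer = String.new
    @buffer_pos = 0      # absolute position of the first buffered element
  end

  # Append newly read data, then age out the oldest excess.
  def append(chunk)
    @buffer << chunk
    excess = @buffer.size - @limit
    if excess > 0
      @buffer.slice!(0, excess)
      @buffer_pos += excess
    end
    self
  end
end

w = SlidingWindow.new(4)
w.append("abc")   # buffer "abc",  buffer_pos 0
w.append("def")   # buffer "cdef", buffer_pos 2
```

At every step buffer_pos + buffer.size equals the total amount read so far, which is the invariant the assert calls in crude_read check against @input.pos.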
168
+ def modify(*args)
169
+ huh "what does it mean to write to a Buffered?"
170
+ repldata=args.pop
171
+ first,len,only1=_parse_slice_args(*args)
172
+ first<pos-@buffer.size and huh
173
+ result=new_data
174
+ if first<pos
175
+ result=@buffer[huh]
176
+ end
177
+ if first+len>pos
178
+ huh
179
+ end
180
+ huh
181
+ end
182
+
183
+
184
+ # :startdoc:
185
+ end
186
+ end
187
+
188
+