reg 0.4.6

data/README ADDED
@@ -0,0 +1,404 @@
1
+ Reg is a library for pattern matching in ruby data structures. Reg provides
2
+ Regexp-like match and match-and-replace for all data structures (particularly
3
+ Arrays, Objects, and Hashes), not just Strings.
4
+
5
+ Reg is best thought of in analogy to regular expressions; Regexps are special
6
+ data structures for matching Strings; Regs are special data structures for
7
+ matching ANY type of ruby data (Strings included, using Regexps).
8
+
9
+ If you have any questions, comments, problems, new feature requests, or just
10
+ want to figure out how to make it work for what you need to do, contact me:
11
+ reg _at_ inforadical _dot_ net
12
+
13
+ Reg is a RubyForge project. RubyForge is another good place to send your
14
+ bug reports or whatever: http://rubyforge.org/projects/reg/
15
+
16
+ (There aren't any bugs filed against Reg there yet, but don't be afraid
17
+ that your report will get lonely.)
18
+
19
+
20
+
21
+ The implementation:
22
+ The engine (according to what I can tell from Friedl's book,
23
+ _Mastering_Regular_Expressions_,) is a traditional NFA with non-greedy alternation.
24
+ For performance, I'd like to move to a more DFA-oriented approach (trying many
25
+ different alternatives in parallel).
26
+
27
+ Status:
28
+ The only real (public) matching operator implemented thus far is:
29
+ Reg::Reg#=== (and descendants). It doesn't return a normalized boolean;
30
+ it will return a false value on no match or a true value if there was a match,
31
+ but beyond that, nothing is guaranteed.
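Since the result of === is only guaranteed to be truthy or falsy, callers that need a real boolean have to normalize it themselves. Here is a minimal sketch of that; StartsWith, matched? and describe are made-up names for illustration and are not part of Reg.

```ruby
# Hypothetical matcher, not part of Reg: its === returns some truthy value
# (here, the string length) or nil, rather than true/false.
class StartsWith
  def initialize(prefix)
    @prefix = prefix
  end

  def ===(obj)
    obj.is_a?(String) && obj.start_with?(@prefix) ? obj.length : nil
  end
end

# Normalize any truthy/falsy match result to a real boolean.
def matched?(matcher, obj)
  !!(matcher === obj)
end

# case/when calls === on each `when` value, so only truthiness matters there.
def describe(obj)
  case obj
  when StartsWith.new("g") then "g-word"
  else "other"
  end
end
```

Comparing the raw result against true (result == true) would be a mistake with a matcher like this; testing truthiness, or using !!, is the safer habit.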
32
+
33
+ A number of important features are unimplemented at this point, most notably
34
+ backreferences and substitutions.
35
+
36
+ The backtracking engine appears to be completely functional now. Vector
37
+ Reg::And doesn't work.
38
+
39
+
40
+ This table compares syntax of Reg and Regexp for various constructs. Keep
41
+ in mind that all Regs are ordinary ruby expressions. The special syntax
42
+ is achieved by overriding ruby operators.
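To give a feel for how ordinary operator methods can build up a matching mini-language, here is a small sketch. The Matcher class and matcher_for helper are hypothetical stand-ins invented for this example; Reg's own classes are organized differently and do much more.

```ruby
# Hypothetical scalar matcher; only illustrates operator overriding.
class Matcher
  def initialize(&test)
    @test = test
  end

  # Case equality, so a Matcher works in case/when and Enumerable#grep.
  def ===(obj)
    @test.call(obj) ? true : false
  end

  # Alternation: matches if either side matches.
  def |(rhs)
    lhs = self
    Matcher.new { |x| lhs === x || rhs === x }
  end

  # Conjunction: matches only if both sides match.
  def &(rhs)
    lhs = self
    Matcher.new { |x| lhs === x && rhs === x }
  end

  # Negation (unary ~).
  def ~
    inner = self
    Matcher.new { |x| !(inner === x) }
  end
end

# Wrap any object so it matches whatever obj.=== accepts
# (loosely analogous to obj.reg in the table below).
def matcher_for(obj)
  Matcher.new { |x| obj === x }
end
```

With these definitions, matcher_for(Integer) & matcher_for(0..9) accepts single-digit integers, and ~matcher_for(nil) accepts anything but nil. Both are plain ruby expressions.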
43
+
44
+ In the following examples,
45
+ re,re1,re2 represent arbitrary regexp subexpressions,
46
+ r,r1,r2 represent arbitrary reg subexpressions
47
+ s,t represent any single character (perhaps appropriately escaped, if the char is magical)
48
+
49
+ reg           regexp             reg class           #description
+
+ +[r1,r2,r3]   /re1re2re3/        Reg::Array          #sequence
+ -[r1,r2]      (re1re2)           Reg::Subseq         #subsequence
+ r.lit         \re                Reg::Literal        #escaping a magical
+ regproc{r}    #{re}              (not really named)  #dynamic inclusion
+ r1|r2 or :OR  (re1|re2) or [st]  Reg::Or             #alternation
+ ~r            [^s]               Reg::Not            #negation (for scalar r and s)
+ r.*           re*                Reg::Repeat         #zero or more matches
+ r.+           re+                Reg::Repeat         #one or more matches
+ r.-           re?                Reg::Repeat         #zero or one matches
+ r*n           re{n}              Reg::Repeat         #exactly n matches
+ r*(n..m)      re{n,m}            Reg::Repeat         #at least n, at most m matches
+ r-n           re{,n}             Reg::Repeat         #at most n matches
+ r+m           re{m,}             Reg::Repeat         #at least m matches
+ OB            .                  Reg::Any            #a single item
+ OBS           .*                 Reg::AnyMultiple    #zero or more items
+ BR[1,2]       \1,\2              Reg::Backref        #backreference ***
+ r>>x or sub   sub,gsub           Reg::Transform      #search and replace ***
68
+
69
+
70
+ Here are features of reg that don't have an equivalent in regexp:
+ r.la                 Reg::Lookahead       #lookahead ***
+ ~-[]                 Reg::Lookahead       #subsequence negation w/lookahead ***
+ & or :AND            Reg::And             #all alternatives match
+ ^ or :XOR            Reg::Xor             #exactly one of alternatives matches
+ +{r1=>r2}            Reg::Hash            #hash matcher
+ -{name=>r}           Reg::Object          #object matcher
+ obj.reg              Reg::Fixed           #turn any ruby object into a reg that matches if obj.=== succeeds
+ /re/.sym             Reg::Symbol          #a symbol regex
+ item_that{|x|rcode}  Reg::ItemThat        #a proc{} that responds to === by invoking the proc's call
+ OBS as un-anchor     Reg::AnyMultiple     #opposite of ^ and $ when placed at edges of a reg array (kinda cheesy)
+ name=r               (just a var assign)  #named subexpressions
+ Reg::var             Reg::Variable        #recursive matches via Reg::Variable & Reg::Constants
+ Reg::const           Reg::Constant
84
+
85
+
86
+ *** = not implemented yet.
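Several entries above (obj.reg, item_that) lean on ruby's own === protocol. As a reminder of what === already means for built-in classes, here is a short plain-ruby sketch. It uses no Reg code at all, and the constant names are made up for the example.

```ruby
# Each pair is [pattern, value]; every pattern class defines its own ===.
BUILTIN_CHECKS = [
  [Integer, 42],        # Module#===  : is the value an instance of the class?
  [(1..10), 7],         # Range#===   : does the range cover the value?
  [/baz/, "foobaz"],    # Regexp#===  : does the pattern match the string?
  [:sym, :sym]          # Symbol#===  : plain equality
].freeze

BUILTIN_RESULTS = BUILTIN_CHECKS.map { |pattern, value| pattern === value }

# Proc#=== calls the proc with the value, which is the same idea as a
# matcher built from a block.
LONGER_THAN_THREE = ->(x) { x.respond_to?(:length) && x.length > 3 }
```

Because case/when and Enumerable#grep both call ===, any object with a sensible === can act as a pattern, e.g. ["lemon", "ab"].grep(LONGER_THAN_THREE).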
87
+
88
+ "... the effect of drinking a Pan Galactic Gargle Blaster is like having
89
+ your brains smashed out by a slice of lemon wrapped round a large gold
90
+ brick."
91
+ -- Douglas Adams, _Hitchhiker's_Guide_to_the_Galaxy_
92
+
93
+ Reg is kind of hard to bend your brain around, so here are some examples:
94
+
95
+ 0.4.6 examples:
96
+
97
+ Matches a single item whose method 'length' returns a Fixnum:
98
+ item_that.length.is_a? Fixnum
99
+
100
+ There's a new way to match hashes; it looks more-or-less like the old way and
101
+ behaves a little differently. The old type of hash matcher (now called an
102
+ unordered hash matcher) looked like:
103
+
104
+ +{/fo+/=>8, /ba+r/=>9}
105
+
106
+ The new syntax uses +[] instead of +{} and ** instead of =>. It's called an
107
+ ordered hash matcher. The order of filter pairs given in an ordered matcher
108
+ is the order comparisons are done in. The same is not true within unordered
109
+ matchers, where order is inferred from the nature of the key matchers. The
110
+ ordered equivalent of the last example is:
111
+
112
+ +[/fo+/**8, /ba+r/**9]
113
+
114
+ Both match hashes whose keys match /fo+/ with value of 8 or match /ba+r/ with
115
+ value of 9 (and nothing else). But if the data looks like: {"foobar"=>8},
116
+ then the ordered matcher is guaranteed to match it (because /fo+/ is always
117
+ given a chance first), but the unordered matcher might or might not (because
118
+ its order is unspecified).
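The effect of comparison order can be sketched in plain ruby. This is only an illustration of the idea, under the assumption that the first key matcher to accept a key decides which value matcher is applied; ordered_hash_match? is a made-up helper, not Reg's implementation.

```ruby
# pairs is an ordered list of [key_matcher, value_matcher].
# Every entry of data must be accepted: the first key matcher that
# accepts the key decides which value matcher is applied to the value.
def ordered_hash_match?(pairs, data)
  data.all? do |key, value|
    pair = pairs.find { |key_matcher, _| key_matcher === key }
    !pair.nil? && pair[1] === value
  end
end

FOO_FIRST = [[/fo+/, 8], [/ba+r/, 9]].freeze
BAR_FIRST = [[/ba+r/, 9], [/fo+/, 8]].freeze
```

With the data {"foobar"=>8}, FOO_FIRST succeeds (the key hits /fo+/ and 8 matches 8), while BAR_FIRST fails (the key hits /ba+r/ first, and 9 does not match 8). An unordered matcher is free to behave like either.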
119
+
120
+ Here's an example of a Reg::Knows matcher, which matches objects that have the
121
+ #slice method:
122
+ -:slice
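In plain ruby terms, matching objects that "know" a method amounts to a respond_to? check. A minimal sketch follows, using a hypothetical RespondsTo class rather than Reg::Knows itself.

```ruby
# Hypothetical stand-in for the idea behind -:slice; not Reg's code.
class RespondsTo
  def initialize(*method_names)
    @method_names = method_names
  end

  # Matches any object that responds to every listed method name.
  def ===(obj)
    @method_names.all? { |name| obj.respond_to?(name) }
  end
end

SLICEABLE = RespondsTo.new(:slice)
```

For example, ["lemon", [1, 2], 42, nil].grep(SLICEABLE) keeps the String and the Array and drops the Integer and nil.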
123
+
124
+ 0.4.5 examples:
125
+
126
+ Matches array containing exactly 2 elements; 1st is another array, 2nd is
127
+ integer:
128
+ +[Array,Integer]
129
+
130
+ Like above, but 1st is array of arrays of symbols:
131
+ +[+[+[Symbol+0]+0],Integer]
132
+
133
+ Matches array of at least 3 consecutive symbols and nothing else:
134
+ +[Symbol+3]
135
+
136
+ Matches array with at least 3 symbols in it somewhere:
137
+ +[OBS, Symbol+3, OBS]
138
+
139
+ Matches array of at most 6 strings starting with 'g':
140
+ +[/^g/-6] #no .reg necessary for regexp
141
+
142
+ Matches array of between 5 and 9 hashes containing a key :k pointing to
143
+ something non-nil:
144
+ +[ +{:k=>~nil.reg}*(5..9) ]
145
+
146
+ Matches an object with Integer instance variable @k and property (ie method)
147
+ foobar that returns a string with 'baz' somewhere in it:
148
+ -{:@k=>Integer, :foobar=>/baz/}
149
+
150
+ Matches array of 6 hashes with 6 as a value of every key, followed by
151
+ 18 objects with an attribute @s which is a String:
152
+ +[ +{OB=>6}*6, -{:@s=>String}*18 ]
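To make the array examples above more concrete, here is a rough plain-ruby sketch of greedy matching with backtracking over an array. It is not Reg's engine. The [matcher, min, max] triples and the seq_match? name are assumptions made for the example, with [Symbol, 3, INF] standing in for Symbol+3 and [Object, 0, INF] for OBS.

```ruby
INF = Float::INFINITY

# pattern is a list of [matcher, min, max] triples; the whole of items
# must be consumed for the match to succeed.
def seq_match?(pattern, items, p_idx = 0, i_idx = 0)
  # Out of pattern elements: succeed only if the input is used up too.
  return i_idx == items.size if p_idx == pattern.size

  matcher, min, max = pattern[p_idx]

  # Greedily count how many consecutive items this element accepts.
  count = 0
  while count < max && i_idx + count < items.size && matcher === items[i_idx + count]
    count += 1
  end

  # Try the longest run first, then back off one item at a time.
  count.downto(min) do |n|
    return true if seq_match?(pattern, items, p_idx + 1, i_idx + n)
  end
  false
end

AT_LEAST_3_SYMBOLS = [[Symbol, 3, INF]].freeze
SYMBOLS_SOMEWHERE  = [[Object, 0, INF], [Symbol, 3, INF], [Object, 0, INF]].freeze
```

seq_match?(SYMBOLS_SOMEWHERE, [1, :a, :b, :c, "x"]) succeeds only after the leading wildcard gives back items one at a time, which is the backtracking the engine has to do.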
153
+
154
+
155
+
156
+ Api changes since 0.4.5:
157
+ Reg::Hash semantics have been changing recently.... Reg::Object may be changed
158
+ to suit.
159
+
160
+ Api changes since 0.4.0:
161
+ Reg() makes Reg::Arrays now, not hash matchers; use Rah to make hashes.
162
+ Array#reg and Hash#reg no longer return a Reg::Array or Reg::Hash. In fact
163
+ the names of most classes have changed; they've been moved into the Reg
164
+ namespace (aka module). The previous Reg module is now named Reg::Reg. The
165
+ other names have changed in the obvious way. RegArray is now Reg::Array, etc.
166
+ For the most part these changes don't affect users (if any) because they
167
+ leave the shortest representation (the mini-language) unaffected. The one
168
+ exception (where you have to refer to a reg module name) is the name of the
169
+ module Reg, which is now Reg::Reg. If anyone has 'include Reg' in their
170
+ class or module to get all of Reg's yummy operators, look out: it's changed
171
+ to 'include Reg::Reg' instead. Aliases are mostly provided from the new to the
172
+ old class names... but an alias from Reg to Reg::Reg obviously creates
173
+ a conflict.
174
+
175
+
176
+
177
+ the api (mostly unimplemented):
178
+ r represents a reg
179
+ t represents a transform
180
+ o represents an object
181
+ a represents an array
182
+ s represents a string
183
+ h represents a hash
184
+ scan represents the entire stringscanner interface...
185
+ -(scan,skip,match?,check and their unanchored and backward forms)
186
+ c represents a cursor
187
+ ! implies in-place modification
188
+
189
+ r===o #v
190
+ r=~o #v
191
+ sach=~r #v-
192
+ r.match o #result contains changes
193
+ r.match! o
194
+ c.sub!(r[,t]) #in-place modification only with cursors
195
+ c.gsub!(r[,t])
196
+ oah.sub(r[,t]) #modifies in result
197
+ oah.gsub(r[,t])
198
+ oah.sub!(r[,t]) #inplace modify
199
+ oah.gsub!(r[,t])
200
+ a.scan(r) #modifies in result
201
+
202
+ c.index/rindex r #use exist?/existback?...?
203
+ c.slice! r #modifies in result
204
+ c.split r
205
+ c.find_all r #like String#scan
206
+ c.find r
207
+ ho.find_all [r-key,] r-value
208
+ ho.find [r-key,] r-value
209
+
210
+ ho.index r
211
+ a.split r
212
+ s.find_all r
213
+ s.find r
214
+ s.delete r
215
+ s.delete! r
216
+ s.delete_all r
217
+ s.delete_all! r
218
+
219
+ #these require wrapping library methods to also take different args
220
+ a.slice r
221
+ ahoc.slice! r
222
+ o=~r
223
+ oahc[r]
224
+ oahc[r]=t
225
+ c.scan(r)
226
+ a.find_all r
227
+ a.find r
228
+
229
+ #i'd like to have these, but they can't safely be wrapped,
230
+ #so i'll have to think of different names.
231
+ as.index/rindex r #=> offset/roffset ...use exist?/existback? instead
232
+ s.slice r #=> rslice
233
+ s.slice! r
234
+ s.split r #=> rsplit
235
+ s[r] #=> s-[r]
236
+ s[r]=t #=> s-[r,t]
237
+ s.sub(r[,t]) #=> rsub
238
+ s.gsub(r[,t]) #=> grsub
239
+ s.sub!(r[,t]) #etc
240
+ s.gsub!(r[,t])
241
+ s.scan(r) #=> rscan... note scan only conflicts; the rest of the stringscanner interface
242
+ # can be unchanged.
243
+
244
+ #maybe stuff from Enumerable?
245
+
246
+ Reg::Progress work list:
247
+
248
+ phase 1: array only
249
+ fill out backtrack
250
+ import asserts from backtrace=>backtrack
251
+ disable backtrace
252
+ backtrack should respect update_di
253
+ callers of backtrace must use a progress instead
254
+ call backtrack on progress instead of backtrace...
255
+ matchsets unmodified as yet (ok, except repeat and subseq matchsets)
256
+ push_match and push_matchset need to be called in right places in Reg::Array (what else?)
257
+ note which parts of regarray.rb have been obsoleted by regprogress.rb
258
+
259
+ phase 2:
260
+ eventually, MatchSet#next_match will take a cursor parameter, and return a number of items consumed or progress or nil
261
+ entering some types of subreg creates a subprogress
262
+ arrange for process_deferreds to be called in the right places
263
+ create Reg::Bound (for vars) and Reg::SideEffect, Reg::Undo, Reg::Eventually with sugar
264
+ -Reg#bind, Reg#side_effect, Reg#undo, Reg#eventually
265
+ -and of course Reg::Transform and Reg::Replace
266
+ -Reg::Reg#>>(Reg::Replace) makes a Transform, and certain things can mix in module Replace
267
+ should Reg::Bound be a module?
268
+ should Reg::Bound be a Deferred?
269
+ Reg::Transform calls Reg::Progress#eventually
270
+ implicit progress needs to be made when doing standalone compare of
271
+ -Reg::Object, Reg::Hash, Reg::Array, Reg::logicals, Reg::Bound, Reg::Transform, maybe others
272
+
273
+ these are stubbed at least now:
274
+ Backtrace.clean_result and Backtrace.check_result should operate on progresses instead
275
+ need Reg::Progress#bt_match,last_next_match,to_result,check_result,clean_result
276
+ need Reg::Progress#deep_copy for use in repeat and subseq matchsets
277
+ need MatchSet#clean_result which delegates to the internal Progress, if any
278
+ rewrite repeat and subseq to use progress internally? (in progress only...)
279
+ Reg::(and,repeat,subseq,array) require progress help
280
+
281
+
282
+ varieties of Reg::Replace:
283
+ Reg::Backref and Reg::Bound
284
+ Reg::RepProc
285
+ Reg::ItemThat
286
+ Reg::Fixed
287
+ Object (as if Reg::Fixed)
288
+ Reg::Array and Reg::Subseq?
289
+ Array (as if Reg::Array)
290
+ Reg::Transform?
291
+
292
+
293
+
294
+
295
+
296
+ not implemented yet:
297
+ Reg::Anchor? (or more efficient unanchor?)
298
+ Reg::Backref should be multiple if the items it backreferences to were multiple
299
+ Reg::NumberSet
300
+
301
+
302
+ There are a few optimizations implemented currently, but none of them are
303
+ particularly significant. Some things will probably be quite slow.
304
+ All of the optimizations Friedl lists for regular expressions are pertinent to
305
+ Regs as well. Hopefully, someday, they will be implemented. For the record, they
306
+ are:
307
+ first item discrimination (special case of match cognizance)
308
+ fixed-sequence check
309
+ simple repetition (some is implemented)
310
+ fixed qualifier reduction (??)
311
+ length cognizance
312
+ match cognizance
313
+ need cognizance
314
+ anchors (edge cognizance)
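As an illustration of what one of these might look like for arrays, here is a sketch of a length check: if a pattern can only match inputs within a certain size range, inputs outside that range can be rejected before any element is compared. The [matcher, min, max] representation and both method names are assumptions made for this example. Nothing like this is claimed to exist in Reg.

```ruby
# pattern is a list of [matcher, min, max] triples (max may be infinite).
# Returns the smallest and largest input size the pattern could accept.
def length_bounds(pattern)
  min = pattern.sum { |_matcher, lo, _hi| lo }
  max = pattern.sum { |_matcher, _lo, hi| hi }
  [min, max]
end

# Cheap pre-check: false means the pattern cannot possibly match items;
# true only means a full match attempt is still worth making.
def length_could_match?(pattern, items)
  min, max = length_bounds(pattern)
  items.size >= min && items.size <= max
end
```

A true result says nothing about whether the elements themselves match; the point is only to skip hopeless attempts cheaply.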
315
+
316
+
317
+
318
+ todo:
319
+ vector Reg::Proc,Reg::ItemThat,Reg::Block,Reg::Variable,Reg::Constant
320
+ convert mmatch_multiple to mmatch in another class (or module) in logicals, subseq, repeat, etc
321
+ performance
322
+ variable binding
323
+ variable tracking... keeping each value assigned to a variable during the match in an array
324
+ compare string or file to Reg::Array (lexing)
325
+ rename Reg::Multiple to Reg::Vector
326
+ v rename proceq to item_that (& change conventions... item_that will return a CallLater)
327
+ ?implement Object#[](Reg::Reg) and Object#[]=(Reg::Reg, Reg::Replacement)
328
+ in-place substitutions should not be performed when Reg::Reg#=== or Reg::Reg#match called
329
+ -only when Array#sub or Array#[]=(Reg::Reg,Reg::Replacement)
330
+ perhaps Reg::Reg#match! does substitutions...
331
+ substitutions are applied to result of Array#[], but orig is not modified
332
+ v what about =~?
333
+ v research Txl and BURGs
334
+ more/better docs
335
+ expand user-level documentation
336
+ document operator, three-letter, and long version of everything
337
+ need an interface for returning a partial match if partial input given
338
+ array matcher should match array-like things like enum or (especially) two-way enum (cursor!)
339
+ How should Reg::Array match cursors?
340
+ arguments (including backref's) in property matchers
341
+ discontinuous number sets (and reg multipliers for them)
342
+ lookahead (including negated regmultiples)
343
+ laziness
344
+ inspect (mostly implemented... but maybe needs another name)
345
+ fix all the warnings
346
+ document sep and splitter
347
+ rdoc output on rubyforge
348
+ other docs on rubyforge
349
+ v borrow jim weirich's deferred
350
+ need interface to get all possible matches
351
+ alias +@ to reg in ItemThatLike?
352
+ x reg-nature needs be infectious in ItemThat
353
+ v or have a reg_that constructor like item_that, which makes an item_that extended by reg?
354
+ v reg::hash must not descend from ::hash
355
+ depth-mostly matches via ++[/pathkey/**/pathval/,...]
356
+ need a way to constrain the types of matcher that are allowed
357
+ -in a particular Reg::Array and (some of) its children
358
+ -eg in lex mode, String|Regexp|Integer|:to_s[]|OB|OBS
359
+ - in type mode, Class|Module|Reg::Knows|nil|true|false|OB|OBS
360
+ - in depth-mostly mode, Reg::Pair|Symbol|Symbol::WithArgs|Integer|Reg::Reg
361
+ - in ordered hash mode, Reg::Pair|Symbol|Reg::Reg|String
362
+ - in ordered obj mode, Reg::Pair|Symbol|Symbol::WithArgs
363
+ -Subseq, Not, And, Or, Xor, and the like are allowed in all modes if conforming to the same restrictions
364
+ Pair and Knows::WithArgs need constraint parameterization this way too.
365
+ v what is the meaning of :meth[]? no parameters for parameterlessness, use +:meth
366
+ all reg classes and matchers need to implement #==, #eql?, and #hash
367
+ -defaults only check object ids, so for instance: +[] != +[]
368
+ Reg::Array should be renamed Reg::Sequence (or something...) it's not just for arrays anymore...
369
+ when extending existing classes, check for func names already existing and chain to them
370
+ -(or just abort if the wrong ones are defined.)
371
+ v conflict in Set#&,|,^,+,-
372
+ allow expressions like this in hash and object matchers: +{:foo=>/bar/.-} to mean that the
373
+ -value is optional, but if non-default, must match /bar/.
374
+ v potentially confusing name conflict: const vs Const (in regsugar.rb vs regdeferred.rb)
375
+ sugar is too complicated. need to split into many small files in their own
376
+ -directory, ala the nano gem. (makes set piracy easier too.)
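The todo item above about #==, #eql?, and #hash refers to the usual ruby recipe for value equality. Here is a generic sketch of that recipe on a made-up SeqPattern class (not one of Reg's classes).

```ruby
# Hypothetical pattern class used only to illustrate value equality.
class SeqPattern
  attr_reader :items

  def initialize(*items)
    @items = items
  end

  # Two patterns are equal when they have the same class and contents.
  def ==(other)
    other.class == self.class && other.items == items
  end
  alias eql? ==

  # Equal objects must have equal hashes so they collide as Hash keys.
  def hash
    [self.class, items].hash
  end
end
```

Without these three methods, two separately built but identical patterns compare unequal and count as distinct Hash keys, which is the +[] != +[] problem the todo item mentions.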
377
+
378
+
379
+ known bugs:
380
+ no backreferences
381
+ no substitutions
382
+ vector & and ^ won't work
383
+ explicit duck-typing (on mmatch) is used to distinguish regs and literals... should be is_a? Reg::Reg instead.
384
+ 0*INFINITY should at least cause a warning
385
+ some test cases are so slow as to be effectively unusable.
386
+
387
+
388
+
389
+ reg - the ruby extended grammar
390
+ Copyright (C) 2005 Caleb Clausen
391
+
392
+ This library is free software; you can redistribute it and/or
393
+ modify it under the terms of the GNU Lesser General Public
394
+ License as published by the Free Software Foundation; either
395
+ version 2.1 of the License, or (at your option) any later version.
396
+
397
+ This library is distributed in the hope that it will be useful,
398
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
399
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
400
+ Lesser General Public License for more details.
401
+
402
+ You should have received a copy of the GNU Lesser General Public
403
+ License along with this library; if not, write to the Free Software
404
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
@@ -0,0 +1,31 @@
1
+ =begin copyright
2
+ Copyright (C) 2004,2005 Caleb Clausen
3
+
4
+ This library is free software; you can redistribute it and/or
5
+ modify it under the terms of the GNU Lesser General Public
6
+ License as published by the Free Software Foundation; either
7
+ version 2.1 of the License, or (at your option) any later version.
8
+
9
+ This library is distributed in the hope that it will be useful,
10
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
11
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
12
+ Lesser General Public License for more details.
13
+
14
+ You should have received a copy of the GNU Lesser General Public
15
+ License along with this library; if not, write to the Free Software
16
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
17
+ =end
18
+
+ module Kernel
+   def assert(expr,msg="assertion failed") #checked only when $Debug is set
+     $Debug and (expr or raise msg)
+   end
+
+   @@printed={} #messages already seen by fixme
+   def fixme(s) #prints each distinct message at most once, and only when $Debug is set
+     unless @@printed[s]
+       @@printed[s]=1
+       $Debug and $stderr.print "FIXME: #{s}\n"
+     end
+   end
+ end
@@ -0,0 +1,73 @@
1
+ =begin copyright
2
+ reg - the ruby extended grammar
3
+ Copyright (C) 2005 Caleb Clausen
4
+
5
+ This library is free software; you can redistribute it and/or
6
+ modify it under the terms of the GNU Lesser General Public
7
+ License as published by the Free Software Foundation; either
8
+ version 2.1 of the License, or (at your option) any later version.
9
+
10
+ This library is distributed in the hope that it will be useful,
11
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
12
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
13
+ Lesser General Public License for more details.
14
+
15
+ You should have received a copy of the GNU Lesser General Public
16
+ License along with this library; if not, write to the Free Software
17
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
18
+ =end
+ require 'reg'
+
+ #warning: this code is untested
+ #currently, it will not work because it depends on
+ #features of reg which do not exist (backreferences and substitutions). in addition,
+ #it is likely to contain serious bugs, as it has
+ #not been thoroughly tested or assured in any way.
+ #nevertheless, it should give you a good idea of
+ #how this sort of thing works.
+
+
+ Precedence={ #a constant, so that lowerop (a def) can see it
+   :'('=>10, :p=>10,
+   :* =>9, :/ =>9,
+   :+ =>8, :- =>8,
+   :'='=>7,
+   :';'=>6
+ }
+ name=String.reg
+ exp=name|PrintExp|OpExp|AssignExp|Number #definitions of the expression classes omitted for brevity
+ Leftop=/^[*\/;+-]$/ #also a constant, for the same reason
+ rightop=/^=$/
+ op=Leftop|rightop
+ def lowerop opname
+   regproc{
+     Leftop & proceq(Symbol) {|v| Precedence[opname] >= Precedence[v] }
+   }
+ end
+
+ #last element is always lookahead
+ Reduce=
+   -[ -[:p, '(', exp, ')'].sub {PrintExp.new BR[2]}, OB ] | # p(exp)
+   -[ -['(', exp, ')']    .sub {BR[1]}, OB ] | # (exp)
+   -[ -[exp, Leftop, exp] .sub {OpExp.new *BR[0..2]}, lowerop(BR[1]) ] | # exp+exp
+   -[ exp, -[';']         .sub [], :EOI ] | #elide final trailing ;
+   -[ -[name, '=', exp]   .sub {AssignExp.new BR[0],BR[2]}, lowerop('=') ] #name=exp
+
+ #last element of stack is always lookahead
+ def reduceloop(stack)
+   old_stack=stack
+   while stack.match +[OBS, Reduce]
+   end
+   stack.equal? old_stack or raise 'error'
+ end
+
+ #last element of stack is always lookahead
+ def parse(input)
+   input<<:EOI
+   stack=[input.shift]
+   until input.empty? and +[OB,:EOI]===stack
+     stack.push input.shift #shift
+     reduceloop stack
+   end
+   return stack.first
+ end
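Since the code above cannot run yet, here is a much smaller plain-ruby sketch of the same shift/reduce-with-lookahead idea, handling only numbers and the four arithmetic operators. It does not use Reg at all. The names PRECEDENCE, reduce_all and evaluate are invented for this example, and the lookahead is passed as an argument rather than kept on the stack.

```ruby
PRECEDENCE = { :* => 9, :/ => 9, :+ => 8, :- => 8 }.freeze

# Reduce "left op right" on top of the stack for as long as the operator
# there binds at least as tightly as the lookahead token.
def reduce_all(stack, lookahead)
  while stack.size >= 3 && PRECEDENCE.key?(stack[-2])
    # A tighter-binding lookahead operator must be shifted first.
    break if PRECEDENCE.key?(lookahead) && PRECEDENCE[lookahead] > PRECEDENCE[stack[-2]]

    right = stack.pop
    op = stack.pop
    left = stack.pop
    stack.push(left.public_send(op, right))
  end
end

# tokens is an array such as [1, :+, 2, :*, 3]; well-formed input is assumed.
def evaluate(tokens)
  input = tokens + [:EOI]
  stack = []
  until input.empty?
    token = input.shift
    # Operators and the end marker act as lookahead and trigger reductions.
    reduce_all(stack, token) unless token.is_a?(Numeric)
    stack.push(token) unless token == :EOI # shift
  end
  stack.first
end
```

For [1, :+, 2, :*, 3] the :* lookahead blocks the reduction of 1+2, so 2*3 is reduced first. Equal precedence reduces immediately, which gives left associativity. Parentheses, assignment and error handling are left out.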