reg 0.4.6

data/README ADDED
@@ -0,0 +1,404 @@
1
+ Reg is a library for pattern matching in ruby data structures. Reg provides
2
+ Regexp-like match and match-and-replace for all data structures (particularly
3
+ Arrays, Objects, and Hashes), not just Strings.
4
+
5
+ Reg is best thought of in analogy to regular expressions; Regexps are special
6
+ data structures for matching Strings; Regs are special data structures for
7
+ matching ANY type of ruby data (Strings included, using Regexps).
8
+
9
+ If you have any questions, comments, problems, new feature requests, or just
10
+ want to figure out how to make it work for what you need to do, contact me:
11
+ reg _at_ inforadical _dot_ net
12
+
13
+ Reg is a RubyForge project. RubyForge is another good place to send your
14
+ bug reports or whatever: http://rubyforge.org/projects/reg/
15
+
16
+ (There aren't any bugs filed against Reg there yet, but don't be afraid
17
+ that your report will get lonely.)
18
+
19
+
20
+
21
+ The implementation:
22
+ The engine (as far as I can tell from Friedl's book,
23
+ _Mastering_Regular_Expressions_) is a traditional NFA (backtracking) with non-greedy alternation.
24
+ For performance, I'd like to move to a more DFA-oriented approach (trying many
25
+ different alternatives in parallel).
26
+
27
+ Status:
28
+ The only real (public) matching operator implemented thus far is:
29
+ Reg::Reg#=== (and descendants). It doesn't return a normalized boolean;
30
+ it returns a false value on no match or a true value if there was a match,
31
+ but beyond that, nothing is guaranteed.
32
+
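To make the point about non-normalized return values concrete, here is a minimal plain-Ruby sketch. It does not use Reg at all; the LongWord class is a hypothetical stand-in for any matcher whose === returns "something true" or "something false" rather than a strict boolean.

```ruby
# Hypothetical matcher (not part of Reg): === returns the first word of
# five or more characters, or nil when there is none.
class LongWord
  def ===(other)
    other.is_a?(String) ? other[/\w{5,}/] : nil
  end
end

matcher = LongWord.new

raw = matcher === "hello world"   # a String ("hello"), not true
normalized = !!raw                # force a strict boolean when one is needed

# case/when only cares about truthiness, so the raw value is enough there.
label = case "hi"
        when matcher then "long"
        else "short"
        end
```

Callers that need a real true or false can wrap the result in !! as shown; callers that only branch on the result can use it directly.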
33
+ A number of important features are unimplemented at this point, most notably
34
+ backreferences and substitutions.
35
+
36
+ The backtracking engine appears to be completely functional now. Vector
37
+ Reg::And doesn't work.
38
+
39
+
40
+ This table compares syntax of Reg and Regexp for various constructs. Keep
41
+ in mind that all Regs are ordinary ruby expressions. The special syntax
42
+ is achieved by overriding ruby operators.
43
+
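As a rough illustration of how operator overriding can produce this kind of syntax, the sketch below defines a unary + on Array that wraps the array in a matcher object. MiniSeq is a hypothetical class written for this example; it is not Reg's implementation and handles only fixed-length sequences.

```ruby
# Hypothetical miniature, NOT Reg's code: a fixed-length sequence matcher.
class MiniSeq
  def initialize(items)
    @items = items
  end

  # Matches an Array of the same length whose elements each satisfy the
  # corresponding sub-matcher's === (classes, regexps, literals, ...).
  def ===(other)
    other.is_a?(Array) &&
      other.size == @items.size &&
      @items.zip(other).all? { |matcher, item| matcher === item }
  end
end

# Overriding unary plus turns the ordinary expression +[...] into a matcher.
class Array
  def +@
    MiniSeq.new(self)
  end
end

pair = +[Array, Integer]
pair === [[1, 2], 3]     # => true
pair === [[1, 2], "3"]   # => false
pair === [[1, 2]]        # => false (wrong length)
```

Because the result is a plain Ruby object, it can be stored in a variable and reused, which is all a "named subexpression" amounts to.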
44
+ In the following examples,
45
+ re,re1,re2 represent arbitrary regexp subexpressions,
46
+ r,r1,r2 represent arbitrary reg subexpressions
47
+ s,t represent any single character (perhaps appropriately escaped, if the char is magical)
48
+
49
+ reg           regexp             reg class           #description
50
+
51
+ +[r1,r2,r3]   /re1re2re3/        Reg::Array          #sequence
52
+ -[r1,r2]      (re1re2)           Reg::Subseq         #subsequence
53
+ r.lit         \re                Reg::Literal        #escaping a magical
54
+ regproc{r}    #{re}              (not really named)  #dynamic inclusion
55
+ r1|r2 or :OR  (re1|re2) or [st]  Reg::Or             #alternation
56
+ ~r            [^s]               Reg::Not            #negation (for scalar r and s)
57
+ r.*           re*                Reg::Repeat         #zero or more matches
58
+ r.+           re+                Reg::Repeat         #one or more matches
59
+ r.-           re?                Reg::Repeat         #zero or one matches
60
+ r*n           re{n}              Reg::Repeat         #exactly n matches
61
+ r*(n..m)      re{n,m}            Reg::Repeat         #at least n, at most m matches
62
+ r-n           re{,n}             Reg::Repeat         #at most n matches
63
+ r+m           re{m,}             Reg::Repeat         #at least m matches
64
+ OB            .                  Reg::Any            #a single item
65
+ OBS           .*                 Reg::AnyMultiple    #zero or more items
66
+ BR[1,2]       \1,\2              Reg::Backref        #backreference ***
67
+ r>>x or sub   sub,gsub           Reg::Transform      #search and replace ***
68
+
69
+
70
+ Here are features of reg that don't have an equivalent in regexp:
70
+ r.la                 Reg::Lookahead       #lookahead ***
71
+ ~-[]                 Reg::Lookahead       #subsequence negation w/lookahead ***
72
+ & or :AND            Reg::And             #all alternatives match
73
+ ^ or :XOR            Reg::Xor             #exactly one of alternatives matches
74
+ +{r1=>r2}            Reg::Hash            #hash matcher
75
+ -{name=>r}           Reg::Object          #object matcher
76
+ obj.reg              Reg::Fixed           #turn any ruby object into a reg that matches if obj.=== succeeds
77
+ /re/.sym             Reg::Symbol          #a symbol regex
78
+ item_that{|x|rcode}  Reg::ItemThat        #a proc{} that responds to === by invoking the proc's call
79
+ OBS as un-anchor     Reg::AnyMultiple     #opposite of ^ and $ when placed at edges of a reg array (kinda cheesy)
80
+ name=r               (just a var assign)  #named subexpressions
81
+ Reg::var             Reg::Variable        #recursive matches via Reg::Variable & Reg::Constants
82
+ Reg::const           Reg::Constant
84
+
85
+
86
+ *** = not implemented yet.
87
+
88
+ "... the effect of drinking a Pan Galactic Gargle Blaster is like having
89
+ your brains smashed out by a slice of lemon wrapped round a large gold
90
+ brick."
91
+ -- Douglas Adams, _Hitchhiker's_Guide_to_the_Galaxy_
92
+
93
+ Reg is kind of hard to bend your brain around, so here are some examples:
94
+
95
+ 0.4.6 examples:
96
+
97
+ Matches a single item whose method 'length' returns a Fixnum:
98
+ item_that.length.is_a? Fixnum
99
+
100
+ There's a new way to match hashes; it looks more-or-less like the old way and
101
+ behaves a little differently. The old type of hash matcher (now called an
102
+ unordered hash matcher) looked like:
103
+
104
+ +{/fo+/=>8, /ba+r/=>9}
105
+
106
+ The new syntax uses +[] instead of +{} and ** instead of =>. It's called an
107
+ ordered hash matcher. The order of filter pairs given in an ordered matcher
108
+ is the order comparisons are done in. The same is not true within unordered
109
+ matchers, where order is inferred from the nature of the key matchers. The
110
+ ordered equivalent of the last example is:
111
+
112
+ +[/fo+/**8, /ba+r/**9]
113
+
114
+ Both match hashes whose keys match /fo+/ with value of 8 or match /ba+r/ with
115
+ value of 9 (and nothing else). But if the data looks like: {"foobar"=>8},
116
+ then it is guaranteed to match the second (because /fo+/ is always given a
117
+ chance first), but might or might not match the first (because the order is
118
+ unspecified).
119
+
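The effect of ordering can be sketched in plain Ruby. The helper below is hypothetical and is not how Reg implements hash matchers. It assumes one particular reading of the paragraph above: each entry of the data hash is handed to the first filter pair whose key matcher accepts the key, and that pair's value matcher then decides.

```ruby
# Hypothetical helper (not part of Reg). filters is an ordered list of
# [key_matcher, value_matcher] pairs; every hash entry must be accepted by
# the first pair whose key matcher matches its key.
def ordered_hash_match?(filters, hash)
  hash.all? do |key, value|
    pair = filters.find { |key_matcher, _value_matcher| key_matcher === key }
    !pair.nil? && pair[1] === value
  end
end

data = { "foobar" => 8 }

# /fo+/ gets the first chance at "foobar", and its value matcher 8 agrees.
ordered_hash_match?([[/fo+/, 8], [/ba+r/, 9]], data)   # => true

# With the pairs reversed, /ba+r/ claims "foobar" first and 9 rejects 8.
ordered_hash_match?([[/ba+r/, 9], [/fo+/, 8]], data)   # => false
```

Under this reading, an unordered matcher is free to behave like either call, which is why its result for {"foobar"=>8} is not guaranteed.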
120
+ Here's an example of a Reg::Knows matcher, which matches objects that have the
121
+ #slice method:
122
+ -:slice
123
+
124
+ 0.4.5 examples:
125
+
126
+ Matches array containing exactly 2 elements; 1st is another array, 2nd is
127
+ integer:
128
+ +[Array,Integer]
129
+
130
+ Like above, but 1st is an array of arrays of symbols:
131
+ +[+[+[Symbol+0]+0],Integer]
132
+
133
+ Matches array of at least 3 consecutive symbols and nothing else:
134
+ +[Symbol+3]
135
+
136
+ Matches array with at least 3 symbols in it somewhere:
137
+ +[OBS, Symbol+3, OBS]
138
+
139
+ Matches array of at most 6 strings starting with 'g':
140
+ +[/^g/-6] #no .reg necessary for regexp
141
+
142
+ Matches array of between 5 and 9 hashes containing a key :k pointing to
143
+ something non-nil:
144
+ +[ +{:k=>~nil.reg}*(5..9) ]
145
+
146
+ Matches an object with Integer instance variable @k and property (ie method)
147
+ foobar that returns a string with 'baz' somewhere in it:
148
+ -{:@k=>Integer, :foobar=>/baz/}
149
+
150
+ Matches array of 6 hashes with 6 as a value of every key, followed by
151
+ 18 objects with an attribute @s which is a String:
152
+ +[ +{OB=>6}*6, -{:@s=>String}*18 ]
153
+
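For readers who find the mini-language hard to parse at first, here are plain-Ruby paraphrases of three of the array examples above. The method names are hypothetical and none of this uses Reg; it only spells out what each matcher is described as accepting.

```ruby
# +[Symbol+3]: at least 3 symbols and nothing else.
def only_symbols_min3?(arr)
  arr.size >= 3 && arr.all? { |x| Symbol === x }
end

# +[OBS, Symbol+3, OBS]: a run of at least 3 consecutive symbols somewhere.
def symbol_run_min3?(arr)
  arr.each_cons(3).any? { |window| window.all? { |x| Symbol === x } }
end

# +[/^g/-6]: at most 6 items, each matching /^g/.
def g_items_max6?(arr)
  arr.size <= 6 && arr.all? { |x| /^g/ === x }
end

only_symbols_min3?([:a, :b, :c])         # => true
only_symbols_min3?([:a, :b, :c, 1])      # => false
symbol_run_min3?([1, :a, :b, :c, "x"])   # => true
symbol_run_min3?([:a, 1, :b, :c])        # => false
g_items_max6?(%w[go gem])                # => true
g_items_max6?(%w[go no])                 # => false
```

Each sub-matcher is applied with ===, the same protocol Ruby's case/when uses, which is why bare classes and regexps can appear directly inside a matcher.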
154
+
155
+
156
+ Api changes since 0.4.5:
157
+ Reg::Hash semantics have been changing recently.... Reg::Object may be changed
158
+ to suit.
159
+
160
+ Api changes since 0.4.0:
161
+ Reg() makes Reg::Arrays now, not hash matchers; use Rah to make hashes.
162
+ Array#reg and Hash#reg no longer return a Reg::Array or Reg::Hash. In fact
163
+ the names of most classes have changed; they've been moved into the Reg
164
+ namespace (aka module). The previous Reg module is now named Reg::Reg. The
165
+ other names have changed in the obvious way. RegArray is now Reg::Array, etc.
166
+ For the most part these changes don't affect users (if any) because they
167
+ leave the shortest representation (the mini-language) unaffected. The one
168
+ exception (where you have to refer to a reg module name) is the name of the
169
+ module Reg, which is now Reg::Reg. If anyone has 'include Reg' in their
170
+ class or module to get all of Reg's yummy operators, look out: it's changed
171
+ to 'include Reg::Reg' instead. Aliases are mostly provided from the new to the
172
+ old class names... but an alias from Reg to Reg::Reg obviously creates
173
+ a conflict.
174
+
175
+
176
+
177
+ the api (mostly unimplemented):
178
+ r represents a reg
179
+ t represents a transform
180
+ o represents an object
181
+ a represents an array
182
+ s represents a string
183
+ h represents a hash
184
+ scan represents the entire stringscanner interface...
185
+ -(scan,skip,match?,check and their unanchored and backward forms)
186
+ c represents a cursor
187
+ ! implies in-place modification
188
+
189
+ r===o #v
190
+ r=~o #v
191
+ sach=~r #v-
192
+ r.match o #result contains changes
193
+ r.match! o
194
+ c.sub!(r[,t]) #in-place modification only with cursors
195
+ c.gsub!(r[,t])
196
+ oah.sub(r[,t]) #modifies in result
197
+ oah.gsub(r[,t])
198
+ oah.sub!(r[,t]) #inplace modify
199
+ oah.gsub!(r[,t])
200
+ a.scan(r) #modifies in result
201
+
202
+ c.index/rindex r #use exist?/existback?...?
203
+ c.slice! r #modifies in result
204
+ c.split r
205
+ c.find_all r #like String#scan
206
+ c.find r
207
+ ho.find_all [r-key,] r-value
208
+ ho.find [r-key,] r-value
209
+
210
+ ho.index r
211
+ a.split r
212
+ s.find_all r
213
+ s.find r
214
+ s.delete r
215
+ s.delete! r
216
+ s.delete_all r
217
+ s.delete_all! r
218
+
219
+ #these require wrapping library methods to also take different args
220
+ a.slice r
221
+ ahoc.slice! r
222
+ o=~r
223
+ oahc[r]
224
+ oahc[r]=t
225
+ c.scan(r)
226
+ a.find_all r
227
+ a.find r
228
+
229
+ #i'd like to have these, but they can't safely be wrapped,
230
+ #so i'll have to think of different names.
231
+ as.index/rindex r #=> offset/roffset ...use exist?/existback? instead
232
+ s.slice r #=> rslice
233
+ s.slice! r
234
+ s.split r #=> rsplit
235
+ s[r] #=> s-[r]
236
+ s[r]=t #=> s-[r,t]
237
+ s.sub(r[,t]) #=> rsub
238
+ s.gsub(r[,t]) #=> grsub
239
+ s.sub!(r[,t]) #etc
240
+ s.gsub!(r[,t])
241
+ s.scan(r) #=> rscan... note scan only conflicts; the rest of the stringscanner interface
242
+ # can be unchanged.
243
+
244
+ #maybe stuff from Enumerable?
245
+
246
+ Reg::Progress work list:
247
+
248
+ phase 1: array only
249
+ fill out backtrack
250
+ import asserts from backtrace=>backtrack
251
+ disable backtrace
252
+ backtrack should respect update_di
253
+ callers of backtrace must use a progress instead
254
+ call backtrack on progress instead of backtrace...
255
+ matchsets unmodified as yet (ok, except repeat and subseq matchsets)
256
+ push_match and push_matchset need to be called in right places in Reg::Array (what else?)
257
+ note which parts of regarray.rb have been obsoleted by regprogress.rb
258
+
259
+ phase 2:
260
+ eventually, MatchSet#next_match will take a cursor parameter, and return a number of items consumed or progress or nil
261
+ entering some types of subreg creates a subprogress
262
+ arrange for process_deferreds to be called in the right places
263
+ create Reg::Bound (for vars) and Reg::SideEffect, Reg::Undo, Reg::Eventually with sugar
264
+ -Reg#bind, Reg#side_effect, Reg#undo, Reg#eventually
265
+ -and of course Reg::Transform and Reg::Replace
266
+ -Reg::Reg#>>(Reg::Replace) makes a Transform, and certain things can mix in module Replace
267
+ should Reg::Bound be a module?
268
+ should Reg::Bound be a Deferred?
269
+ Reg::Transform calls Reg::Progress#eventually
270
+ implicit progress needs to be made when doing standalone compare of
271
+ -Reg::Object, Reg::Hash, Reg::Array, Reg::logicals, Reg::Bound, Reg::Transform, maybe others
272
+
273
+ these are stubbed at least now:
274
+ Backtrace.clean_result and Backtrace.check_result should operate on progresses instead
275
+ need Reg::Progress#bt_match,last_next_match,to_result,check_result,clean_result
276
+ need Reg::Progress#deep_copy for use in repeat and subseq matchsets
277
+ need MatchSet#clean_result which delegates to the internal Progress, if any
278
+ rewrite repeat and subseq to use progress internally? (in progress only...)
279
+ Reg::(and,repeat,subseq,array) require progress help
280
+
281
+
282
+ varieties of Reg::Replace:
283
+ Reg::Backref and Reg::Bound
284
+ Reg::RepProc
285
+ Reg::ItemThat
286
+ Reg::Fixed
287
+ Object (as if Reg::Fixed)
288
+ Reg::Array and Reg::Subseq?
289
+ Array (as if Reg::Array)
290
+ Reg::Transform?
291
+
292
+
293
+
294
+
295
+
296
+ not implemented yet:
297
+ Reg::Anchor? (or more efficient unanchor?)
298
+ Reg::Backref should be multiple if the items it backreferences to were multiple
299
+ Reg::NumberSet
300
+
301
+
302
+ There are a few optimizations implemented currently, but none of them are
303
+ particularly significant. Some things will probably be quite slow.
304
+ All of the optimizations Friedl lists for regular expressions are pertinent to
305
+ Regs as well. Hopefully, someday, they will be implemented. For the record, they
306
+ are:
307
+ first item discrimination (special case of match cognizance)
308
+ fixed-sequence check
309
+ simple repetition (some is implemented)
310
+ fixed qualifier reduction (??)
311
+ length cognizance
312
+ match cognizance
313
+ need cognizance
314
+ anchors (edge cognizance)
315
+
316
+
317
+
318
+ todo:
319
+ vector Reg::Proc,Reg::ItemThat,Reg::Block,Reg::Variable,Reg::Constant
320
+ convert mmatch_multiple to mmatch in another class (or module) in logicals, subseq, repeat, etc
321
+ performance
322
+ variable binding
323
+ variable tracking... keeping each value assigned to a variable during the match in an array
324
+ compare string or file to Reg::Array (lexing)
325
+ rename Reg::Multiple to Reg::Vector
326
+ v rename proceq to item_that (& change conventions... item_that will return a CallLater)
327
+ ?implement Object#[](Reg::Reg) and Object#[]=(Reg::Reg, Reg::Replacement)
328
+ in-place substitutions should not be performed when Reg::Reg#=== or Reg::Reg#match called
329
+ -only when Array#sub or Array#[]=(Reg::Reg,Reg::Replacement)
330
+ perhaps Reg::Reg#match! does substitutions...
331
+ substitutions are applied to result of Array#[], but orig is not modified
332
+ v what about =~?
333
+ v research Txl and BURGs
334
+ more/better docs
335
+ expand user-level documentation
336
+ document operator, three-letter, and long version of everything
337
+ need an interface for returning a partial match if partial input given
338
+ array matcher should match array-like things like enum or (especially) two-way enum (cursor!)
339
+ How should Reg::Array match cursors?
340
+ arguments (including backref's) in property matchers
341
+ discontinuous number sets (and reg multipliers for them)
342
+ lookahead (including negated regmultiples)
343
+ laziness
344
+ inspect (mostly implemented... but maybe needs another name)
345
+ fix all the warnings
346
+ document sep and splitter
347
+ rdoc output on rubyforge
348
+ other docs on rubyforge
349
+ v borrow jim weirich's deferred
350
+ need interface to get all possible matches
351
+ alias +@ to reg in ItemThatLike?
352
+ x reg-nature needs be infectious in ItemThat
353
+ v or have a reg_that constructor like item_that, which makes an item_that extended by reg?
354
+ v reg::hash must not descend from ::hash
355
+ depth-mostly matches via ++[/pathkey/**/pathval/,...]
356
+ need a way to constrain the types of matcher that are allowed
357
+ -in a particular Reg::Array and (some of) its children
358
+ -eg in lex mode, String|Regexp|Integer|:to_s[]|OB|OBS
359
+ - in type mode, Class|Module|Reg::Knows|nil|true|false|OB|OBS
360
+ - in depth-mostly mode, Reg::Pair|Symbol|Symbol::WithArgs|Integer|Reg::Reg
361
+ - in ordered hash mode, Reg::Pair|Symbol|Reg::Reg|String
362
+ - in ordered obj mode, Reg::Pair|Symbol|Symbol::WithArgs
363
+ -Subseq, Not, And, Or, Xor, and the like are allowed in all modes if conforming to the same restrictions
364
+ Pair and Knows::WithArgs need constraint parameterization this way too.
365
+ v what is the meaning of :meth[]? no parameters for parameterlessness, use +:meth
366
+ all reg classes and matchers need to implement #==, #eql?, and #hash
367
+ -defaults only check object ids, so for instance: +[] != +[]
368
+ Reg::Array should be renamed Reg::Sequence (or something...) it's not just for arrays anymore...
369
+ when extending existing classes, check for func names already existing and chain to them
370
+ -(or just abort if the wrong ones are defined.)
371
+ v conflict in Set#&,|,^,+,-
372
+ allow expressions like this in hash and object matchers: +{:foo=>/bar/.-} to mean that the
373
+ -value is optional, but if non-default, must match /bar/.
374
+ v potentially confusing name conflict: const vs Const (in regsugar.rb vs regdeferred.rb)
375
+ sugar is too complicated. need to split into many small files in their own
376
+ -directory, ala the nano gem. (makes set piracy easier too.)
377
+
378
+
379
+ known bugs:
380
+ no backreferences
381
+ no substitutions
382
+ vector & and ^ won't work
383
+ explicit duck-typing (on mmatch) is used to distinguish regs and literals... should be is_a? Reg::Reg instead.
384
+ 0*INFINITY should at least cause a warning
385
+ some test cases are so slow as to be effectively unusable.
386
+
387
+
388
+
389
+ reg - the ruby extended grammar
390
+ Copyright (C) 2005 Caleb Clausen
391
+
392
+ This library is free software; you can redistribute it and/or
393
+ modify it under the terms of the GNU Lesser General Public
394
+ License as published by the Free Software Foundation; either
395
+ version 2.1 of the License, or (at your option) any later version.
396
+
397
+ This library is distributed in the hope that it will be useful,
398
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
399
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
400
+ Lesser General Public License for more details.
401
+
402
+ You should have received a copy of the GNU Lesser General Public
403
+ License along with this library; if not, write to the Free Software
404
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
@@ -0,0 +1,31 @@
1
+ =begin copyright
2
+ Copyright (C) 2004,2005 Caleb Clausen
3
+
4
+ This library is free software; you can redistribute it and/or
5
+ modify it under the terms of the GNU Lesser General Public
6
+ License as published by the Free Software Foundation; either
7
+ version 2.1 of the License, or (at your option) any later version.
8
+
9
+ This library is distributed in the hope that it will be useful,
10
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
11
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
12
+ Lesser General Public License for more details.
13
+
14
+ You should have received a copy of the GNU Lesser General Public
15
+ License along with this library; if not, write to the Free Software
16
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
17
+ =end
18
+
19
+ module Kernel
20
+ def assert(expr,msg="assertion failed")
21
+ $Debug and (expr or raise msg)
22
+ end
23
+
24
+ @@printed={}
25
+ def fixme(s)
26
+ unless @@printed[s]
27
+ @@printed[s]=1
28
+ $Debug and $stderr.print "FIXME: #{s}\n"
29
+ end
30
+ end
31
+ end
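A short sketch of how a debug-gated assertion like the one above behaves. To stay self-contained and avoid redefining Kernel#assert, it uses a hypothetical method name (debug_assert) and a hypothetical flag ($debug_checks) in place of $Debug. Note that Ruby globals are case-sensitive, so $Debug is a different variable from the built-in $DEBUG and stays nil until something assigns it.

```ruby
# Hypothetical stand-in for the $Debug flag used above.
$debug_checks = false

# Same shape as the assert helper: a no-op unless the flag is set.
def debug_assert(expr, msg = "assertion failed")
  $debug_checks and (expr or raise(msg))
end

# Flag off: the failed check is silently skipped.
skipped = debug_assert(1 + 1 == 3)   # => false, nothing raised

# Flag on: the same check raises a RuntimeError with the given message.
$debug_checks = true
message = begin
  debug_assert(1 + 1 == 3, "arithmetic is broken")
  nil
rescue RuntimeError => e
  e.message
end
# message is now "arithmetic is broken"
```

The practical consequence is that these assertions cost almost nothing in normal runs, and they also catch nothing unless the flag has been switched on.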
@@ -0,0 +1,73 @@
1
+ =begin copyright
2
+ reg - the ruby extended grammar
3
+ Copyright (C) 2005 Caleb Clausen
4
+
5
+ This library is free software; you can redistribute it and/or
6
+ modify it under the terms of the GNU Lesser General Public
7
+ License as published by the Free Software Foundation; either
8
+ version 2.1 of the License, or (at your option) any later version.
9
+
10
+ This library is distributed in the hope that it will be useful,
11
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
12
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
13
+ Lesser General Public License for more details.
14
+
15
+ You should have received a copy of the GNU Lesser General Public
16
+ License along with this library; if not, write to the Free Software
17
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
18
+ =end
19
+ require 'reg'
20
+
21
+ #warning: this code is untested
22
+ #currently, it will not work because it depends on
23
+ #features of reg which do not exist (backreferences and substitutions). in addition,
24
+ #it is likely to contain serious bugs, as it has
25
+ #not been thoroughly tested or assured in any way.
26
+ #nevertheless, it should give you a good idea of
27
+ #how this sort of thing works.
28
+
29
+
30
+ precedence={
31
+ :'('=>10, :p=>10,
32
+ :* =>9, :/ =>9,
33
+ :+ =>8, :- =>8,
34
+ :'='=>7,
35
+ :';'=>6
36
+ }
37
+ name=String.reg
38
+ exp=name|PrintExp|OpExp|AssignExp|Number #definitions of the expression classes omitted for brevity
39
+ leftop=/^[*\/;+-]$/
40
+ rightop=/^=$/
41
+ op=leftop|rightop
42
+ def lowerop opname
43
+ regproc{
44
+ leftop & proceq(Symbol) {|v| precedence[opname] >= precedence[v] }
45
+ }
46
+ end
47
+
48
+ #last element is always lookahead
49
+ Reduce=
50
+ -[ -[:p, '(', exp, ')'].sub {PrintExp.new BR[2]}, OB ] | # p(exp)
51
+ -[ -['(', exp, ')'] .sub {BR[1]}, OB ] | # (exp)
52
+ -[ -[exp, leftop, exp] .sub {OpExp.new *BR[0..2]}, lowerop(BR[1]) ] | # exp+exp
53
+ -[ exp, -[';'] .sub [], :EOI ] | #elide final trailing ;
54
+ -[ -[name, '=', exp] .sub {AssignExp.new BR[0],BR[2]}, lowerop('=') ] #name=exp
55
+
56
+ #last element of stack is always lookahead
57
+ def reduceloop(stack)
58
+ old_stack=stack
59
+ while stack.match +[OBS, Reduce]
60
+ end
61
+ stack.equal? old_stack or raise 'error'
62
+ end
63
+
64
+ #last element of stack is always lookahead
65
+ def parse(input)
66
+ input<<:EOI
67
+ stack=[input.shift]
68
+ until input.empty? and +[OB,:EOI]===stack
69
+ stack.push input.shift #shift
70
+ reduceloop stack
71
+ end
72
+ return stack.first
73
+ end
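Since the parser above cannot run yet, here is a small plain-Ruby sketch of the same shift/reduce idea with one token of lookahead. It does not use Reg. The names (PRECEDENCE, reducible?, parse_expr) are hypothetical, only the binary operators + and * are handled, and trees are represented as nested arrays rather than expression objects.

```ruby
# Operator precedences, mirroring the table in the sample above.
PRECEDENCE = { :* => 9, :+ => 8 }.freeze

# The top of the stack is reducible when it looks like
#   operand, operator, operand, lookahead
# and the lookahead does not bind tighter than the operator.
def reducible?(stack)
  return false if stack.size < 4
  left, op, right, lookahead = stack.last(4)
  return false unless PRECEDENCE.key?(op) && lookahead.is_a?(Symbol)
  return false if left.is_a?(Symbol) || right.is_a?(Symbol)
  PRECEDENCE[op] >= PRECEDENCE.fetch(lookahead, 0)   # :EOI counts as 0
end

# tokens is an array of Integers and operator Symbols.
def parse_expr(tokens)
  stack = []
  (tokens + [:EOI]).each do |token|
    stack.push(token)                          # shift
    while reducible?(stack)                    # reduce while a rule applies
      left, op, right, lookahead = stack.pop(4)
      stack.push([op, left, right], lookahead) # last element stays lookahead
    end
  end
  stack.first
end

parse_expr([1, :+, 2, :*, 3])   # => [:+, 1, [:*, 2, 3]]
parse_expr([1, :*, 2, :+, 3])   # => [:+, [:*, 1, 2], 3]
```

The precedence comparison in reducible? plays the role that lowerop plays in the Reg version: a reduction is held back while the lookahead operator binds more tightly than the one on the stack.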