cannonbol 1.3.0 → 2.0.0
- checksums.yaml +5 -5
- data/README.md +273 -184
- data/cannonbol.gemspec +4 -4
- data/lib/cannonbol/cannonbol.rb +217 -204
- data/lib/cannonbol/version.rb +1 -1
- data/snobol4-language-reference.pdf +0 -0
- metadata +6 -48
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
|
-
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
2
|
+
SHA256:
|
3
|
+
metadata.gz: 62a3b9c14f45c3694de243dee4883528c9dbd6f2ddc45e05784cb16f5841747f
|
4
|
+
data.tar.gz: b07d05835793f4f5d6ffe018c2c62f5adc64e59e1a49d11a5f25481d604ff89e
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 5742f625d9be04482f83e261f17260f24dd78e88ceb750eb436457d0dcf2cde39ead41b95fb872a28fc248faaeadd02304489ecf1b66dbcd7bd056450acc14dd
|
7
|
+
data.tar.gz: e7839e4136869b91cf53a3c28d9bdb5b9aa4822482365e12599436be910abc9792a4ccea5a924449775d4e7c143ba89d53f35648fc03ad8a91ad814d5852df57
|
data/README.md
CHANGED
@@ -27,107 +27,133 @@ Or install it yourself as:
|
|
27
27
|
|
28
28
|
$ gem install cannonbol
|
29
29
|
|
30
|
+
## Compatibility With Ruby 2.5
|
31
|
+
|
32
|
+
> In Ruby 2.5 the methods `#match?` and `#-@` were added to strings and regex patterns. These methods were also used to build
|
33
|
+
patterns in Cannonbol. The latest version of Cannonbol uses the `#matches?` and `#insensitive` methods instead. See the
|
34
|
+
end of the readme for using your old patterns with the latest Cannonbol.
|
35
|
+
|
30
36
|
## Lets Go!
|
31
37
|
|
32
38
|
### Basic Matching `- &, |, capture?, match_any, match_all`
|
33
39
|
|
34
|
-
Strings, Regexes and primitives are combined using & (concatenation) and | (alternation) operators.
|
40
|
+
Strings, Regexes and primitives are combined using & (concatenation) and | (alternation) operators.
|
35
41
|
|
36
42
|
Here is a simple pattern that matches a simple noun clause:
|
37
43
|
|
38
|
-
|
39
|
-
|
44
|
+
```ruby
|
45
|
+
> ("a" | "the") & /\s+/ & ("boy" | "girl")
|
46
|
+
```
|
47
|
+
|
40
48
|
This will match either "a" or "the" followed by white space and then by "boy" or "girl". Okay! Let's use it!
|
41
49
|
|
42
|
-
|
43
|
-
|
44
|
-
|
45
|
-
|
50
|
+
```ruby
|
51
|
+
> (("a" | "the") & /\s+/ & ("boy" | "girl")).matches?("he saw a boy going home")
|
52
|
+
=> "a boy"
|
53
|
+
> (("a" | "the") & /\s+/ & ("boy" | "girl")).matches?("he saw a big boy going home")
|
54
|
+
=> nil
|
55
|
+
```
|
46
56
|
|
47
57
|
Now let's save the pieces of the match using the `capture?` (pronounced _capture IF_) method:
|
48
58
|
|
49
|
-
|
50
|
-
|
51
|
-
|
52
|
-
|
53
|
-
|
54
|
-
|
55
|
-
|
56
|
-
|
59
|
+
```ruby
|
60
|
+
> article, noun = nil, nil;
|
61
|
+
* pattern = ("a" | "the").capture? { |m| article = m } & /\s+/ & ("boy" | "girl").capture? { |m| noun = m };
|
62
|
+
* pattern.matches?("he saw the girl going home")
|
63
|
+
=> the girl
|
64
|
+
> noun
|
65
|
+
=> girl
|
66
|
+
> article
|
67
|
+
=> the
|
68
|
+
```
|
57
69
|
|
58
|
-
The `capture?` method and its friend `capture!` (pronounced _capture NOW_) have many powerful features.
|
70
|
+
The `capture?` method and its friend `capture!` (pronounced _capture NOW_) have many powerful features.
|
59
71
|
As shown above it can take a block which is passed the matching substring, _IF the match succeeds_.
|
60
|
-
The other features of the capture method will be detailed [below.](
|
72
|
+
The other features of the capture method will be detailed [below.](#advanced-capture-techniques)
|
61
73
|
|
62
74
|
Arrays can be turned into patterns using the `match_any` and `match_all` methods:
|
63
75
|
|
64
|
-
|
65
|
-
|
66
|
-
|
67
|
-
|
68
|
-
|
69
|
-
|
70
|
-
|
76
|
+
```ruby
|
77
|
+
ARTICLES = ["a", "the"]
|
78
|
+
NOUNS = ["boy", "girl", "dog", "cat"]
|
79
|
+
ADJECTIVES = ["big", "small", "fierce", "friendly"]
|
80
|
+
WS = /\s+/
|
81
|
+
[ARTICLES.match_any, [WS, [WS, ADJECTIVES.match_any, WS].match_all].match_any, NOUNS.match_any].match_all
|
82
|
+
```
|
83
|
+
|
84
|
+
This is equivalent to
|
85
|
+
|
86
|
+
```ruby
|
87
|
+
("a" | "the") & (WS | (WS & ("big" | "small" | "fierce" | "friendly") & WS)) & ("boy" | "girl" | "dog" | "cat")
|
88
|
+
```
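For comparison, the same noun-clause grammar can be sketched with plain Ruby regexes and no Cannonbol at all. This is only an illustration of what the combined pattern accepts; the local names (`noun_clause` and friends) are made up for this sketch, and `Regexp.union` stands in for `match_any`.

```ruby
# Plain-Ruby illustration only; none of these names come from Cannonbol.
articles   = ["a", "the"]
nouns      = ["boy", "girl", "dog", "cat"]
adjectives = ["big", "small", "fierce", "friendly"]

# Regexp.union builds an alternation, much like match_any does for patterns.
article   = Regexp.union(articles)
adjective = Regexp.union(adjectives)
noun      = Regexp.union(nouns)

# article, then either plain white space or white space around an adjective, then noun
noun_clause = /#{article}(?:\s+|\s+#{adjective}\s+)#{noun}/

p "he saw a big boy going home"[noun_clause]   # => "a big boy"
p "the cat sat"[noun_clause]                   # => "the cat"
```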
|
71
89
|
|
72
|
-
|
73
|
-
|
74
|
-
### match? options
|
90
|
+
### The matches? options
|
75
91
|
|
76
|
-
The
|
92
|
+
The `matches?` method shown above takes a couple of options to globally control the match process:
|
77
93
|
|
78
94
|
option | default | meaning
|
79
95
|
------|-----|-----
|
80
|
-
|
81
|
-
anchor | false | When on, pattern matching must begin at the first character. Normally the matcher will keep moving the starting character to the right, until the match succeeds.
|
82
|
-
raise_error | false | When on, a match failure will raise Cannonbol::MatchFailed.
|
83
|
-
replace_with | nil | When a non-falsy value is supplied, the value will replace the matched portion of the string, and the entire string will be returned. Normally only the matched portion of the string is returned.
|
96
|
+
`insensitive` | `false` | When on, basic regex and string patterns will NOT be case sensitive. Note that you can also use `ignore_case`.
|
97
|
+
`anchor` | `false` | When on, pattern matching must begin at the first character. Normally the matcher will keep moving the starting character to the right, until the match succeeds.
|
98
|
+
`raise_error` | `false` | When on, a match failure will raise Cannonbol::MatchFailed.
|
99
|
+
`replace_with` | `nil` | When a non-falsy value is supplied, the value will replace the matched portion of the string, and the entire string will be returned. Normally only the matched portion of the string is returned.
|
84
100
|
|
85
101
|
Example of replace with:
|
86
102
|
|
87
|
-
|
88
|
-
|
89
|
-
|
90
|
-
|
91
|
-
|
103
|
+
```ruby
|
104
|
+
> "hello".matches?("She said hello!")
|
105
|
+
=> hello
|
106
|
+
> "hello".matches?("She said hello!", replace_with: "goodby")
|
107
|
+
=> She said goodby!
|
108
|
+
```
|
109
|
+
|
92
110
|
#### Ignore case on a subpattern
|
93
111
|
|
94
|
-
Sometimes
|
95
|
-
|
112
|
+
Sometimes it's useful to run the matcher in the default case sensitive mode, and only turn off case sensitivity for one part of the pattern. To do this
|
113
|
+
use the `#insensitive` method. For example:
|
96
114
|
|
97
|
-
|
98
|
-
|
99
|
-
|
100
|
-
|
101
|
-
|
102
|
-
|
115
|
+
```ruby
|
116
|
+
> ("GIRL".insensitive | "boy").matches?("A big girl!")
|
117
|
+
=> girl
|
118
|
+
> ("GIRL".insensitive | "boy").matches?("A big BOY!")
|
119
|
+
=> nil
|
120
|
+
```
|
103
121
|
|
104
|
-
|
122
|
+
### Patterns, Subjects, Cursors, Alternatives, and Backtracking
|
105
123
|
|
106
|
-
|
107
|
-
|
108
|
-
|
109
|
-
|
110
|
-
|
111
|
-
|
124
|
+
A pattern is an object that responds to the `matches?` method. Cannonbol adds the `matches?` method to Ruby strings and regexes, and provides a number of _primitive_ patterns. A pattern can be combined with another pattern using the `&` and `|` operators. There are also several primitive patterns that take a pattern and create a new pattern. Here are some example patterns:
|
125
|
+
|
126
|
+
```ruby
|
127
|
+
"hello" # matches any string containing hello
|
128
|
+
/\s+/ # matches one or more white space characters
|
129
|
+
"hello" & /\s+/ & "there" # matches "hello" and "there" separated by white space
|
130
|
+
"hello" | "goodby" # matches EITHER "hello" or "goodby"
|
131
|
+
ARB # a primitive pattern that matches anything (similar to /.*/)
|
132
|
+
("hello" | "goodby") & ARB & "Fred" # matches "hello" or "goodby" followed by any characters and finally "Fred"
|
133
|
+
```
|
112
134
|
|
113
135
|
Patterns are just objects, so they can be assigned to variables:
|
114
136
|
|
115
|
-
|
116
|
-
|
117
|
-
|
118
|
-
|
137
|
+
```ruby
|
138
|
+
greeting = "hello" | "goodby"
|
139
|
+
names = "Fred" | "Suzy"
|
140
|
+
ws = /\s+/
|
141
|
+
greeting & ws & names # matches "hello Fred" or "goodby Suzy"
|
142
|
+
```
|
119
143
|
|
120
|
-
The first parameter of the
|
121
|
-
matching begins again one character to the right. This continues until a match is made, or there are insufficient characters to make a match. This behavior can be turned off by specifying `anchor: true` in the
|
144
|
+
The first parameter of the matches? method is the subject string. The subject string is matched left to right driven by the pattern object. Normally the matcher will attempt to match starting at the first character. If no match is found, then
|
145
|
+
matching begins again one character to the right. This continues until a match is made, or there are insufficient characters to make a match. This behavior can be turned off by specifying `anchor: true` in the matches? options hash.
|
122
146
|
|
123
147
|
The current position of the matcher in the string is the _cursor_. The cursor begins at zero and as each character is matched it moves to the right. If the match fails (and anchor is false) then the match is restarted with the cursor at position 1, etc.
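The restart-one-character-to-the-right behaviour can be sketched in plain Ruby. The helper below is hypothetical and is not how Cannonbol is implemented; it only shows what the `anchor` option changes, using an ordinary regex as the pattern.

```ruby
# Hypothetical helper, not part of Cannonbol: try the pattern at cursor 0,
# then 1, then 2, ... the way an unanchored match restarts after a failure.
def sliding_match(regex, subject, anchor: false)
  # \A pins the regex to the start of each slice (regex flags are dropped here).
  pinned = /\A(?:#{regex.source})/
  last_start = anchor ? 0 : subject.length
  (0..last_start).each do |cursor|
    found = subject[cursor..-1][pinned]
    return [cursor, found] if found
  end
  nil
end

p sliding_match(/s\w+/, "he saw a boy")                # => [3, "saw"]
p sliding_match(/s\w+/, "he saw a boy", anchor: true)  # => nil
```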
|
124
148
|
|
125
|
-
Alternatives are considered left to right as specified in the pattern. Once an alternative is matched, the matcher moves on to the next
|
149
|
+
Alternatives are considered left to right as specified in the pattern. Once an alternative is matched, the matcher moves on to the next element in the pattern, but it does remember the alternative, and if matching fails at a later pattern, the matcher will back up and try the next alternative. For example:
|
126
150
|
|
127
|
-
|
128
|
-
|
129
|
-
|
130
|
-
|
151
|
+
```ruby
|
152
|
+
a_pattern = "a" | "aaa" | "aa"
|
153
|
+
b_pattern = "b" | "aaabb" | "abbbc"
|
154
|
+
c_pattern = "cc"
|
155
|
+
(a_pattern & b_pattern & c_pattern).matches?("aaabbbccc")
|
156
|
+
```
|
131
157
|
|
132
158
|
* "a" is matched from a_pattern, and then we move to b_pattern.
|
133
159
|
* None of the alternatives in b_pattern can match, so we backtrack and try the next alternative in the a_pattern,
|
@@ -138,7 +164,7 @@ Alternatives are considered left to right as specified in the pattern. Once an
|
|
138
164
|
* "aa" now matches, and so we move to the b_pattern, which can only match its last alternative, and
|
139
165
|
* finally we complete the match!
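The walk-through above can be reproduced with a small backtracking matcher written in plain Ruby. This is a sketch under simplifying assumptions (literal strings only, one array of alternatives per pattern element); the method names are invented for illustration and say nothing about Cannonbol's internals.

```ruby
# Illustrative only: each element of `sequence` is an array of literal
# alternatives, tried left to right. Returns the end cursor or nil.
def match_at(sequence, subject, cursor)
  return cursor if sequence.empty?
  first, *rest = sequence
  first.each do |alternative|
    next unless subject[cursor, alternative.length] == alternative
    # This alternative fits; if the rest fails we fall through to the next one.
    finish = match_at(rest, subject, cursor + alternative.length)
    return finish if finish
  end
  nil
end

# Unanchored driver: restart one character to the right after each failure.
def backtracking_match(sequence, subject)
  (0..subject.length).each do |start|
    finish = match_at(sequence, subject, start)
    return subject[start...finish] if finish
  end
  nil
end

a_pattern = ["a", "aaa", "aa"]
b_pattern = ["b", "aaabb", "abbbc"]
c_pattern = ["cc"]
p backtracking_match([a_pattern, b_pattern, c_pattern], "aaabbbccc")  # => "aaabbbccc"
```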
|
140
166
|
|
141
|
-
For a more complete explanation see the [SNOBOL4 manual Chapter 2](
|
167
|
+
For a more complete explanation see the [SNOBOL4 manual Chapter 2](https://github.com/catprintlabs/cannonbol/blob/master/snobol4-language-reference.pdf)
|
142
168
|
|
143
169
|
Bottom line is the matcher will try every possible option until a match is made or the match fails.
|
144
170
|
|
@@ -148,21 +174,23 @@ Cannonbol includes the complete set of SNOBOL4 + SPITBOL primitive patterns and
|
|
148
174
|
|
149
175
|
`REM` Match 0 or more characters to the end of the subject string.
|
150
176
|
|
151
|
-
`("the" & REM).
|
177
|
+
`("the" & REM).matches?("he saw the small boy") === "the small boy"`
|
152
178
|
|
153
179
|
`ARB` Match 0 or more characters. ARB first tries to match zero characters, then 1 character, then 2 until the match succeeds. It is roughly equivalent to `/.*/`, except the regex will NOT backtrack like ARB will.
|
154
180
|
|
155
|
-
`("the" & ARB & "boy").
|
181
|
+
`("the" & ARB & "boy").matches?("he saw the small boy running") === "the small boy"`
|
156
182
|
|
157
183
|
`LEN(n)` Match any n characters. Equivalent to `/.{n}/`
|
158
184
|
|
159
185
|
`POS(x)` Match ONLY if current cursor is at x. POS(0) is the start of the string.
|
160
186
|
|
161
|
-
`(POS(5) & ARB & POS(7)).
|
187
|
+
`(POS(5) & ARB & POS(7)).matches?("01234567") === "56"`
|
162
188
|
|
163
189
|
`RPOS(x)` Just like POS except measured from the end of the string. I.e. RPOS(0) is just after the last character.
|
164
190
|
|
165
|
-
`("hello" & RPOS(0)).
|
191
|
+
`("hello" & RPOS(0)).matches?("she said hello!")` would fail.
|
192
|
+
|
193
|
+
`("hello" & RPOS(1)).matches?("she said hello!")` would succeed.
|
166
194
|
|
167
195
|
`TAB(x)` Is equivalent to `ARB & POS(x)`. In other words, match zero or more characters up to the x'th character. Fails if x < the current cursor.
|
168
196
|
|
@@ -184,57 +212,66 @@ Cannonbol includes the complete set of SNOBOL4 + SPITBOL primitive patterns and
|
|
184
212
|
|
185
213
|
### Delayed Evaluation of Primitive Pattern Parameters
|
186
214
|
|
187
|
-
There are several cases where it is useful to delay the evaluation of a primitive pattern's arguments until the match is
|
215
|
+
There are several cases where it is useful to delay the evaluation of a primitive pattern's arguments until the match is
|
188
216
|
being made, rather than when the pattern is created.
|
189
217
|
|
190
218
|
To allow for this, all primitive patterns can take a block. The block is evaluated when the matcher encounters the primitive, and the result of the block is used as the argument to the pattern.
|
191
219
|
|
192
220
|
Here is a method that will parse a set of fixed width fields, where the widths are supplied as arguments to the method:
|
193
221
|
|
194
|
-
|
222
|
+
```ruby
|
223
|
+
def parse(s, *widths)
|
195
224
|
fields = []
|
196
|
-
(ARBNO(LEN {widths.shift}.capture? {|field| fields << field}) & RPOS(0)).
|
225
|
+
(ARBNO(LEN {widths.shift}.capture? {|field| fields << field}) & RPOS(0)).matches?(s)
|
197
226
|
fields
|
198
227
|
end
|
228
|
+
```
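For readers who want to see what the fixed-width parse is expected to produce, here is a plain-Ruby sketch of the same slicing. The `slice_fields` name is hypothetical, and the length check only loosely mirrors the role `RPOS(0)` plays in the pattern; it is not a statement about how Cannonbol handles leftover widths.

```ruby
# Hypothetical plain-Ruby counterpart, for illustration only.
def slice_fields(subject, *widths)
  # Require the widths to cover the subject exactly, as a stand-in for RPOS(0).
  return nil unless widths.sum == subject.length
  cursor = 0
  widths.map do |width|
    field = subject[cursor, width]
    cursor += width
    field
  end
end

p slice_fields("2024-01-15", 4, 1, 2, 1, 2)  # => ["2024", "-", "01", "-", "15"]
p slice_fields("2024-01-15", 4, 1, 2)        # => nil
```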
|
199
229
|
|
200
230
|
To really get into the power of delayed evaluation however we need to add two more concepts:
|
201
231
|
|
202
|
-
The MATCH primitive, and the capture
|
232
|
+
The `MATCH` primitive, and the `capture!` (pronounced _capture NOW_) method.
|
233
|
+
|
234
|
+
The `capture?` (pronounced _capture IF_) method executes _IF_ the match has completed successfully. In contrast the `capture!` method calls its block as soon as its sub-pattern matches. Using `capture!` allows you to pick up values during one phase of the match and then use those values later.
|
203
235
|
|
204
|
-
|
236
|
+
Meanwhile `MATCH` takes a pattern as its argument (like `ARBNO`) but will only match the pattern once. The power in `MATCH` is when it is used with a delayed evaluation block. Together `MATCH` and `capture!` allow for patterns that are much more powerful than simple regexes. For example here is a palindrome matcher:
|
205
237
|
|
206
|
-
|
207
|
-
|
208
|
-
|
209
|
-
|
210
|
-
|
238
|
+
```ruby
|
239
|
+
palindrome = MATCH do | ; c|
|
240
|
+
/\W*/ & LEN(1).capture! { |m| c = m } & /\W*/ & ( palindrome | LEN(1) | LEN(0)) & /\W*/ & MATCH { c }
|
241
|
+
end
|
242
|
+
```
|
211
243
|
|
212
244
|
Let's see it again with some comments:
|
213
245
|
|
214
|
-
|
215
|
-
|
216
|
-
|
217
|
-
|
218
|
-
|
219
|
-
|
220
|
-
|
221
|
-
|
222
|
-
|
223
|
-
|
224
|
-
|
225
|
-
|
226
|
-
|
227
|
-
|
228
|
-
|
229
|
-
|
230
|
-
|
231
|
-
|
246
|
+
```ruby
|
247
|
+
palindrome = MATCH do | ; c |
|
248
|
+
|
249
|
+
# By putting the MATCH pattern in a block to be evaluated later
|
250
|
+
# we can use palindrome in its own definition.
|
251
|
+
# Just to keep things clean and robust we declare c
|
252
|
+
# (the character matched) as local to the block.
|
253
|
+
|
254
|
+
/\W*/ & # skip any white space
|
255
|
+
LEN(1).capture! { |m| c = m } & # grab the next character now and save it in c
|
256
|
+
/\W*/ & # skip more white space
|
257
|
+
( # now there are three possibilities:
|
258
|
+
palindrome | # there are more characters on the left side of the palindrome OR
|
259
|
+
LEN(1) | # we are at the middle ODD character OR
|
260
|
+
LEN(0) # the palindrome has an even number of characters
|
261
|
+
) & # now that we have the left half matched, we match the right half
|
262
|
+
/\W*/ & # skip any white space and finally
|
263
|
+
MATCH { c } # match the same character on the left now on the far right
|
264
|
+
|
265
|
+
end
|
266
|
+
|
267
|
+
palindrome.matches?("A man, a plan, a canal, Panama!") # succeeds!
|
268
|
+
```
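The recursive shape of the pattern (match a character, recurse on the middle, then match the same character again) can also be sketched as an ordinary Ruby method. This sketch is not Cannonbol code: it checks the whole string rather than searching for a palindrome inside it, and it assumes case should be ignored, which the pattern above does not do by default.

```ruby
# Illustrative helpers; names are made up for this sketch.
def mirrored?(chars, left, right)
  # Zero or one character left in the middle: the LEN(0) / LEN(1) cases.
  return true if left >= right
  # Outer characters must agree, then recurse on what lies between them.
  chars[left] == chars[right] && mirrored?(chars, left + 1, right - 1)
end

def palindrome?(text)
  # Keep only word characters, the way /\W*/ skips everything else.
  chars = text.downcase.scan(/\w/)
  mirrored?(chars, 0, chars.length - 1)
end

p palindrome?("A man, a plan, a canal, Panama!")  # => true
p palindrome?("hello")                            # => false
```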
|
232
269
|
|
233
|
-
Using MATCH to define recursive patterns makes Cannonbol into a full blown BNF parser. See the example [email address parser](
|
270
|
+
Using `MATCH` to define recursive patterns makes Cannonbol into a full blown BNF parser. See the example [email address parser](#a-complete-real-world-example)
|
234
271
|
|
235
272
|
### Advanced capture techniques
|
236
273
|
|
237
|
-
Both capture
|
274
|
+
Both `capture?` and `capture!` have a number of useful features.
|
238
275
|
|
239
276
|
* They can take a block which is passed the matching substring.
|
240
277
|
* As well as the current match, they can pass the current cursor position and the current value of capture variable.
|
@@ -246,85 +283,110 @@ Both capture? and capture! have a number of useful features.
|
|
246
283
|
|
247
284
|
This is the most general way of capturing a submatch. For example
|
248
285
|
|
249
|
-
|
286
|
+
```ruby
|
287
|
+
word = /\W*/ & /\w+/.capture? { |match| words << match } & /\W*/
|
288
|
+
```
|
250
289
|
|
251
290
|
will shovel each word it matches into the words array. You could use it like this:
|
252
291
|
|
253
|
-
|
254
|
-
|
292
|
+
```ruby
|
293
|
+
words = []
|
294
|
+
ARBNO(word).matches?("a big strange, long sentence!")
|
295
|
+
```
|
255
296
|
|
256
297
|
Using `capture? { |m| puts m }` is handy for debugging your patterns.
|
257
298
|
|
258
299
|
#### Current cursor position
|
259
300
|
|
260
|
-
The second parameter of the capture block will
|
301
|
+
The second parameter of the capture block will receive the current cursor position. For example
|
302
|
+
|
303
|
+
```ruby
|
304
|
+
("i".capture! { |m, p| puts "i found at #{p-1}"} & RPOS(0)).matches?("I said hello!", insensitive: true)
|
305
|
+
=> i found at 0
|
306
|
+
=> i found at 4
|
307
|
+
```
|
261
308
|
|
262
|
-
|
263
|
-
=> i found at 0
|
264
|
-
=> i found at 4
|
265
|
-
|
266
|
-
Notice the use of RPOS(0) which will force the pattern to look at every character in the subject, until the pattern finally fails. By using capture! (capture NOW) we record every hit, even though the pattern fails in the end.
|
309
|
+
Notice the use of `RPOS(0)` which will force the pattern to look at every character in the subject, until the pattern finally fails. By using `capture!` (capture _now_) we record every hit, even though the pattern fails in the end.
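To double-check the positions reported in that example, the same scan can be written in plain Ruby. The `positions_of` helper is hypothetical and only mimics the effect of recording every hit while the cursor moves to the right; it does not use Cannonbol.

```ruby
# Hypothetical helper for illustration: collect every index where `needle`
# occurs in `subject`, ignoring case.
def positions_of(needle, subject)
  haystack = subject.downcase
  target = needle.downcase
  positions = []
  cursor = 0
  while (found = haystack.index(target, cursor))
    positions << found
    cursor = found + 1   # keep scanning to the right of this hit
  end
  positions
end

p positions_of("i", "I said hello!")  # => [0, 4]
```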
|
267
310
|
|
268
311
|
#### Using capture variables
|
269
312
|
|
270
313
|
If the capture methods are supplied with a symbol, then the captured value will be saved in an internal capture variable. For example:
|
271
314
|
|
272
|
-
|
273
|
-
|
274
|
-
|
315
|
+
```ruby
|
316
|
+
some_pattern.capture!(:value)
|
317
|
+
```
|
318
|
+
|
319
|
+
would save the string matched by `some_pattern` into the capture variable called `:value`.
|
275
320
|
|
276
321
|
There are a couple of ways to retrieve the capture variables:
|
277
322
|
|
278
|
-
Any primitive pattern that takes a parameter can use the value of a capture variable. So for example `LEN(:foo)` means
|
279
|
-
take the current value of the capture variable
|
323
|
+
Any primitive pattern that takes a parameter can use the value of a capture variable. So for example `LEN(:foo)` means
|
324
|
+
take the current value of the capture variable `:foo` as the parameter to `LEN`.
|
280
325
|
|
281
326
|
We can use this to clean up the palindrome pattern a little bit:
|
282
327
|
|
283
|
-
|
328
|
+
```ruby
|
329
|
+
palindrome = /\W*/ & LEN(1).capture!(:c) & /\W*/ & ( MATCH{palindrome} | LEN(1) | LEN(0) ) & /\W*/ & MATCH(:c)
|
330
|
+
```
|
284
331
|
|
285
|
-
Another way to get the capture variables is to interrogate the value returned by
|
332
|
+
Another way to get the capture variables is to interrogate the value returned by `matches?`. The value returned by `matches?` is a subclass of string that has some extra methods. One of these is the `captured` method, which gives a hash of all the captured variables. For example:
|
286
333
|
|
287
|
-
|
288
|
-
|
334
|
+
```ruby
|
335
|
+
> ("dog" | "cat").capture?(:pet).matches?("He had a dog named Spot.").captured[:pet]
|
336
|
+
=> dog
|
337
|
+
```
|
289
338
|
|
290
|
-
You can also give a block to the
|
339
|
+
You can also give a block to the `matches?` method, which will be called whether the match succeeds or not. For example:
|
291
340
|
|
292
|
-
|
293
|
-
|
294
|
-
|
295
|
-
|
341
|
+
```ruby
|
342
|
+
> ("dog" | "cat").capture?(:pet).matches?("He had a dog named Spot."){ |match| match.captured[:pet] if match}
|
343
|
+
=> dog
|
344
|
+
```
|
296
345
|
|
297
|
-
|
298
|
-
|
299
|
-
|
300
|
-
|
346
|
+
The matches? block can also explicitly name any capture variables you need to get the values of. So for example:
|
347
|
+
|
348
|
+
```ruby
|
349
|
+
> pet_data = (POS(0) & ARBNO(("big" | "small").capture?(:size) | ("dog" | "cat").capture?(:pet) | LEN(1)) & RPOS(0))
|
350
|
+
=> #<Cannonbol::Concat .... etc
|
351
|
+
> pet_data.matches?("He has a big dog!") { |m, pet, size| "type of pet: #{pet.upcase}, size: #{size.upcase}"}
|
352
|
+
=> type of pet: DOG, size: BIG
|
353
|
+
```
|
301
354
|
|
302
|
-
If the
|
355
|
+
If the matches? block mentions capture variables that were not assigned in the match they get nil.
|
303
356
|
|
304
357
|
#### Initializing capture variables
|
305
358
|
|
306
359
|
When used as a parameter to a primitive the capture variable may be given an initial value. For example:
|
307
360
|
|
308
|
-
|
361
|
+
```ruby
|
362
|
+
LEN(baz: 12)
|
363
|
+
```
|
309
364
|
|
310
365
|
would match `LEN(12)` if :baz had not yet been set.
|
311
366
|
|
312
367
|
A second way to initialize (or update) capture variables is to combine capture variables with a capture block like this:
|
313
368
|
|
314
|
-
|
315
|
-
|
369
|
+
```ruby
|
370
|
+
some_pattern.capture!(:baz) { |match, position, baz| baz || position * 2 } # initializes :baz to position * 2
|
371
|
+
```
|
372
|
+
|
316
373
|
If a symbol is specified in a capture!, and there is a block, then the symbol will be set to the value returned by the block.
|
317
374
|
|
318
375
|
#### Capturing arrays of data
|
319
376
|
|
320
377
|
To capture all the words into a capture variable as an array you could do this:
|
321
378
|
|
322
|
-
|
323
|
-
|
324
|
-
|
325
|
-
|
379
|
+
```ruby
|
380
|
+
words = []
|
381
|
+
word = /\W*/ & /\w+/.capture?(:words) { |match| words << match } & /\W*/
|
382
|
+
ARBNO(word)
|
383
|
+
```
|
384
|
+
|
385
|
+
The `word` pattern can be shortened to:
|
326
386
|
|
327
|
-
|
387
|
+
```ruby
|
388
|
+
word = /\W*/ & /\w+/.capture?(:words => []) & /\W*/
|
389
|
+
```
|
328
390
|
|
329
391
|
This works because anytime there is a 1) capture with a capture variable that is 2) holding an array, 3) that does NOT have a block, the capture method will go ahead and shovel the captured value into the capture variable. Note this behavior can be overridden if needed by including a block.
|
330
392
|
|
@@ -333,25 +395,28 @@ This works because anytime there is a 1) capture with a capture variable that is
|
|
333
395
|
Each time MATCH or ARBNO is called, the current state of any known capture variables is saved, and those values will be restored when the MATCH/ARBNO exits. If new capture variables are introduced by the nested pattern, these new values will be merged with the existing set of variables.
|
334
396
|
|
335
397
|
More powerful yet is the fact that every match string sent to a capture variable has access to all the values captured so far via the captured method. For example:
|
336
|
-
|
337
|
-
|
338
|
-
|
339
|
-
|
340
|
-
|
341
|
-
|
342
|
-
|
343
|
-
|
398
|
+
|
399
|
+
```ruby
|
400
|
+
> subject_clause = article & noun.capture!(:subject) ;
|
401
|
+
* object_clause = article & noun.capture!(:object);
|
402
|
+
* verb_clause = ...
|
403
|
+
* sentence = (subject_clause & verb_clause & object_clause & ".");
|
404
|
+
* sentences = ARBNO(sentence.capture?(:sentences => [])) & RPOS(0);
|
405
|
+
* sentences.matches?(file_stream).captured[:sentences].collect(&:captured)
|
406
|
+
=> [{:subject => "dog", :object => "man"}, {:subject => "man", :object => "dog"} ...]
|
407
|
+
```
|
344
408
|
|
345
409
|
As each noun is matched, it is captured and saved in :subject or :object. When the sentence is captured, the match is shoveled away into the :sentences variable. Because the match value itself responds to the captured method we end up with all the data collected in a nice array.
|
346
410
|
|
347
|
-
Note that capture! is used for capturing the nouns. This is cheaper and does not hurt anything since the value of
|
411
|
+
Note that capture! is used for capturing the nouns. This is cheaper and does not hurt anything since the value of
|
348
412
|
the capture variable will just be overwritten.
|
349
|
-
|
413
|
+
|
350
414
|
### Advanced PRIMITIVES
|
351
415
|
|
352
416
|
There are a few more SNOBOL4 + SPITBOL primitives that are included for completeness.
|
353
417
|
|
354
418
|
`FENCE` matches the empty string, but will fail if there is an attempt to backtrack through the FENCE.
|
419
|
+
|
355
420
|
`FENCE(pattern)` will attempt to match pattern, but if an attempt is made to backtrack through the FENCE the pattern will fail.
|
356
421
|
|
357
422
|
The difference is that FENCE will fail the whole match, but FENCE(pattern) will just fail the subpattern.
|
@@ -360,18 +425,20 @@ The difference is that FENCE will fail the whole match, but FENCE(pattern) will
|
|
360
425
|
|
361
426
|
`FAIL` will never match anything, and will force the matcher to backtrack and retry the next alternative.
|
362
427
|
|
363
|
-
`SUCCEED` will force the match to retry. The only that gets passed `SUCCEED` is `ABORT`.
|
428
|
+
`SUCCEED` will force the match to retry. The only pattern that gets past `SUCCEED` is `ABORT`.
|
364
429
|
|
365
430
|
These can be used together to do some interesting things. For example
|
366
431
|
|
367
|
-
|
368
|
-
|
369
|
-
|
370
|
-
|
371
|
-
|
372
|
-
|
373
|
-
|
374
|
-
|
432
|
+
```ruby
|
433
|
+
> pattern = POS(0) & SUCCEED & (FENCE(TAB(n: 1).capture!(:n) { |m, p, n| puts m; p+1 } | ABORT)) & FAIL;
|
434
|
+
* pattern.matches?("abcd")
|
435
|
+
a
|
436
|
+
ab
|
437
|
+
abc
|
438
|
+
abcd
|
439
|
+
=> nil
|
440
|
+
```
|
441
|
+
|
375
442
|
The SUCCEED and FAIL primitives keep forcing the matcher to retry. Eventually the TAB will fail causing the ABORT alternative to execute the matcher.
|
376
443
|
|
377
444
|
So it goes like this
|
@@ -379,55 +446,77 @@ So it goes like this
|
|
379
446
|
SUCCEED
|
380
447
|
TAB(1)
|
381
448
|
FAIL
|
382
|
-
|
449
|
+
SUCCEED
|
383
450
|
TAB(2)
|
384
451
|
etc...
|
385
|
-
|
386
|
-
The FENCE keeps the matcher from backtracking into the ABORT option too early.
|
452
|
+
|
453
|
+
The FENCE keeps the matcher from backtracking into the ABORT option too early. Note that FENCE prevents backtracking through the level it is on,
|
454
|
+
however we can backtrack *around* the FENCE and into the SUCCEED, which forces us to retry the FENCE but with a new value of `n`.
|
387
455
|
|
388
456
|
### A complete real world example
|
389
457
|
|
390
458
|
Cannonbol can be used to easily translate the email BNF spec into an email address parser.
|
391
459
|
|
392
|
-
|
393
|
-
|
394
|
-
|
395
|
-
|
396
|
-
|
397
|
-
|
398
|
-
|
399
|
-
|
400
|
-
|
401
|
-
|
402
|
-
|
403
|
-
|
404
|
-
|
405
|
-
|
406
|
-
|
407
|
-
|
408
|
-
|
409
|
-
|
460
|
+
```ruby
|
461
|
+
ws = /\s*/
|
462
|
+
quoted_string = ws & '"' & ARBNO(NOTANY('"\\') | '\\"' | '\\\n' | '\\\\') & '"' & ws
|
463
|
+
atom = ws & SPAN("!#$%&'*+-/0123456789=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ^_`abcdefghijklmnopqrstuvwxyz{|}~") & ws
|
464
|
+
word = (atom | quoted_string)
|
465
|
+
phrase = word & ARBNO(word)
|
466
|
+
domain_ref = atom
|
467
|
+
domain_literal = "[" & /[0-9]+/ & ARBNO(/\.[0-9]+/) & "]"
|
468
|
+
sub_domain = domain_ref | domain_literal
|
469
|
+
domain = (sub_domain & ARBNO("." & sub_domain)).capture?(:domain) { |m| m.strip }
|
470
|
+
local_part = (word & ARBNO("." & word)).capture?(:local_part) { |m| m.strip }
|
471
|
+
addr_spec = (local_part & "@" & domain)
|
472
|
+
route = (ws & "@" & domain & ARBNO("@" & domain)).capture?(:route) { |m| m.strip } & ":"
|
473
|
+
route_addr = "<" & ((route | "") & addr_spec).capture?(:mailbox) { |m| m.strip } & ">"
|
474
|
+
mailbox = (addr_spec.capture?(:mailbox) { |m| m.strip } |
|
475
|
+
(phrase.capture?(:display_name) { |m| m.strip } & route_addr))
|
476
|
+
group = (phrase.capture?(:group_name) { |m| m.strip } & ":" &
|
477
|
+
(( mailbox.capture?(group_mailboxes: []) & ARBNO("," & mailbox.capture?(:group_mailboxes) ) ) | ws)) & ";"
|
478
|
+
address = POS(0) & (mailbox | group ) & RPOS(0)
|
479
|
+
```
|
410
480
|
|
411
481
|
So for example we can even parse an obscure email with groups and routes
|
412
482
|
|
413
|
-
|
414
|
-
|
415
|
-
|
416
|
-
|
417
|
-
|
418
|
-
|
419
|
-
|
420
|
-
|
483
|
+
```ruby
|
484
|
+
> email = 'here is my "big fat \\\n groupen" : someone@catprint.com, Fred Nurph<@sub1.sub2@sub3.sub4:fred.nurph@catprint.com>;'
|
485
|
+
=> here is my "big fat \\\n groupen" : someone@catprint.com, Fred Nurph<@sub1.sub2@sub3.sub4:fred.nurph@catprint.com>;
|
486
|
+
> match = address.matches?(email)
|
487
|
+
=> here is my "big fat \\\n groupen" : someone@catprint.com, Fred Nurph<@sub1.sub2@sub3.sub4:fred.nurph@catprint.com>;
|
488
|
+
> match.captured[:group_mailboxes].first.captured[:mailbox]
|
489
|
+
=> someone@catprint.com
|
490
|
+
> match.captured[:group_name]
|
491
|
+
=> here is my "big fat \\\n groupen
|
492
|
+
```
|
493
|
+
|
494
|
+
### Backward Compatibility (`matches?` and `insensitive` methods)
|
495
|
+
|
496
|
+
If you have existing Cannonbol code and are using Ruby 2.5+ you will need to replace
|
497
|
+
uses of the `match?` method and the `-` unary operator with `matches?` and `insensitive`.
|
421
498
|
|
499
|
+
Alternatively you can add the following code to your application:
|
500
|
+
|
501
|
+
```ruby
|
502
|
+
class Regexp
|
503
|
+
include Cannonbol::CompatibilityAdapter
|
504
|
+
end
|
505
|
+
|
506
|
+
class String
|
507
|
+
include Cannonbol::CompatibilityAdapter
|
508
|
+
end
|
509
|
+
```
|
422
510
|
|
423
511
|
## Development
|
424
512
|
|
425
|
-
After checking out the repo, run `bundle install` to install dependencies.
|
513
|
+
After checking out the repo, run `bundle install` to install dependencies.
|
426
514
|
|
427
515
|
### Specs
|
428
516
|
|
429
517
|
Run `bundle exec rspec` to run the tests on your server environment
|
430
|
-
|
518
|
+
|
519
|
+
> Testing on Opal is currently broken because of compatibility issues with Opal 1.1 and opal-rspec. However everything should work fine with Opal.
|
431
520
|
|
432
521
|
## Contributing
|
433
522
|
|