cannonbol 1.3.0 → 2.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: a8b090116c27ac8cf008c2a0b54b5aa2eb259bf4
4
- data.tar.gz: 6899f4b674be76248a133b664870632eae5d48bd
2
+ SHA256:
3
+ metadata.gz: 62a3b9c14f45c3694de243dee4883528c9dbd6f2ddc45e05784cb16f5841747f
4
+ data.tar.gz: b07d05835793f4f5d6ffe018c2c62f5adc64e59e1a49d11a5f25481d604ff89e
5
5
  SHA512:
6
- metadata.gz: 9e820fa3e6b2758aa2e5fd67d7b9d24c6482bc96ad33b4b2763b071f3d73d5b8944fe5da77e56bdfd1fb03df303f4a13c024d3cb0884745fac100ddc4429baec
7
- data.tar.gz: 201b97e9cdf60edbf9b3c5b84d9801b7883f06371f2a23c234616c4e0662ccc7d8eb716d9e52fb01cdf422ce458377881fc71fe550e8e33cff5dfca7d3d5f06a
6
+ metadata.gz: 5742f625d9be04482f83e261f17260f24dd78e88ceb750eb436457d0dcf2cde39ead41b95fb872a28fc248faaeadd02304489ecf1b66dbcd7bd056450acc14dd
7
+ data.tar.gz: e7839e4136869b91cf53a3c28d9bdb5b9aa4822482365e12599436be910abc9792a4ccea5a924449775d4e7c143ba89d53f35648fc03ad8a91ad814d5852df57
data/README.md CHANGED
@@ -27,107 +27,133 @@ Or install it yourself as:
27
27
 
28
28
  $ gem install cannonbol
29
29
 
30
+ ## Compatibility With Ruby 2.5
31
+
32
+ > In Ruby 2.5 the methods `#match?` and `#-@` were added to strings and regex patterns. These methods were also used to build
33
+ patterns in Cannonbol. The latest version of Cannonbol uses the `#matches?` and `#insensitive` methods instead. See the
34
+ end of the readme for using your old patterns with the latest Cannonbol.
35
+
30
36
  ## Lets Go!
31
37
 
32
38
  ### Basic Matching `- &, |, capture?, match_any, match_all`
33
39
 
34
- Strings, Regexes and primitives are combined using & (concatenation) and | (alternation) operators. Once the gem is installed you are good to go.
40
+ Strings, Regexes and primitives are combined using & (concatenation) and | (alternation) operators.
35
41
 
36
42
  Here is a simple pattern that matches a simple noun clause:
37
43
 
38
- > ("a" | "the") & /\s+/ & ("boy" | "girl")
39
-
44
+ ```ruby
45
+ > ("a" | "the") & /\s+/ & ("boy" | "girl")
46
+ ```
47
+
40
48
  This will match either "a" or "the", followed by white space and then by "boy" or "girl". Okay! Let's use it!
41
49
 
42
- > ("a" | "the") & /\s+/ & ("boy" | "girl").match?("he saw a boy going home")
43
- => "a boy"
44
- > ("a" | "the") & /\s+/ & ("boy" | "girl").match?("he saw a big boy going home")
45
- => nil
50
+ ```ruby
51
+ > (("a" | "the") & /\s+/ & ("boy" | "girl")).matches?("he saw a boy going home")
52
+ => "a boy"
53
+ > (("a" | "the") & /\s+/ & ("boy" | "girl")).matches?("he saw a big boy going home")
54
+ => nil
55
+ ```
46
56
 
47
57
  Now let's save the pieces of the match using the `capture?` (pronounced _capture IF_) method:
48
58
 
49
- > article, noun = nil, nil;
50
- * pattern = ("a" | "the").capture? { |m| article = m } & /\s+/ & ("boy" | "girl").capture? { |m| noun = m };
51
- * pattern.match?("he saw the girl going home")
52
- => the girl
53
- > noun
54
- => girl
55
- > article
56
- => the
59
+ ```ruby
60
+ > article, noun = nil, nil;
61
+ * pattern = ("a" | "the").capture? { |m| article = m } & /\s+/ & ("boy" | "girl").capture? { |m| noun = m };
62
+ * pattern.matches?("he saw the girl going home")
63
+ => the girl
64
+ > noun
65
+ => girl
66
+ > article
67
+ => the
68
+ ```
57
69
 
58
- The `capture?` method and its friend `capture!` (pronounced _capture NOW_) have many powerful features.
70
+ The `capture?` method and its friend `capture!` (pronounced _capture NOW_) have many powerful features.
59
71
  As shown above it can take a block which is passed the matching substring, _IF the match succeeds_.
60
- The other features of the capture method will be detailed [below.](Advanced capture techniques)
72
+ The other features of the capture method will be detailed [below.](#advanced-capture-techniques)
61
73
 
62
74
  Arrays can be turned into patterns using the `match_any` and `match_all` methods:
63
75
 
64
- ARTICLES = ["a", "the"]
65
- NOUNS = ["boy", "girl", "dog", "cat"]
66
- ADJECTIVES = ["big", "small", "fierce", "friendly"]
67
- WS = /\s+/
68
- [ARTICLES.match_any, [WS, [WS, ADJECTIVES.match_any, WS].match_all].match_any, NOUNS.match_any].match_all
69
-
70
- This is equivilent to
76
+ ```ruby
77
+ ARTICLES = ["a", "the"]
78
+ NOUNS = ["boy", "girl", "dog", "cat"]
79
+ ADJECTIVES = ["big", "small", "fierce", "friendly"]
80
+ WS = /\s+/
81
+ [ARTICLES.match_any, [WS, [WS, ADJECTIVES.match_any, WS].match_all].match_any, NOUNS.match_any].match_all
82
+ ```
83
+
84
+ This is equivalent to
85
+
86
+ ```ruby
87
+ ("a" | "the") & (WS | (WS & ("big" | "small" | "fierce" | "friendly") & WS)) & ("boy" | "girl" | "dog" | "cat")
88
+ ```
71
89
 
72
- ("a" | "the") & (WS | (WS & ("big" | "small" | "fierce" | "friendly") & WS)) & ("boy" | "girl" | "dog" | "cat")
73
-
74
- ### match? options
90
+ ### The matches? options
75
91
 
76
- The match? method shows above takes a couple of options to globally control the match process:
92
+ The `matches?` method shown above takes a couple of options to globally control the match process:
77
93
 
78
94
  option | default | meaning
79
95
  ------|-----|-----
80
- ignore_case | false | When on, the basic regex and string pattern will NOT be case sensitive.
81
- anchor | false | When on pattern matching must begin at the first character. Normally the matcher will keep moving the starting character to the right, until the match suceeds.
82
- raise_error | false | When on, a match failure will raise Cannonbol::MatchFailed.
83
- replace_with | nil | When a non-falsy value is supplied, the value will replace the matched portion of the string, and the entire string will be returned. Normally only the matched portion of the string is returned.
96
+ `insensitive` | `false` | When on, the basic regex and string pattern will NOT be case sensitive. Note you can also use `ignore_case`
97
+ `anchor` | `false` | When on, pattern matching must begin at the first character. Normally the matcher will keep moving the starting character to the right until the match succeeds.
98
+ `raise_error` | `false` | When on, a match failure will raise Cannonbol::MatchFailed.
99
+ `replace_with` | `nil` | When a non-falsy value is supplied, the value will replace the matched portion of the string, and the entire string will be returned. Normally only the matched portion of the string is returned.
84
100
 
85
101
  Example of replace with:
86
102
 
87
- > "hello".match?("She said hello!")
88
- => hello
89
- > "hello".match?("She said hello!", replace_with => "goodby")
90
- => She said goodby!
91
-
103
+ ```ruby
104
+ > "hello".matches?("She said hello!")
105
+ => hello
106
+ > "hello".matches?("She said hello!", replace_with: "goodby")
107
+ => She said goodby!
108
+ ```
109
+
92
110
  #### Ignore case on a subpattern
93
111
 
94
- Sometimes its useful to run the matcher in the default case sensitive mode, and only turn off matching for one part of the pattern. To do this
95
- prefix a subpattern with a "-". For example
112
+ Sometimes it's useful to run the matcher in the default case-sensitive mode, and only turn off case sensitivity for one part of the pattern. To do this
113
+ use the `#insensitive` method. For example:
96
114
 
97
- > (-"GIRL" | "boy").match?("A big girl!")
98
- => girl
99
- > (-"GIRL" | "boy").match?("A big BOY!")
100
- => nil
101
-
102
- ### Patterns, Subjects, Cursors, Alternatives, and Backtracking
115
+ ```ruby
116
+ > ("GIRL".insensitive | "boy").matches?("A big girl!")
117
+ => girl
118
+ > ("GIRL".insensitive | "boy").matches?("A big BOY!")
119
+ => nil
120
+ ```
103
121
 
104
- A pattern is an object that responds to the match? method. Cannonbol adds the match? method to Ruby strings, and regexes, and provides a number of _primitive_ patterns. A pattern can be combined with another pattern using the &, and | operators. There are also several primitive patterns that take a pattern and create a new pattern. Here are some example patterns:
122
+ ### Patterns, Subjects, Cursors, Alternatives, and Backtracking
105
123
 
106
- "hello" # matches any string containing hello
107
- /\s+/ # matches one or more white space characters
108
- "hello" & /\s+/ & "there" # matches "hello" and "there" seperated by white space
109
- "hello" | "goodby" # matches EITHER "hello" or "there"
110
- ARB # a primitive pattern that matches anything (similar to /.*/)
111
- ("hello" | "goodby") & ARB & "Fred" # matches "hello" or "goodby" followed by any characters and finally "Fred"
124
+ A pattern is an object that responds to the matches? method. Cannonbol adds the matches? method to Ruby strings, and regexes, and provides a number of _primitive_ patterns. A pattern can be combined with another pattern using the &, and | operators. There are also several primitive patterns that take a pattern and create a new pattern. Here are some example patterns:
125
+
126
+ ```ruby
127
+ "hello" # matches any string containing hello
128
+ /\s+/ # matches one or more white space characters
129
+ "hello" & /\s+/ & "there" # matches "hello" and "there" separated by white space
130
+ "hello" | "goodby" # matches EITHER "hello" or "goodby"
131
+ ARB # a primitive pattern that matches anything (similar to /.*/)
132
+ ("hello" | "goodby") & ARB & "Fred" # matches "hello" or "goodby" followed by any characters and finally "Fred"
133
+ ```
112
134
 
113
135
  Patterns are just objects, so they can be assigned to variables:
114
136
 
115
- greeting = "hello" | "goodby"
116
- names = "Fred" | "Suzy"
117
- ws = /\s+/
118
- greeting & ws & names # matches "hello Fred" or "goodby Suzy"
137
+ ```ruby
138
+ greeting = "hello" | "goodby"
139
+ names = "Fred" | "Suzy"
140
+ ws = /\s+/
141
+ greeting & ws & names # matches "hello Fred" or "goodby Suzy"
142
+ ```
119
143
 
120
- The first parameter of the match? method is the subject string. The subject string is matched left to right driven by the pattern object. Normally the matcher will attempt to match starting at the first character. If no match is found, then
121
- matching begins again one character to the right. This continues until a match is made, or there are insufficient characters to make a match. This behavior can be turned off by specifying `anchor: true` in the match? options hash.
144
+ The first parameter of the matches? method is the subject string. The subject string is matched left to right driven by the pattern object. Normally the matcher will attempt to match starting at the first character. If no match is found, then
145
+ matching begins again one character to the right. This continues until a match is made, or there are insufficient characters to make a match. This behavior can be turned off by specifying `anchor: true` in the matches? options hash.
122
146
 
123
147
  The current position of the matcher in the string is the _cursor_. The cursor begins at zero and as each character is matched it moves to the right. If the match fails (and anchor is false) then the match is restarted with the cursor at position 1, etc.
124
148
 
125
- Alternatives are considered left to right as specified in the pattern. Once an alternative is matched, the matcher moves on to the next part of the match, but it does remember the alternative, and if matching fails at a later component, the matcher will back up and try the next alternative. For example:
149
+ Alternatives are considered left to right as specified in the pattern. Once an alternative is matched, the matcher moves on to the next element in the pattern, but it does remember the alternative, and if matching fails at a later pattern, the matcher will back up and try the next alternative. For example:
126
150
 
127
- a_pattern = "a" | "aaa" | "aa"
128
- b_pattern = "b" | "aaabb" | "abbbc"
129
- c_pattern = "cc"
130
- (a_pattern & b_pattern & c_pattern).match?("aaabbbccc")
151
+ ```ruby
152
+ a_pattern = "a" | "aaa" | "aa"
153
+ b_pattern = "b" | "aaabb" | "abbbc"
154
+ c_pattern = "cc"
155
+ (a_pattern & b_pattern & c_pattern).matches?("aaabbbccc")
156
+ ```
131
157
 
132
158
  * "a" is matched from a_pattern, and then we move to b_pattern.
133
159
  * None of the alternatives in b_pattern can match, so we backtrack and try the next alternative in the a_pattern,
@@ -138,7 +164,7 @@ Alternatives are considered left to right as specified in the pattern. Once an
138
164
  * "aa" now matches, and so we move to the b_pattern, which can only match its last alternative, and
139
165
  * finally we complete the match!
140
166
 
141
- For a more complete explanation see the [SNOBOL4 manual Chapter 2](http://www.math.bas.bg/bantchev/place/snobol/gpp-2ed.pdf)
167
+ For a more complete explanation see the [SNOBOL4 manual Chapter 2](https://github.com/catprintlabs/cannonbol/blob/master/snobol4-language-reference.pdf)
142
168
 
143
169
  Bottom line is the matcher will try every possible option until a match is made or the match fails.
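
The backtracking walk described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not Cannonbol's implementation: each pattern is modelled as an array of literal alternatives, and the helper names `backtrack_match` and `unanchored_match` are hypothetical.

```ruby
# Plain-Ruby sketch of backtracking over alternatives (NOT Cannonbol's code).
# Each stage is an array of literal alternatives, tried left to right.
def backtrack_match(stages, subject, cursor = 0, trail = [])
  return trail if stages.empty? # every stage matched

  first, *rest = stages
  first.each do |alt|
    # Skip alternatives that do not match at the current cursor.
    next unless subject[cursor, alt.length] == alt

    # Tentatively accept this alternative. If a later stage fails we fall
    # through to the next alternative: that is the backtracking step.
    result = backtrack_match(rest, subject, cursor + alt.length, trail + [alt])
    return result if result
  end
  nil # no alternative worked at this cursor
end

# Unanchored matching: restart one character to the right after each failure.
def unanchored_match(stages, subject)
  (0..subject.length).each do |start|
    trail = backtrack_match(stages, subject, start)
    return trail if trail
  end
  nil
end

a_pattern = ["a", "aaa", "aa"]
b_pattern = ["b", "aaabb", "abbbc"]
c_pattern = ["cc"]
chosen = unanchored_match([a_pattern, b_pattern, c_pattern], "aaabbbccc")
# chosen lists the alternative that finally matched at each stage
```

The sketch returns the alternatives that were finally chosen, so the order in which candidates are rejected mirrors the bullet list above.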
144
170
 
@@ -148,21 +174,23 @@ Cannonbol includes the complete set of SNOBOL4 + SPITBOL primitive patterns and
148
174
 
149
175
  `REM` Match 0 or more characters to the end of the subject string.
150
176
 
151
- `("the" & REM).match?("he saw the small boy") === "the small boy"`
177
+ `("the" & REM).matches?("he saw the small boy") === "the small boy"`
152
178
 
153
179
  `ARB` Match 0 or more characters. ARB first tries to match zero characters, then 1 character, then 2 until the match succeeds. It is roughly equivalent to `/.*/`, except the regex will NOT backtrack like ARB will.
154
180
 
155
- `("the" & ARB & "boy").match?("he saw the small boy running") === "the small boy"`
181
+ `("the" & ARB & "boy").matches?("he saw the small boy running") === "the small boy"`
156
182
 
157
183
  `LEN(n)` Match any n characters. Equivalent to `/.{n}/`.
158
184
 
159
185
  `POS(x)` Match ONLY if current cursor is at x. POS(0) is the start of the string.
160
186
 
161
- `(POS(5) & ARB & POS(7)).match?("01234567") === "567"`
187
+ `(POS(5) & ARB & POS(7)).matches?("01234567") === "56"`
162
188
 
163
189
  `RPOS(x)` Just like POS except measured from the end of the string. I.e. RPOS(0) is just after the last character.
164
190
 
165
- `("hello" & RPOS(0)).match?("she said hello!")` would fail.
191
+ `("hello" & RPOS(0)).matches?("she said hello!")` would fail.
192
+
193
+ `("hello" & RPOS(1)).matches?("she said hello!")` would succeed.
166
194
 
167
195
  `TAB(x)` Is equivalent to `ARB & POS(x)`. In other words, match zero or more characters up to the x'th character. Fails if x < the current cursor.
168
196
 
@@ -184,57 +212,66 @@ Cannonbol includes the complete set of SNOBOL4 + SPITBOL primitive patterns and
184
212
 
185
213
  ### Delayed Evaluation of Primitive Pattern Parameters
186
214
 
187
- There are several cases where it is useful to delay the evaluation of a primitive pattern arguments until the match is
215
+ There are several cases where it is useful to delay the evaluation of a primitive pattern's arguments until the match is
188
216
  being made, rather than when the pattern is created.
189
217
 
190
218
  To allow for this all primitive patterns can take a block. The block is evaluated when the matcher encounters the primitive, and the result of the block is used as the argument to the pattern.
191
219
 
192
220
  Here is a method that will parse a set of fixed width fields, where the widths are supplied as arguments to the method:
193
221
 
194
- def parse(s, *widths)
222
+ ```ruby
223
+ def parse(s, *widths)
195
224
  fields = []
196
- (ARBNO(LEN {widths.shift}.capture? {|field| fields << field}) & RPOS(0)).match?(s)
225
+ (ARBNO(LEN {widths.shift}.capture? {|field| fields << field}) & RPOS(0)).matches?(s)
197
226
  fields
198
227
  end
228
+ ```
199
229
 
200
230
  To really get into the power of delayed evaluation however we need to add two more concepts:
201
231
 
202
- The MATCH primitive, and the capture! (pronounced _capture NOW_) method.
232
+ The `MATCH` primitive, and the `capture!` (pronounced _capture NOW_) method.
233
+
234
+ The `capture?` (pronounced _capture IF_) method executes _IF_ the match has completed successfully. In contrast the `capture!` method calls its block as soon as its sub-pattern matches. Using `capture!` allows you to pick up values during one phase of the match and then use those values later.
203
235
 
204
- The capture? (pronounced _capture IF_) method executes when the match has completed successfully. In contrast the capture! method calls its block as soon as its sub-pattern matches. Using capture! allows you to pick up values during one phase of the match and then use those values later.
236
+ Meanwhile `MATCH` takes a pattern as its argument (like `ARBNO`) but will only match the pattern once. The power in `MATCH` is when it is used with a delayed evaluation block. Together `MATCH` and `capture!` allow for patterns that are much more powerful than simple regexes. For example here is a palindrome matcher:
205
237
 
206
- Meanwhile MATCH takes a pattern as its argument (like ARBNO) but will only match the pattern once. The power in MATCH is when it is used with a delayed evaluation block. Together MATCH and capture! allow for patterns that are much more powerful than simple regexes. For example here is a palindrome matcher:
207
-
208
- palindrome = MATCH do | ; c|
209
- /\W*/ & LEN(1).capture! { |m| c = m } & /\W*/ & ( palindrome | LEN(1) | LEN(0)) & /\W*/ & MATCH { c }
210
- end
238
+ ```ruby
239
+ palindrome = MATCH do | ; c|
240
+ /\W*/ & LEN(1).capture! { |m| c = m } & /\W*/ & ( palindrome | LEN(1) | LEN(0)) & /\W*/ & MATCH { c }
241
+ end
242
+ ```
211
243
 
212
244
  Let's see it again with some comments:
213
245
 
214
- palindrome = MATCH do | ; c |
215
- # By putting the MATCH pattern in a block to be evaluated later we can use palindrome in its definition.
216
- # Just to keep things clean and robust we declare c (the character matched) as local to the block.
217
-
218
- /\W*/ & # skip any white space
219
- LEN(1).capture! { |m| c = m } & # grab the next character now and save it in c
220
- /\W*/ & # skip more white space
221
- ( # now there are three possibilities:
222
- palindrome | # there are more characters on the left side of the palindrome OR
223
- LEN(1) | # we are at the middle ODD character OR
224
- LEN(0) # the palindrome has an even number of characters
225
- ) & # now that we have the left half matched, we match the right half
226
- /\W*/ & # skip any white space and finally
227
- MATCH { c } # match the same character on the left now on the far right
228
-
229
- end
230
-
231
- palindrome.match?('A man, a plan, a canal, Panama!")
246
+ ```ruby
247
+ palindrome = MATCH do | ; c |
248
+
249
+ # By putting the MATCH pattern in a block to be evaluated later
250
+ # we can use palindrome in its own definition.
251
+ # Just to keep things clean and robust we declare c
252
+ # (the character matched) as local to the block.
253
+
254
+ /\W*/ & # skip any white space
255
+ LEN(1).capture! { |m| c = m } & # grab the next character now and save it in c
256
+ /\W*/ & # skip more white space
257
+ ( # now there are three possibilities:
258
+ palindrome | # there are more characters on the left side of the palindrome OR
259
+ LEN(1) | # we are at the middle ODD character OR
260
+ LEN(0) # the palindrome has an even number of characters
261
+ ) & # now that we have the left half matched, we match the right half
262
+ /\W*/ & # skip any white space and finally
263
+ MATCH { c } # match the same character on the left now on the far right
264
+
265
+ end
266
+
267
+ palindrome.matches?("A man, a plan, a canal, Panama!") # succeeds!
268
+ ```
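
For comparison, the recursion that the palindrome pattern expresses can be sketched in plain Ruby without Cannonbol. This is a rough analogue, not the pattern itself: the method names are hypothetical, the sketch folds case and checks the whole string (both are assumptions made here), whereas the pattern above is matched unanchored by the Cannonbol matcher.

```ruby
# Plain-Ruby analogue of the recursive palindrome pattern (NOT Cannonbol code).
def palindrome_chars?(chars)
  # Zero or one character left: the LEN(0) / LEN(1) middle cases.
  return true if chars.length <= 1
  # The outermost characters must agree: the role played by MATCH { c }.
  return false unless chars.first == chars.last

  # Recurse on the inner characters: the nested palindrome.
  palindrome_chars?(chars[1...-1])
end

def palindrome?(text)
  # Keep only word characters, like skipping /\W*/ in the pattern.
  # Folding case is an assumption of this sketch.
  palindrome_chars?(text.downcase.scan(/\w/))
end

palindrome?("A man, a plan, a canal, Panama!")
```

The structure is the same in both versions: take one character from the left, handle the middle recursively, then require the same character on the right.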
232
269
 
233
- Using MATCH to define recursive patterns makes Cannonbol into a full blown BNF parser. See the example [email address parser](A complete real world example)
270
+ Using `MATCH` to define recursive patterns makes Cannonbol into a full blown BNF parser. See the example [email address parser](#a-complete-real-world-example)
234
271
 
235
272
  ### Advanced capture techniques
236
273
 
237
- Both capture? and capture! have a number of useful features.
274
+ Both `capture?` and `capture!` have a number of useful features.
238
275
 
239
276
  * They can take a block which is passed the matching substring.
240
277
  * As well as the current match, they can pass the current cursor position and the current value of capture variable.
@@ -246,85 +283,110 @@ Both capture? and capture! have a number of useful features.
246
283
 
247
284
  This is the most general way of capturing a submatch. For example
248
285
 
249
- word = /\W*/ & /\w+/.capture? { |match| words << match } & /\W*/
286
+ ```ruby
287
+ word = /\W*/ & /\w+/.capture? { |match| words << match } & /\W*/
288
+ ```
250
289
 
251
290
  will shovel each word it matches into the words array. You could use it like this:
252
291
 
253
- words = []
254
- (ARBNO(word).match?("a big strange, long sentence!")
292
+ ```ruby
293
+ words = []
294
+ ARBNO(word).matches?("a big strange, long sentence!")
295
+ ```
255
296
 
256
297
  Using `capture? { |m| puts m }` is handy for debugging your patterns.
257
298
 
258
299
  #### Current cursor position
259
300
 
260
- The second parameter of the capture block will recieve the current cursor position. For example
301
+ The second parameter of the capture block will receive the current cursor position. For example
302
+
303
+ ```ruby
304
+ ("i".capture! { |m, p| puts "i found at #{p-1}"} & RPOS(0)).matches?("I said hello!", insensitive: true)
305
+ => i found at 0
306
+ => i found at 4
307
+ ```
261
308
 
262
- ("i".capture! { |m, p| puts "i found at #{p-1}"} & RPOS(0)).match("I said hello!", ignore_case: true)
263
- => i found at 0
264
- => i found at 4
265
-
266
- Notice the use of RPOS(0) which will force the pattern to look at every character in the subject, until the pattern finally fails. By using capture! (capture NOW) we record every hit, even though the pattern fails in the end.
309
+ Notice the use of `RPOS(0)` which will force the pattern to look at every character in the subject, until the pattern finally fails. By using `capture!` (capture _now_) we record every hit, even though the pattern fails in the end.
267
310
 
268
311
  #### Using capture variables
269
312
 
270
313
  If the capture methods are supplied with a symbol, then the captured value will be saved in an internal capture variable. For example:
271
314
 
272
- some_pattern.capture!(:value)
273
-
274
- would save the string matched by some_pattern into the capture variable called :value.
315
+ ```ruby
316
+ some_pattern.capture!(:value)
317
+ ```
318
+
319
+ would save the string matched by `some_pattern` into the capture variable called `:value`.
275
320
 
276
321
  There are a couple of ways to retrieve the capture variables:
277
322
 
278
- Any primitive pattern that takes a parameter can use the value of a capture variable. So for example `LEN(:foo)` means
279
- take the current value of the capture variable :foo as the parameter to LEN.
323
+ Any primitive pattern that takes a parameter can use the value of a capture variable. So for example `LEN(:foo)` means
324
+ take the current value of the capture variable `:foo` as the parameter to `LEN`.
280
325
 
281
326
  We can use this to clean up the palindrome pattern a little bit:
282
327
 
283
- palindrome = /\W*/ & LEN(1).capture!(:c) & /\W*/ & ( MATCH{palindrome} | LEN(1) | LEN(0) ) & /\W*/ & MATCH(:c)
328
+ ```ruby
329
+ palindrome = /\W*/ & LEN(1).capture!(:c) & /\W*/ & ( MATCH{palindrome} | LEN(1) | LEN(0) ) & /\W*/ & MATCH(:c)
330
+ ```
284
331
 
285
- Another way to get the capture variables is to interogate the value returned by match?. The value returned by match? is a subclass of string, that has some extra methods. One of these is the captured method which gives a hash of all the captured variables. For example:
332
+ Another way to get the capture variables is to interrogate the value returned by `matches?`. The value returned by `matches?` is a subclass of String that has some extra methods. One of these is the `captured` method, which gives a hash of all the captured variables. For example:
286
333
 
287
- > ("dog" | "cat").capture?(:pet).match?("He had a dog named Spot.").captured[:pet]
288
- => dog
334
+ ```ruby
335
+ > ("dog" | "cat").capture?(:pet).matches?("He had a dog named Spot.").captured[:pet]
336
+ => dog
337
+ ```
289
338
 
290
- You can also give a block to the match? method which will be called whether the block passes or not. For example:
339
+ You can also give a block to the `matches?` method, which will be called whether the match succeeds or not. For example:
291
340
 
292
- > ("dog" | "cat").capture?(:pet).match?("He had a dog named Spot."){ |match| match.captured[:pet] if match}
293
- => dog
294
-
295
- The match? block can also explicitly name any capture variables you need to get the values of. So for example:
341
+ ```ruby
342
+ > ("dog" | "cat").capture?(:pet).matches?("He had a dog named Spot."){ |match| match.captured[:pet] if match}
343
+ => dog
344
+ ```
296
345
 
297
- > pet_data = (POS(0) & ARBNO(("big" | "small").capture?(:size) | ("dog" | "cat").capture?(:pet) | LEN(1)) & RPOS(0))
298
- => #<Cannonbol::Concat .... etc
299
- > pet_data.match?("He has a big dog!") { |m, pet, size| "type of pet: #{pet.upcase}, size: #{size.upcase}"}
300
- => type of pet: DOG, size: BIG
346
+ The matches? block can also explicitly name any capture variables you need to get the values of. So for example:
347
+
348
+ ```ruby
349
+ > pet_data = (POS(0) & ARBNO(("big" | "small").capture?(:size) | ("dog" | "cat").capture?(:pet) | LEN(1)) & RPOS(0))
350
+ => #<Cannonbol::Concat .... etc
351
+ > pet_data.matches?("He has a big dog!") { |m, pet, size| "type of pet: #{pet.upcase}, size: #{size.upcase}"}
352
+ => type of pet: DOG, size: BIG
353
+ ```
301
354
 
302
- If the match? block mentions capture variables that were not assigned in the match they get nil.
355
+ If the matches? block mentions capture variables that were not assigned in the match they get nil.
303
356
 
304
357
  #### Initializing capture variables
305
358
 
306
359
  When used as a parameter to a primitive the capture variable may be given an initial value. For example:
307
360
 
308
- LEN(baz: 12)
361
+ ```ruby
362
+ LEN(baz: 12)
363
+ ```
309
364
 
310
365
  would match `LEN(12)` if :baz had not yet been set.
311
366
 
312
367
  A second way to initialize (or update capture variables) is to combine capture variables with a capture block like this:
313
368
 
314
- some_pattern.capture!(:baz) { |match, position, baz| baz || position * 2 } # initializes :baz to position * 2
315
-
369
+ ```ruby
370
+ some_pattern.capture!(:baz) { |match, position, baz| baz || position * 2 } # initializes :baz to position * 2
371
+ ```
372
+
316
373
  If a symbol is specified in a capture!, and there is a block, then the symbol will be set to the value returned by the block.
317
374
 
318
375
  #### Capturing arrays of data
319
376
 
320
377
  To capture all the words into a capture variable as an array you could do this:
321
378
 
322
- words = []
323
- word = /\W*/ & /\w+/.capture?(:words) { |match| words << match } & /\W*/
324
-
325
- This can be shortened to:
379
+ ```ruby
380
+ words = []
381
+ word = /\W*/ & /\w+/.capture?(:words) { |match| words << match } & /\W*/
382
+ ARBNO(word)
383
+ ```
384
+
385
+ The `word` pattern can be shortened to:
326
386
 
327
- word = /\W*/ & /\w+/.capture?(:words => []) & /\W*/
387
+ ```ruby
388
+ word = /\W*/ & /\w+/.capture?(:words => []) & /\W*/
389
+ ```
328
390
 
329
391
  This works because anytime there is 1) a capture with a capture variable that is 2) holding an array, and 3) that does NOT have a block, the capture method will go ahead and shovel the captured value into the capture variable. Note this behavior can be overridden if needed by including a block.
330
392
 
@@ -333,25 +395,28 @@ This works because anytime there is a 1) capture with a capture variable that is
333
395
  Each time MATCH or ARBNO is called, the current state of any known capture variables is saved, and those values will be restored when the MATCH/ARBNO exits. If new capture variables are introduced by the nested pattern, these new values will be merged with the existing set of variables.
334
396
 
335
397
  More powerful yet is the fact that every match string sent to a capture variable has access to all the values captured so far via the captured method. For example:
336
-
337
- subject_clause = article & noun.capture!(:subject)
338
- object_clause = article & noun.capture!:object)
339
- verb_clause = ...
340
- sentence = (subject_clause & verb_clause & object_clause & ".")
341
- sentences = ARBNO(sentence.capture?(:sentences => [])) & RPOS(0)
342
- sentences.match(file_stream).captured[:sentences].collect(&:captured)
343
- => [{:subject => "dog", :object => "man"}, {:subject => "man", :object => "dog} ...]
398
+
399
+ ```ruby
400
+ > subject_clause = article & noun.capture!(:subject) ;
401
+ * object_clause = article & noun.capture!(:object);
402
+ * verb_clause = ...
403
+ * sentence = (subject_clause & verb_clause & object_clause & ".");
404
+ * sentences = ARBNO(sentence.capture?(:sentences => [])) & RPOS(0);
405
+ * sentences.matches?(file_stream).captured[:sentences].collect(&:captured)
406
+ => [{:subject => "dog", :object => "man"}, {:subject => "man", :object => "dog"} ...]
407
+ ```
344
408
 
345
409
  As each noun is matched, it is captured and saved in :subject or :object. When the sentence is captured, the match is shoveled away into the :sentences variable. Because the match value itself responds to the captured method we end up with all the data collected in a nice array.
346
410
 
347
- Note that capture! is used for capturing the nouns. This is cheaper and does not hurt anything since the value of
411
+ Note that capture! is used for capturing the nouns. This is cheaper and does not hurt anything since the value of
348
412
  the capture variable will just be overwritten.
349
-
413
+
350
414
  ### Advanced PRIMITIVES
351
415
 
352
416
  There are a few more SNOBOL4 + SPITBOL primitives that are included for completeness.
353
417
 
354
418
  `FENCE` matches the empty string, but will fail if there is an attempt to backtrack through the FENCE.
419
+
355
420
  `FENCE(pattern)` will attempt to match pattern, but if an attempt is made to backtrack through the FENCE the pattern will fail.
356
421
 
357
422
  The difference is that FENCE will fail the whole match, but FENCE(pattern) will just fail the subpattern.
@@ -360,18 +425,20 @@ The difference is that FENCE will fail the whole match, but FENCE(pattern) will
360
425
 
361
426
  `FAIL` will never match anything, and will force the matcher to backtrack and retry the next alternative.
362
427
 
363
- `SUCCEED` will force the match to retry. The only that gets passed `SUCCEED` is `ABORT`.
428
+ `SUCCEED` will force the match to retry. The only pattern that gets past `SUCCEED` is `ABORT`.
364
429
 
365
430
  These can be used together to do some interesting things. For example
366
431
 
367
- > pattern = POS(0) & SUCCEED & (FENCE(TAB(n: 1).capture!(:n) { |m, p, n| puts m; p+1 } | ABORT)) & FAIL;
368
- * pattern.match?("abcd")
369
- a
370
- ab
371
- abc
372
- abcd
373
- => nil
374
-
432
+ ```ruby
433
+ > pattern = POS(0) & SUCCEED & (FENCE(TAB(n: 1).capture!(:n) { |m, p, n| puts m; p+1 } | ABORT)) & FAIL;
434
+ * pattern.matches?("abcd")
435
+ a
436
+ ab
437
+ abc
438
+ abcd
439
+ => nil
440
+ ```
441
+
375
442
  The SUCCEED and FAIL primitives keep forcing the matcher to retry. Eventually the TAB will fail, causing the ABORT alternative to be taken, which exits the matcher.
376
443
 
377
444
  So it goes like this
@@ -379,55 +446,77 @@ So it goes like this
379
446
  SUCCEED
380
447
  TAB(1)
381
448
  FAIL
382
- SUCEED
449
+ SUCCEED
383
450
  TAB(2)
384
451
  etc...
385
-
386
- The FENCE keeps the matcher from backtracking into the ABORT option too early. Otherwise when the matcher hit fail, it would try different alternatives, and would hit the ABORT.
452
+
453
+ The FENCE keeps the matcher from backtracking into the ABORT option too early. Note that FENCE prevents backtracking through the level it is on;
454
+ however, we can backtrack *around* the FENCE and into the SUCCEED, which forces us to retry the FENCE with a new value of `n`.
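
The retry loop described above can be imitated in plain Ruby, without Cannonbol, to make the control flow easier to follow. This is only a sketch: `trace_prefixes` is a hypothetical helper, not part of the gem, and it mimics the observable behavior (the prefixes printed by the capture block) rather than the real matcher.

```ruby
# Plain-Ruby imitation of the SUCCEED / TAB(n) / FAIL loop; no Cannonbol involved.
# `trace_prefixes` is a hypothetical helper used only for illustration.
def trace_prefixes(subject)
  seen = []
  n = 1
  loop do                         # SUCCEED: always offers another retry
    break if n > subject.length   # TAB(n) cannot reach column n, so ABORT ends the match
    seen << subject[0, n]         # the capture block sees the text up to column n
    n += 1                        # the block returns p + 1, which becomes the next n
    # FAIL: forces backtracking, which lands on SUCCEED again
  end
  seen
end

puts trace_prefixes("abcd")
```

Each pass through the loop corresponds to one SUCCEED, TAB(n), FAIL round in the trace, and the `break` plays the role of the ABORT alternative.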
387
455
 
388
456
  ### A complete real world example
389
457
 
390
458
  Cannonbol can be used to easily translate the email BNF spec into an email address parser.
391
459
 
392
- ws = /\s*/
393
- quoted_string = ws & '"' & ARBNO(NOTANY('"\\') | '\\"' | '\\\n' | '\\\\') & '"' & ws
394
- atom = ws & SPAN("!#$%&'*+-/0123456789=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ^_`abcdefghijklmnopqrstuvwxyz{|}~") & ws
395
- word = (atom | quoted_string)
396
- phrase = word & ARBNO(word)
397
- domain_ref = atom
398
- domain_literal = "[" & /[0-9]+/ & ARBNO(/\.[0-9]+/) & "]"
399
- sub_domain = domain_ref | domain_literal
400
- domain = (sub_domain & ARBNO("." & sub_domain)).capture?(:domain) { |m| m.strip }
401
- local_part = (word & ARBNO("." & word)).capture?(:local_part) { |m| m.strip }
402
- addr_spec = (local_part & "@" & domain)
403
- route = (ws & "@" & domain & ARBNO("@" & domain)).capture?(:route) { |m| m.strip } & ":"
404
- route_addr = "<" & ((route | "") & addr_spec).capture?(:mailbox) { |m| m.strip } & ">"
405
- mailbox = (addr_spec.capture?(:mailbox) { |m| m.strip } |
406
- (phrase.capture?(:display_name) { |m| m.strip } & route_addr))
407
- group = (phrase.capture?(:group_name) { |m| m.strip } & ":" &
408
- (( mailbox.capture?(group_mailboxes: []) & ARBNO("," & mailbox.capture?(:group_mailboxes) ) ) | ws)) & ";"
409
- address = POS(0) & (mailbox | group ) & RPOS(0)
460
+ ```ruby
461
+ ws = /\s*/
462
+ quoted_string = ws & '"' & ARBNO(NOTANY('"\\') | '\\"' | '\\\n' | '\\\\') & '"' & ws
463
+ atom = ws & SPAN("!#$%&'*+-/0123456789=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ^_`abcdefghijklmnopqrstuvwxyz{|}~") & ws
464
+ word = (atom | quoted_string)
465
+ phrase = word & ARBNO(word)
466
+ domain_ref = atom
467
+ domain_literal = "[" & /[0-9]+/ & ARBNO(/\.[0-9]+/) & "]"
468
+ sub_domain = domain_ref | domain_literal
469
+ domain = (sub_domain & ARBNO("." & sub_domain)).capture?(:domain) { |m| m.strip }
470
+ local_part = (word & ARBNO("." & word)).capture?(:local_part) { |m| m.strip }
471
+ addr_spec = (local_part & "@" & domain)
472
+ route = (ws & "@" & domain & ARBNO("@" & domain)).capture?(:route) { |m| m.strip } & ":"
473
+ route_addr = "<" & ((route | "") & addr_spec).capture?(:mailbox) { |m| m.strip } & ">"
474
+ mailbox = (addr_spec.capture?(:mailbox) { |m| m.strip } |
475
+ (phrase.capture?(:display_name) { |m| m.strip } & route_addr))
476
+ group = (phrase.capture?(:group_name) { |m| m.strip } & ":" &
477
+ (( mailbox.capture?(group_mailboxes: []) & ARBNO("," & mailbox.capture?(:group_mailboxes) ) ) | ws)) & ";"
478
+ address = POS(0) & (mailbox | group ) & RPOS(0)
479
+ ```
410
480
 
411
481
  So for example we can even parse an obscure email with groups and routes
412
482
 
413
- > email = 'here is my "big fat \\\n groupen" : someone@catprint.com, Fred Nurph<@sub1.sub2@sub3.sub4:fred.nurph@catprint.com>;'
414
- => here is my "big fat \\\n groupen" : someone@catprint.com, Fred Nurph<@sub1.sub2@sub3.sub4:fred.nurph@catprint.com>;
415
- > match = address.match?(email)
416
- => here is my "big fat \\\n groupen" : someone@catprint.com, Fred Nurph<@sub1.sub2@sub3.sub4:fred.nurph@catprint.com>;
417
- > match.captured[:group_mailboxes].first.captured[:mailbox]
418
- => someone@catprint.com
419
- > match.captured[:group_name]
420
- => here is my "big fat \\\n groupen
483
+ ```ruby
484
+ > email = 'here is my "big fat \\\n groupen" : someone@catprint.com, Fred Nurph<@sub1.sub2@sub3.sub4:fred.nurph@catprint.com>;'
485
+ => here is my "big fat \\\n groupen" : someone@catprint.com, Fred Nurph<@sub1.sub2@sub3.sub4:fred.nurph@catprint.com>;
486
+ > match = address.matches?(email)
487
+ => here is my "big fat \\\n groupen" : someone@catprint.com, Fred Nurph<@sub1.sub2@sub3.sub4:fred.nurph@catprint.com>;
488
+ > match.captured[:group_mailboxes].first.captured[:mailbox]
489
+ => someone@catprint.com
490
+ > match.captured[:group_name]
491
+ => here is my "big fat \\\n groupen
492
+ ```
493
+
494
+ ### Backward Compatibility (`matches?` and `insensitive` methods)
495
+
496
+ If you have existing Cannonbol code and are using Ruby 2.5+, you will need to replace
497
+ uses of the `match?` method and the `-` unary operator with `matches?` and `insensitive`.
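
The clash comes from core Ruby, which defines its own `match?` and unary `-` on strings. A small sketch in plain Ruby (no Cannonbol loaded) shows what the built-in methods do; the variable names are illustrative only.

```ruby
# Core Ruby behavior only; Cannonbol is not required here.
subject = "abcd"

# Built-in String#match? returns true or false, with no match object or captures.
builtin_result = subject.match?(/bc/)

# Built-in String#-@ returns a frozen copy of the string.
frozen_copy = -"abcd"

puts builtin_result
puts frozen_copy.frozen?
```

Because these built-ins already have a meaning in core Ruby, old Cannonbol patterns that relied on the same names need either the renamed methods or the adapter.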
421
498
 
499
+ Alternatively you can add the following code to your application:
500
+
501
+ ```ruby
502
+ class Regexp
503
+ include Cannonbol::CompatibilityAdapter
504
+ end
505
+
506
+ class String
507
+ include Cannonbol::CompatibilityAdapter
508
+ end
509
+ ```
422
510
 
423
511
  ## Development
424
512
 
425
- After checking out the repo, run `bundle install` to install dependencies.
513
+ After checking out the repo, run `bundle install` to install dependencies.
426
514
 
427
515
  ### Specs
428
516
 
429
517
  Run `bundle exec rspec` to run the tests on your server environment.
430
- Run `bundle exec rake rackup` and then point your browser to your machine to run the tests in the opal
518
+
519
+ > Testing on Opal is currently broken because of compatibility issues with Opal 1.1 and opal-rspec. However, the library itself should work fine with Opal.
431
520
 
432
521
  ## Contributing
433
522