regexp_parser 0.1.6 → 0.2.0
- checksums.yaml +4 -4
- data/ChangeLog +57 -0
- data/Gemfile +8 -0
- data/LICENSE +1 -1
- data/README.md +225 -206
- data/Rakefile +9 -3
- data/lib/regexp_parser.rb +7 -11
- data/lib/regexp_parser/expression.rb +72 -14
- data/lib/regexp_parser/expression/classes/alternation.rb +3 -16
- data/lib/regexp_parser/expression/classes/conditional.rb +57 -0
- data/lib/regexp_parser/expression/classes/free_space.rb +17 -0
- data/lib/regexp_parser/expression/classes/keep.rb +7 -0
- data/lib/regexp_parser/expression/classes/set.rb +28 -7
- data/lib/regexp_parser/expression/methods/strfregexp.rb +113 -0
- data/lib/regexp_parser/expression/methods/tests.rb +116 -0
- data/lib/regexp_parser/expression/methods/traverse.rb +63 -0
- data/lib/regexp_parser/expression/quantifier.rb +10 -0
- data/lib/regexp_parser/expression/sequence.rb +45 -0
- data/lib/regexp_parser/expression/subexpression.rb +29 -1
- data/lib/regexp_parser/lexer.rb +31 -8
- data/lib/regexp_parser/parser.rb +118 -45
- data/lib/regexp_parser/scanner.rb +1745 -1404
- data/lib/regexp_parser/scanner/property.rl +57 -3
- data/lib/regexp_parser/scanner/scanner.rl +161 -34
- data/lib/regexp_parser/syntax.rb +12 -2
- data/lib/regexp_parser/syntax/ruby/1.9.1.rb +3 -3
- data/lib/regexp_parser/syntax/ruby/1.9.3.rb +2 -7
- data/lib/regexp_parser/syntax/ruby/2.0.0.rb +4 -1
- data/lib/regexp_parser/syntax/ruby/2.1.4.rb +13 -0
- data/lib/regexp_parser/syntax/ruby/2.1.5.rb +13 -0
- data/lib/regexp_parser/syntax/ruby/2.1.rb +2 -2
- data/lib/regexp_parser/syntax/ruby/2.2.0.rb +16 -0
- data/lib/regexp_parser/syntax/ruby/2.2.rb +8 -0
- data/lib/regexp_parser/syntax/tokens.rb +19 -2
- data/lib/regexp_parser/syntax/tokens/conditional.rb +22 -0
- data/lib/regexp_parser/syntax/tokens/keep.rb +14 -0
- data/lib/regexp_parser/syntax/tokens/unicode_property.rb +45 -4
- data/lib/regexp_parser/token.rb +23 -8
- data/lib/regexp_parser/version.rb +5 -0
- data/regexp_parser.gemspec +35 -0
- data/test/expression/test_all.rb +6 -1
- data/test/expression/test_base.rb +19 -0
- data/test/expression/test_conditionals.rb +114 -0
- data/test/expression/test_free_space.rb +33 -0
- data/test/expression/test_set.rb +61 -0
- data/test/expression/test_strfregexp.rb +214 -0
- data/test/expression/test_subexpression.rb +24 -0
- data/test/expression/test_tests.rb +99 -0
- data/test/expression/test_to_h.rb +48 -0
- data/test/expression/test_to_s.rb +46 -0
- data/test/expression/test_traverse.rb +164 -0
- data/test/lexer/test_all.rb +16 -3
- data/test/lexer/test_conditionals.rb +101 -0
- data/test/lexer/test_keep.rb +24 -0
- data/test/lexer/test_literals.rb +51 -51
- data/test/lexer/test_nesting.rb +62 -62
- data/test/lexer/test_refcalls.rb +18 -20
- data/test/parser/test_all.rb +18 -3
- data/test/parser/test_alternation.rb +11 -14
- data/test/parser/test_conditionals.rb +148 -0
- data/test/parser/test_escapes.rb +29 -5
- data/test/parser/test_free_space.rb +139 -0
- data/test/parser/test_groups.rb +40 -0
- data/test/parser/test_keep.rb +21 -0
- data/test/scanner/test_all.rb +8 -2
- data/test/scanner/test_conditionals.rb +166 -0
- data/test/scanner/test_escapes.rb +8 -5
- data/test/scanner/test_free_space.rb +133 -0
- data/test/scanner/test_groups.rb +28 -0
- data/test/scanner/test_keep.rb +33 -0
- data/test/scanner/test_properties.rb +4 -0
- data/test/scanner/test_scripts.rb +71 -1
- data/test/syntax/ruby/test_1.9.3.rb +2 -2
- data/test/syntax/ruby/test_2.0.0.rb +38 -0
- data/test/syntax/ruby/test_2.2.0.rb +38 -0
- data/test/syntax/ruby/test_all.rb +1 -8
- data/test/syntax/ruby/test_files.rb +104 -0
- data/test/test_all.rb +2 -1
- data/test/token/test_all.rb +2 -0
- data/test/token/test_token.rb +109 -0
- metadata +75 -21
- data/VERSION.yml +0 -5
- data/lib/regexp_parser/ctype.rb +0 -48
- data/test/syntax/ruby/test_2.x.rb +0 -46
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 231a27b00daf24a41710b45ef92fef5b6963dc5a
+  data.tar.gz: 1cd1f75da74654cd20a0ac7716aed8519490fef5
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 620846b89adb5b8d27efe722af58951951ce0e7362f646fcc397c94f5597dc999d79851a507d13338ca7f59b67a25539a205878209f7787f6d4053ba95b2555c
+  data.tar.gz: 2f2400eede6011229f6690c8230ae1c6b3abdc2e83380cf0e7b8b6e1dd72c2c7035e82f87f808f64e0f47442df961ba13dae04164c35c776f41cf92aefa2f515
data/ChangeLog
CHANGED
@@ -1,3 +1,60 @@
+Wed Dec 3 05:21:27 2014 Ammar Ali <ammarabuali@gmail.com>
+
+  * Added expand_members method to CharacterSet, returns traditional
+    or unicode property forms of shorthands (\d, \W, \s, etc.)
+
+Tue Dec 2 02:42:39 2014 Ammar Ali <ammarabuali@gmail.com>
+
+  * Improved meaning and output of %t and %T in strfregexp.
+
+  * Added syntax versions for ruby 2.1.4 and 2.1.5 and updated
+    latest 2.1 version.
+
+Mon Dec 1 15:52:31 2014 Ammar Ali <ammarabuali@gmail.com>
+
+  * Added to_h methods to Expression, Subexpression, and Quantifier.
+
+Tue Oct 21 19:14:03 2014 Ammar Ali <ammarabuali@gmail.com>
+
+  * Added traversal methods: traverse, each_expression, and map.
+
+  * Added token/type test methods: type?, is?, and one_of?
+
+  * Added printing method strfregexp, inspired by strftime.
+
+Mon Oct 20 01:03:46 2014 Ammar Ali <ammarabuali@gmail.com>
+
+  * Added scanning and parsing of free spacing (x mode) expressions.
+
+  * Improved handling of inline options (?mixdau:...)
+
+Fri Oct 18 14:09:38 2014 Ammar Ali <ammarabuali@gmail.com>
+
+  * Added conditional expressions. Ruby 2.0.
+
+  * Added keep (\K) markers. Ruby 2.0.
+
+  * Added d, a, and u options. Ruby 2.0.
+
+  * Added missing meta sequences to the parser. They were supported
+    by the scanner only.
+
+  * Renamed Lexer's method to lex, added an alias to the old name (scan)
+
+  * Use #map instead of #each to run the block in Lexer.lex.
+
+  * Replaced VERSION.yml file with a constant.
+
+  * Updated README
+
+Fri Oct 10 11:49:38 2014 Ammar Ali <ammarabuali@gmail.com>
+
+  * Update tokens and scanner with new additions in Unicode 7.0.
+
+Mon Oct 6 04:30:24 2014 Ammar Ali <ammarabuali@gmail.com>
+
+  * Released version 0.1.6
+
 Sun Oct 5 19:58:17 2014 Ammar Ali <ammarabuali@gmail.com>
 
   * Fixed test and gem building rake tasks and extracted the gem
data/Gemfile
ADDED
data/LICENSE
CHANGED
data/README.md
CHANGED
@@ -1,57 +1,87 @@
-# Regexp::Parser
+# Regexp::Parser
 
-
+[![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://secure.travis-ci.org/ammar/regexp_parser.png?branch=master)](http://travis-ci.org/ammar/regexp_parser) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.png)](https://codeclimate.com/github/ammar/regexp_parser/badges)
+
+A ruby gem for tokenizing, parsing, and transforming regular expressions.
 
 * Multilayered
-  * A scanner based on [ragel](http://www.
-  * A lexer that produces a "stream" of
-  * A parser that produces a "tree" of
-*
-*
+  * A scanner/tokenizer based on [ragel](http://www.colm.net/open-source/ragel/)
+  * A lexer that produces a "stream" of token objects.
+  * A parser that produces a "tree" of Expression objects (OO API)
+* Runs on ruby 1.8, 1.9, 2.x, and jruby (1.9 mode) runtimes.
+* Recognizes ruby 1.8, 1.9, and 2.x regular expressions [See Scanner Syntax](#scanner-syntax)
+
 
 _For an example of regexp_parser in use, see the [meta_re project](https://github.com/ammar/meta_re)_
 
+
 ---
 ## Requirements
 
-*
-*
+* Ruby >= 1.8.7
+* Ragel >= 6.0, but only if you want to build the gem or work on the scanner.
 
 
 _Note: See the .travis.yml file for covered versions._
 
+
 ---
 ## Install
 
+Install the gem with:
+
 `gem install regexp_parser`
 
+Or, add it to your project's `Gemfile`:
+
+```gem 'regexp_parser', '~> X.Y.Z'```
+
+See rubygems for the [latest version number](https://rubygems.org/gems/regexp_parser)
+
+
 ---
 ## Usage
 
+The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
+provides a single method that takes a regular expression (as a Regexp object or
+a string) and returns its results. The **Lexer** and the **Parser** accept an
+optional second argument that specifies the syntax version, like 'ruby/2.0',
+which defaults to the host ruby version (using RUBY_VERSION).
+
+Here are the basic usage examples:
+
 ```ruby
-# require the gem, then call one of:
 require 'regexp_parser'
 
-
-Regexp::Scanner.scan regexp
+Regexp::Scanner.scan(regexp)
 
-
-Regexp::Lexer.scan regexp
+Regexp::Lexer.lex(regexp)
 
-
-Regexp::Parser.parse regexp
+Regexp::Parser.parse(regexp)
 ```
 
-
+All three methods accept a block as the last argument, which, if given, gets
+called with the results as follows:
+
+* **Scanner**: the block gets passed the results as they are scanned. See the
+  example in the next section for details.
+
+* **Lexer**: after completion, the block gets passed the tokens one by one.
+  _The result of the block is returned._
+
+* **Parser**: after completion, the block gets passed the root expression.
+  _The result of the block is returned._
+
 
 ---
 ## Components
 
 ### Scanner
-A ragel generated scanner that recognizes the cumulative syntax of
-supported
-their type, token, text, and start/end
-pattern.
+A ragel generated scanner that recognizes the cumulative syntax of all
+supported syntax versions. It breaks a given expression's text into the
+smallest parts, and identifies their type, token, text, and start/end
+offsets within the pattern.
+
 
 #### Example
 The following scans the given pattern and prints out the type, token, text and
@@ -79,7 +109,8 @@ end
 # type: group, token: close, text: ')' [15..16]
 ```
 
-A one-liner that
+A one-liner that uses map on the result of the scan to return the textual
+parts of the pattern:
 
 ```ruby
 Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]}
@@ -90,17 +121,18 @@ Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]}
 #### Notes
 * The scanner performs basic syntax error checking, like detecting missing
   balancing punctuation and premature end of pattern. Flavor validity checks
-  are performed in the lexer.
+  are performed in the lexer, which uses a syntax object.
 
-* If the input is a ruby Regexp object, the scanner calls #source on it to
+* If the input is a ruby **Regexp** object, the scanner calls #source on it to
   get its string representation. #source does not include the options of
-  expression (m, i, and x) To include the options the scan, #to_s
-  be called on the Regexp before passing it to the scanner
-
+  the expression (m, i, and x). To include the options in the scan, #to_s
+  should be called on the **Regexp** before passing it to the scanner or any
+  of the other modules.
 
 * To keep the scanner simple(r) and fairly reusable for other purposes, it
   does not perform lexical analysis on the tokens, sticking to the task
-  of
+  of identifying the smallest possible tokens and leaving lexical analysis
+  to the lexer.
 
 
 ---
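The note on `#source` and `#to_s` in the hunk above describes core Ruby behavior, so it can be checked without the gem. A minimal sketch using only built-in `Regexp` methods; the variable names are illustrative and not taken from the README:

```ruby
# Core Ruby only: how #source and #to_s differ for a Regexp with options.
regex = /a?(b)*[c]+/m

source_text = regex.source # pattern text only, the m option is dropped
full_text   = regex.to_s   # pattern wrapped in an options group

puts source_text # a?(b)*[c]+
puts full_text   # (?m-ix:a?(b)*[c]+)
```

Passing `full_text` instead of the Regexp object is what the note means by including the options in the scan.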
@@ -110,28 +142,36 @@ flavor). Syntax classes act as lookup tables, and are layered to create
 flavor variations. Syntax only comes into play in the lexer.
 
 #### Example
-The following instantiates
-
+The following instantiates syntax objects for Ruby 2.0, 1.9, 1.8, and
+checks a few of their implementation features.
 
 ```ruby
 require 'regexp_parser'
 
+ruby_20 = Regexp::Syntax.new 'ruby/2.0'
+ruby_20.implements? :quantifier,  :zero_or_one             # => true
+ruby_20.implements? :quantifier,  :zero_or_one_reluctant   # => true
+ruby_20.implements? :quantifier,  :zero_or_one_possessive  # => true
+ruby_20.implements? :conditional, :condition               # => true
+
 ruby_19 = Regexp::Syntax.new 'ruby/1.9'
-ruby_19.implements? :quantifier,
-ruby_19.implements? :quantifier,
-ruby_19.implements? :quantifier,
+ruby_19.implements? :quantifier,  :zero_or_one             # => true
+ruby_19.implements? :quantifier,  :zero_or_one_reluctant   # => true
+ruby_19.implements? :quantifier,  :zero_or_one_possessive  # => true
+ruby_19.implements? :conditional, :condition               # => false
 
 ruby_18 = Regexp::Syntax.new 'ruby/1.8'
-ruby_18.implements? :quantifier,
-ruby_18.implements? :quantifier,
-ruby_18.implements? :quantifier,
+ruby_18.implements? :quantifier,  :zero_or_one             # => true
+ruby_18.implements? :quantifier,  :zero_or_one_reluctant   # => true
+ruby_18.implements? :quantifier,  :zero_or_one_possessive  # => false
+ruby_18.implements? :conditional, :condition               # => false
 ```
 
 
 #### Notes
-*
-  pair of single quotes, are specified with an underscore followed
-  characters appended to the base token. In the previous named group example,
+* Variations on a token, for example a named group with angle brackets (< and >)
+  vs one with a pair of single quotes, are specified with an underscore followed
+  by two characters appended to the base token. In the previous named group example,
   the tokens would be :named_ab (angle brackets) and :named_sq (single quotes).
   These variations are normalized by the syntax to :named.
 
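The normalization described in the note above can be pictured as a small lookup table. The sketch below is only an illustration of that idea, not the gem's code: `TOKEN_VARIANTS` and `normalize_token` are hypothetical names, and the table lists only the two variants the note mentions.

```ruby
# Hypothetical lookup table: variant token => base token.
# Only the two variants named in the note are listed.
TOKEN_VARIANTS = {
  :named_ab => :named, # angle brackets, (?<name>...)
  :named_sq => :named  # single quotes,  (?'name'...)
}.freeze

# Returns the base token for a known variant, or the token unchanged.
def normalize_token(token)
  TOKEN_VARIANTS.fetch(token, token)
end

p normalize_token(:named_ab)    # :named
p normalize_token(:named_sq)    # :named
p normalize_token(:zero_or_one) # :zero_or_one
```

An explicit table is used here rather than stripping any two-character suffix, so tokens that merely contain underscores are left alone.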
@@ -139,22 +179,23 @@ ruby_18.implements? :quantifier, :zero_or_one_possessive # => false
 ---
 ### Lexer
 Sits on top of the scanner and performs lexical analysis on the tokens that
-it emits. Among its tasks are breaking quantified literal runs, collecting the
-emitted token
-
-
+it emits. Among its tasks are: breaking quantified literal runs, collecting the
+emitted token attributes into Token objects, calculating their nesting depth,
+normalizing tokens for the parser, and checking if the tokens are implemented by
+the given syntax version.
+
+See the [Token Objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
+wiki page for more information on Token objects.
 
-Tokens are Struct objects, with a few helper methods; #next, #previous, #offsets
-and #length.
 
 #### Example
-The following example
-syntax, and prints the token objects' text.
+The following example lexes the given pattern, checks it against the ruby 1.9
+syntax, and prints the token objects' text indented to their level.
 
 ```ruby
 require 'regexp_parser'
 
-Regexp::Lexer.scan /a?(b(c))*[d]
+Regexp::Lexer.scan /a?(b(c))*[d]+/, 'ruby/1.9' do |token|
   puts "#{' ' * token.level}#{token.text}"
 end
 
@@ -175,8 +216,9 @@ end
 ```
 
 A one-liner that returns an array of the textual parts of the given pattern.
-Compare the output with that of the one-liner example of the Scanner
-how the sequence 'cat' is treated.
+Compare the output with that of the one-liner example of the **Scanner**; notably
+how the sequence 'cat' is treated. The 't' is separated because it's followed
+by a quantifier that only applies to it.
 
 ```ruby
 Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text}
@@ -184,50 +226,70 @@ Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text}
 ```
 
 #### Notes
-* The
+* The syntax argument is optional. It defaults to the version of the ruby
+  interpreter in use, as returned by RUBY_VERSION.
 
-* The lexer
-  emitted tokens. This responsibility might be relegated to the scanner
-  in a future release.
+* The lexer normalizes some tokens, as noted in the Syntax section above.
 
 
 ---
 ### Parser
 Sits on top of the lexer and transforms the "stream" of Token objects emitted
 by it into a tree of Expression objects represented by an instance of the
-Expression::Root class.
+Expression::Root class.
+
+See the [Expression Objects](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
+wiki page for attributes and methods.
+
 
 #### Example
 
 ```ruby
 require 'regexp_parser'
 
-regex = /a?(b)*[
+regex = /a?(b+(c)d)*(?<name>[0-9]+)/
 
-
-# expression into '(?m-ix:a?(b)*[c]+)', thus the Group::Options in the output
-root = Regexp::Parser.parse( regex.to_s, 'ruby/2.1')
+tree = Regexp::Parser.parse( regex, 'ruby/2.1' )
 
-
-
+tree.traverse do |event, exp|
+  puts "#{event}: #{exp.type} `#{exp.to_s}`"
+end
 
-#
-
-
+# Output
+# visit: literal `a?`
+# enter: group `(b+(c)d)*`
+# visit: literal `b+`
+# enter: group `(c)`
+# visit: literal `c`
+# exit: group `(c)`
+# visit: literal `d`
+# exit: group `(b+(c)d)*`
+# enter: group `(?<name>[0-9]+)`
+# visit: set `[0-9]+`
+# exit: group `(?<name>[0-9]+)`
+```
 
-
-
-
-end
+Another example, using each_expression and strfregexp to print the object tree.
+_See the traverse.rb and strfregexp.rb files under `lib/regexp_parser/expression/methods`
+for more information on these methods._
 
-
+```ruby
+include_root = true
+indent_offset = include_root ? 1 : 0
 
-
+tree.each_expression(include_root) do |exp, level_index|
+  puts exp.strfregexp("%>> %c", indent_offset)
+end
+
+# Output
 # > Regexp::Expression::Root
-# > Regexp::Expression::
+#   > Regexp::Expression::Literal
+#   > Regexp::Expression::Group::Capture
 #     > Regexp::Expression::Literal
 #     > Regexp::Expression::Group::Capture
 #       > Regexp::Expression::Literal
+#     > Regexp::Expression::Literal
+#   > Regexp::Expression::Group::Named
 #     > Regexp::Expression::CharacterSet
 ```
 
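The enter/visit/exit events printed by `traverse` in the hunk above follow the usual depth-first pattern: a node with children is entered, its children are walked, then it is exited, while a childless node is simply visited. The sketch below reproduces that pattern on a hand-built tree. It is an assumption-laden illustration, not the gem's implementation: `Node` and `walk` are hypothetical names, and the tree covers only the `a?(b+(c)d)*` part of the README pattern.

```ruby
# Hypothetical stand-in for an expression node; not the gem's Expression class.
Node = Struct.new(:type, :text, :children) do
  def terminal?
    children.nil? || children.empty?
  end
end

# Depth-first walk: terminal nodes yield :visit, nodes with children yield
# :enter before their children and :exit after them.
def walk(node, &block)
  if node.terminal?
    block.call(:visit, node)
  else
    block.call(:enter, node)
    node.children.each { |child| walk(child, &block) }
    block.call(:exit, node)
  end
end

# Hand-built tree for the a?(b+(c)d)* part of the README pattern.
inner = Node.new(:group, '(c)', [Node.new(:literal, 'c', [])])
outer = Node.new(:group, '(b+(c)d)*', [
  Node.new(:literal, 'b+', []),
  inner,
  Node.new(:literal, 'd', [])
])
top_level = [Node.new(:literal, 'a?', []), outer]

events = []
top_level.each do |node|
  walk(node) { |event, exp| events << "#{event}: #{exp.type} `#{exp.text}`" }
end
puts events
```

The collected lines match the first eight output lines of the README example, which makes it easy to see how nesting depth maps to enter/exit pairs.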
@@ -236,122 +298,84 @@ Expression class. See the next section for details._
 
 
 ---
-[old lines 239-308: 70 removed lines whose text is not preserved in this rendering]
-  -
-  -
-  - Options: (?mi-x:abc)
-  - Passive: (?:abc)
-  - Sub-expression Calls: \g<name>, \g<1>
-- Literals: abc, def?, etc.
-- POSIX classes: [:alpha:], [:print:], etc.
-- Quantifiers
-  - Greedy: ?, *, +, {m,M}
-  - Reluctant: ??, *?, +?, {m,M}?
-  - Possessive: ?+, *+, ++, {m,M}+
-- String Escapes
-  - Control: \C-C, \cD, etc.
-  - Hex: \x20, \x{701230}, etc.
-  - Meta: \M-c, \M-\C-C etc.
-  - Octal: \0, \01, \012
-  - Unicode: \uHHHH, \u{H+ H+}
-- Traditional Back-references: \1 thru \9
-- Unicode Properties:
-  - Age: \p{Age=2.1}, \P{age=5.2}, etc.
-  - Classes: \p{Alpha}, \P{Space}, etc.
-  - Derived Properties: \p{Math}, \P{Lowercase}, etc.
-  - General Categories: \p{Lu}, \P{Cs}, etc.
-  - Scripts: \p{Arabic}, \P{Hiragana}, etc.
-  - Simple Properties: \p{Dash}, \p{Extender}, etc.
-
-
-### Missing Features
-
-The following were added by the Onigmo regular expression library used by
-ruby 2.x and are not currently recognized by the scanner:
-
-- Planned for support
-  - Conditional Expressions: (?(cond)yes-subexp), (?(cond)yes-subexp|no-subexp)
-  - Negative POSIX Brackets: [:^alpha:], [:^digit:]
-  - New Character Set Options: d, a, and u _[see](https://github.com/k-takata/Onigmo/blob/master/doc/RE#L234)_
-- Not planned for support
-  - Keep: \K _(not enabled for ruby syntax)_
-  - Quotes: \Q...\E _(perl and java syntax only) [see](https://github.com/k-takata/Onigmo/blob/master/doc/RE#L452)_
-  - Capture History: (?@...), (?@<name>...) _(not enabled for ruby syntax) [see](https://github.com/k-takata/Onigmo/blob/master/doc/RE#L499)_
+
+
+## Supported Syntax
+The three modules support all the regular expression syntax features of Ruby 1.8,
+1.9, and 2.x:
+
+_Note that not all of these are available in all versions of Ruby_
+
+
+| Syntax Feature | Examples | ⋯ |
+| ------------------------------------- | ------------------------------------------------------- |:--------:|
+| **Alternation** | `a|b|c` | ✓ |
+| **Anchors** | `^`, `$`, `\b` | ✓ |
+| **Character Classes** | `[abc]`, `[^\\]`, `[a-d&&g-h]`, `[a=e=b]` | ✓ |
+| **Character Types** | `\d`, `\H`, `\s` | ✓ |
+| **Conditional Exps.** | `(?(cond)yes-subexp)`, `(?(cond)yes-subexp|no-subexp)` | ✓ |
+| **Escape Sequences** | `\t`, `\\+`, `\?` | ✓ |
+| **Free Space** | whitespace and `# Comments` _(x modifier)_ | ✓ |
+| **Grouped Exps.** | | ⋱ |
+|   _**Assertions**_ | | ⋱ |
+|   _Lookahead_ | `(?=abc)` | ✓ |
+|   _Negative Lookahead_ | `(?!abc)` | ✓ |
+|   _Lookbehind_ | `(?<=abc)` | ✓ |
+|   _Negative Lookbehind_ | `(?<!abc)` | ✓ |
+|   _**Atomic**_ | `(?>abc)` | ✓ |
+|   _**Back-references**_ | | ⋱ |
+|   _Named_ | `\k<name>` | ✓ |
+|   _Nest Level_ | `\k<n-1>` | ✓ |
+|   _Numbered_ | `\k<1>` | ✓ |
+|   _Relative_ | `\k<-2>` | ✓ |
+|   _Traditional_ | `\1` thru `\9` | ✓ |
+|   _**Capturing**_ | `(abc)` | ✓ |
+|   _**Comments**_ | `(?# comment text)` | ✓ |
+|   _**Named**_ | `(?<name>abc)`, `(?'name'abc)` | ✓ |
+|   _**Options**_ | `(?mi-x:abc)`, `(?a:\s\w+)` | ✓ |
+|   _**Passive**_ | `(?:abc)` | ✓ |
+|   _**Subexp. Calls**_ | `\g<name>`, `\g<1>` | ✓ |
+| **Keep** | `\K`, `(ab\Kc|d\Ke)f` | ✓ |
+| **Literals** _(utf-8)_ | `Ruby`, `ルビー`, `روبي` | ✓ |
+| **POSIX Classes** | `[:alpha:]`, `[:^digit:]` | ✓ |
+| **Quantifiers** | | ⋱ |
+|   _**Greedy**_ | `?`, `*`, `+`, `{m,M}` | ✓ |
+|   _**Reluctant** (Lazy)_ | `??`, `*?`, `+?`, `{m,M}?` | ✓ |
+|   _**Possessive**_ | `?+`, `*+`, `++`, `{m,M}+` | ✓ |
+| **String Escapes** | | ⋱ |
+|   _**Control**_ | `\C-C`, `\cD` | ✓ |
+|   _**Hex**_ | `\x20`, `\x{701230}` | ✓ |
+|   _**Meta**_ | `\M-c`, `\M-\C-C` | ✓ |
+|   _**Octal**_ | `\0`, `\01`, `\012` | ✓ |
+|   _**Unicode**_ | `\uHHHH`, `\u{H+ H+}` | ✓ |
+| **Unicode Properties** | _<sub>([Unicode 7.0.0](http://www.unicode.org/versions/Unicode7.0.0/))</sub>_ | ⋱ |
+|   _**Age**_ | `\p{Age=5.2}`, `\P{age=7.0}` | ✓ |
+|   _**Classes**_ | `\p{Alpha}`, `\P{Space}` | ✓ |
+|   _**Derived**_ | `\p{Math}`, `\P{Lowercase}` | ✓ |
+|   _**General Categories**_ | `\p{Lu}`, `\P{Cs}` | ✓ |
+|   _**Scripts**_ | `\p{Arabic}`, `\P{Hiragana}` | ✓ |
+|   _**Simple**_ | `\p{Dash}`, `\p{Extender}` | ✓ |
+
+
+<br/>
+##### Inapplicable Features
+
+Some modifiers, like `o` and `s`, apply to the **Regexp** object itself and do not
+appear in its source. Other such modifiers include the encoding modifiers `e` and `n`
+[See](http://www.ruby-doc.org/core-2.1.3/Regexp.html#class-Regexp-label-Encoding).
+These are not seen by the scanner.
+
+The following features are not currently enabled for Ruby by its regular
+expressions library (Onigmo). They are not supported by the scanner.
+
+- **Quotes**: `\Q...\E` _<a href="https://github.com/k-takata/Onigmo/blob/master/doc/RE#L452/" title="Links to master branch, may change">[See]</a>_
+- **Capture History**: `(?@...)`, `(?@<name>...)` _<a href="https://github.com/k-takata/Onigmo/blob/master/doc/RE#L499" title="Links to master branch, may change">[See]</a>_
 
 
 See something else missing? Please submit an [issue](https://github.com/ammar/regexp_parser/issues)
 
-_**Note**: Attempting to process expressions with
-
+_**Note**: Attempting to process expressions with unsupported syntax features can raise an error,
+or incorrectly return tokens/objects as literals._
 
 
 ## Testing
@@ -366,38 +390,44 @@ tasks, which only run the tests for one component at a time. These are:
 * test:expression
 * test:syntax
 
-_A special task 'test:full'
-runs all the tests. This requires ragel to be installed._
+_A special task 'test:full' generates the scanner's code from the ragel source files and
+runs all the tests. This task requires ragel to be installed._
 
 
-The tests use ruby's
+The tests use ruby's test/unit, so they can also be run with:
 
 ```
-ruby test/test_all.rb
+ruby -Ilib test/test_all.rb
 ```
 
 This is useful when there is a need to focus on specific test files, for example:
 
 ```
-ruby test/scanner/test_properties.rb
+ruby -Ilib test/scanner/test_properties.rb
+```
+
+It is sometimes helpful during development to focus on a specific test case, for example:
+
+```
+ruby -Ilib test/expression/test_base.rb -n test_expression_to_re
 ```
 
 
 ## Building
-Building the scanner and the gem requires [ragel](http://www.
+Building the scanner and the gem requires [ragel](http://www.colm.net/open-source/ragel/) to be
 installed. The build tasks will automatically invoke the 'ragel:rb' task to generate the
 ruby scanner code.
 
 
-The project uses the standard rubygems package tasks:
+The project uses the standard rubygems package tasks, so:
 
 
-To build, run:
+To build the gem, run:
 ```
 rake build
 ```
 
-To install, run:
+To install the gem from the cloned project, run:
 ```
 rake install
 ```
@@ -408,14 +438,15 @@ Documentation and books used while working on this project.
 
 
 #### Ruby Flavors
-* Oniguruma Regular Expressions [link](http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt)
-*
+* Oniguruma Regular Expressions (Ruby 1.9.x) [link](http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt)
+* Onigmo Regular Expressions (Ruby >= 2.0) [link](https://github.com/k-takata/Onigmo/blob/master/doc/RE)
 
 
 #### Regular Expressions
 * Mastering Regular Expressions, By Jeffrey E.F. Friedl (2nd Edition) [book](http://oreilly.com/catalog/9781565922570/)
 * Regular Expression Flavor Comparison [link](http://www.regular-expressions.info/refflavors.html)
 * Enumerating the strings of regular languages [link](http://www.cs.dartmouth.edu/~doug/nfa.ps.gz)
+* Stack Overflow Regular Expressions FAQ [link](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075)
 
 
 #### Unicode
@@ -425,18 +456,6 @@ Documentation and books used while working on this project.
 * Unicode Regular Expressions [link](http://www.unicode.org/reports/tr18/)
 * Unicode Standard Annex #44 [link](http://www.unicode.org/reports/tr44/)
 
-## Thanks
-This work is based on and inspired by the hard work and ideas of many people,
-directly or indirectly. The following are only a few of those that should be
-thanked.
-
-* Adrian Thurston, for developing [ragel](http://www.complang.org/ragel/).
-* Caleb Clausen, for feedback, which inspired this, valuable insights on structuring the parser,
-  and lots of [cool code](http://github.com/coatl).
-* Jan Goyvaerts, for his [excellent resource](http://www.regular-expressions.info) on regular expressions.
-* Run Paint Run Run, for his work on [Read Ruby](https://github.com/runpaint/read-ruby)
-* Yukihiro Matsumoto, of course! For "The Ruby", of course!
-
 
 ---
 ##### Copyright