regexp_parser 1.7.0 → 2.8.3
- checksums.yaml +4 -4
- data/Gemfile +8 -2
- data/LICENSE +1 -1
- data/Rakefile +6 -70
- data/lib/regexp_parser/error.rb +4 -0
- data/lib/regexp_parser/expression/base.rb +76 -0
- data/lib/regexp_parser/expression/classes/alternation.rb +1 -1
- data/lib/regexp_parser/expression/classes/anchor.rb +0 -2
- data/lib/regexp_parser/expression/classes/{backref.rb → backreference.rb} +22 -2
- data/lib/regexp_parser/expression/classes/{set → character_set}/range.rb +4 -8
- data/lib/regexp_parser/expression/classes/{set.rb → character_set.rb} +3 -4
- data/lib/regexp_parser/expression/classes/{type.rb → character_type.rb} +0 -2
- data/lib/regexp_parser/expression/classes/conditional.rb +11 -5
- data/lib/regexp_parser/expression/classes/{escape.rb → escape_sequence.rb} +15 -7
- data/lib/regexp_parser/expression/classes/free_space.rb +5 -5
- data/lib/regexp_parser/expression/classes/group.rb +28 -15
- data/lib/regexp_parser/expression/classes/keep.rb +2 -0
- data/lib/regexp_parser/expression/classes/literal.rb +1 -5
- data/lib/regexp_parser/expression/classes/posix_class.rb +5 -1
- data/lib/regexp_parser/expression/classes/root.rb +4 -19
- data/lib/regexp_parser/expression/classes/{property.rb → unicode_property.rb} +5 -3
- data/lib/regexp_parser/expression/methods/construct.rb +41 -0
- data/lib/regexp_parser/expression/methods/human_name.rb +43 -0
- data/lib/regexp_parser/expression/methods/match_length.rb +11 -7
- data/lib/regexp_parser/expression/methods/parts.rb +23 -0
- data/lib/regexp_parser/expression/methods/printing.rb +26 -0
- data/lib/regexp_parser/expression/methods/strfregexp.rb +1 -1
- data/lib/regexp_parser/expression/methods/tests.rb +47 -1
- data/lib/regexp_parser/expression/methods/traverse.rb +34 -18
- data/lib/regexp_parser/expression/quantifier.rb +57 -17
- data/lib/regexp_parser/expression/sequence.rb +11 -47
- data/lib/regexp_parser/expression/sequence_operation.rb +4 -9
- data/lib/regexp_parser/expression/shared.rb +111 -0
- data/lib/regexp_parser/expression/subexpression.rb +27 -19
- data/lib/regexp_parser/expression.rb +14 -141
- data/lib/regexp_parser/lexer.rb +83 -41
- data/lib/regexp_parser/parser.rb +371 -429
- data/lib/regexp_parser/scanner/char_type.rl +11 -11
- data/lib/regexp_parser/scanner/errors/premature_end_error.rb +8 -0
- data/lib/regexp_parser/scanner/errors/scanner_error.rb +6 -0
- data/lib/regexp_parser/scanner/errors/validation_error.rb +63 -0
- data/lib/regexp_parser/scanner/properties/long.csv +633 -0
- data/lib/regexp_parser/scanner/properties/short.csv +248 -0
- data/lib/regexp_parser/scanner/property.rl +4 -4
- data/lib/regexp_parser/scanner/scanner.rl +303 -368
- data/lib/regexp_parser/scanner.rb +1423 -1674
- data/lib/regexp_parser/syntax/any.rb +2 -7
- data/lib/regexp_parser/syntax/base.rb +92 -67
- data/lib/regexp_parser/syntax/token/anchor.rb +15 -0
- data/lib/regexp_parser/syntax/{tokens → token}/assertion.rb +2 -2
- data/lib/regexp_parser/syntax/token/backreference.rb +33 -0
- data/lib/regexp_parser/syntax/token/character_set.rb +16 -0
- data/lib/regexp_parser/syntax/{tokens → token}/character_type.rb +3 -3
- data/lib/regexp_parser/syntax/{tokens → token}/conditional.rb +3 -3
- data/lib/regexp_parser/syntax/token/escape.rb +33 -0
- data/lib/regexp_parser/syntax/{tokens → token}/group.rb +7 -7
- data/lib/regexp_parser/syntax/{tokens → token}/keep.rb +1 -1
- data/lib/regexp_parser/syntax/token/meta.rb +20 -0
- data/lib/regexp_parser/syntax/{tokens → token}/posix_class.rb +3 -3
- data/lib/regexp_parser/syntax/token/quantifier.rb +35 -0
- data/lib/regexp_parser/syntax/token/unicode_property.rb +733 -0
- data/lib/regexp_parser/syntax/token/virtual.rb +11 -0
- data/lib/regexp_parser/syntax/token.rb +45 -0
- data/lib/regexp_parser/syntax/version_lookup.rb +19 -36
- data/lib/regexp_parser/syntax/versions/1.8.6.rb +13 -20
- data/lib/regexp_parser/syntax/versions/1.9.1.rb +10 -17
- data/lib/regexp_parser/syntax/versions/1.9.3.rb +3 -10
- data/lib/regexp_parser/syntax/versions/2.0.0.rb +8 -15
- data/lib/regexp_parser/syntax/versions/2.2.0.rb +3 -9
- data/lib/regexp_parser/syntax/versions/2.3.0.rb +3 -9
- data/lib/regexp_parser/syntax/versions/2.4.0.rb +3 -9
- data/lib/regexp_parser/syntax/versions/2.4.1.rb +2 -8
- data/lib/regexp_parser/syntax/versions/2.5.0.rb +3 -9
- data/lib/regexp_parser/syntax/versions/2.6.0.rb +3 -9
- data/lib/regexp_parser/syntax/versions/2.6.2.rb +3 -9
- data/lib/regexp_parser/syntax/versions/2.6.3.rb +3 -9
- data/lib/regexp_parser/syntax/versions/3.1.0.rb +4 -0
- data/lib/regexp_parser/syntax/versions/3.2.0.rb +4 -0
- data/lib/regexp_parser/syntax/versions.rb +3 -1
- data/lib/regexp_parser/syntax.rb +8 -6
- data/lib/regexp_parser/token.rb +9 -20
- data/lib/regexp_parser/version.rb +1 -1
- data/lib/regexp_parser.rb +0 -2
- data/regexp_parser.gemspec +19 -23
- metadata +52 -171
- data/CHANGELOG.md +0 -349
- data/README.md +0 -470
- data/lib/regexp_parser/scanner/properties/long.yml +0 -594
- data/lib/regexp_parser/scanner/properties/short.yml +0 -237
- data/lib/regexp_parser/syntax/tokens/anchor.rb +0 -15
- data/lib/regexp_parser/syntax/tokens/backref.rb +0 -24
- data/lib/regexp_parser/syntax/tokens/character_set.rb +0 -13
- data/lib/regexp_parser/syntax/tokens/escape.rb +0 -30
- data/lib/regexp_parser/syntax/tokens/meta.rb +0 -13
- data/lib/regexp_parser/syntax/tokens/quantifier.rb +0 -35
- data/lib/regexp_parser/syntax/tokens/unicode_property.rb +0 -675
- data/lib/regexp_parser/syntax/tokens.rb +0 -45
- data/spec/expression/base_spec.rb +0 -94
- data/spec/expression/clone_spec.rb +0 -120
- data/spec/expression/conditional_spec.rb +0 -89
- data/spec/expression/free_space_spec.rb +0 -27
- data/spec/expression/methods/match_length_spec.rb +0 -161
- data/spec/expression/methods/match_spec.rb +0 -25
- data/spec/expression/methods/strfregexp_spec.rb +0 -224
- data/spec/expression/methods/tests_spec.rb +0 -99
- data/spec/expression/methods/traverse_spec.rb +0 -161
- data/spec/expression/options_spec.rb +0 -128
- data/spec/expression/root_spec.rb +0 -9
- data/spec/expression/sequence_spec.rb +0 -9
- data/spec/expression/subexpression_spec.rb +0 -50
- data/spec/expression/to_h_spec.rb +0 -26
- data/spec/expression/to_s_spec.rb +0 -100
- data/spec/lexer/all_spec.rb +0 -22
- data/spec/lexer/conditionals_spec.rb +0 -53
- data/spec/lexer/escapes_spec.rb +0 -14
- data/spec/lexer/keep_spec.rb +0 -10
- data/spec/lexer/literals_spec.rb +0 -89
- data/spec/lexer/nesting_spec.rb +0 -99
- data/spec/lexer/refcalls_spec.rb +0 -55
- data/spec/parser/all_spec.rb +0 -43
- data/spec/parser/alternation_spec.rb +0 -88
- data/spec/parser/anchors_spec.rb +0 -17
- data/spec/parser/conditionals_spec.rb +0 -179
- data/spec/parser/errors_spec.rb +0 -30
- data/spec/parser/escapes_spec.rb +0 -121
- data/spec/parser/free_space_spec.rb +0 -130
- data/spec/parser/groups_spec.rb +0 -108
- data/spec/parser/keep_spec.rb +0 -6
- data/spec/parser/posix_classes_spec.rb +0 -8
- data/spec/parser/properties_spec.rb +0 -115
- data/spec/parser/quantifiers_spec.rb +0 -51
- data/spec/parser/refcalls_spec.rb +0 -112
- data/spec/parser/set/intersections_spec.rb +0 -127
- data/spec/parser/set/ranges_spec.rb +0 -111
- data/spec/parser/sets_spec.rb +0 -178
- data/spec/parser/types_spec.rb +0 -18
- data/spec/scanner/all_spec.rb +0 -18
- data/spec/scanner/anchors_spec.rb +0 -21
- data/spec/scanner/conditionals_spec.rb +0 -128
- data/spec/scanner/errors_spec.rb +0 -68
- data/spec/scanner/escapes_spec.rb +0 -53
- data/spec/scanner/free_space_spec.rb +0 -133
- data/spec/scanner/groups_spec.rb +0 -52
- data/spec/scanner/keep_spec.rb +0 -10
- data/spec/scanner/literals_spec.rb +0 -49
- data/spec/scanner/meta_spec.rb +0 -18
- data/spec/scanner/properties_spec.rb +0 -64
- data/spec/scanner/quantifiers_spec.rb +0 -20
- data/spec/scanner/refcalls_spec.rb +0 -36
- data/spec/scanner/sets_spec.rb +0 -102
- data/spec/scanner/types_spec.rb +0 -14
- data/spec/spec_helper.rb +0 -15
- data/spec/support/runner.rb +0 -42
- data/spec/support/shared_examples.rb +0 -77
- data/spec/support/warning_extractor.rb +0 -60
- data/spec/syntax/syntax_spec.rb +0 -48
- data/spec/syntax/syntax_token_map_spec.rb +0 -23
- data/spec/syntax/versions/1.8.6_spec.rb +0 -17
- data/spec/syntax/versions/1.9.1_spec.rb +0 -10
- data/spec/syntax/versions/1.9.3_spec.rb +0 -9
- data/spec/syntax/versions/2.0.0_spec.rb +0 -13
- data/spec/syntax/versions/2.2.0_spec.rb +0 -9
- data/spec/syntax/versions/aliases_spec.rb +0 -37
- data/spec/token/token_spec.rb +0 -85
- data/lib/regexp_parser/expression/classes/{set → character_set}/intersection.rb +0 -0
data/README.md
DELETED
@@ -1,470 +0,0 @@
|
|
1
|
-
# Regexp::Parser
|
2
|
-
|
3
|
-
[![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://secure.travis-ci.org/ammar/regexp_parser.svg?branch=master)](http://travis-ci.org/ammar/regexp_parser) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
|
4
|
-
|
5
|
-
A Ruby gem for tokenizing, parsing, and transforming regular expressions.
|
6
|
-
|
7
|
-
* Multilayered
|
8
|
-
* A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
|
9
|
-
* A lexer that produces a "stream" of token objects.
|
10
|
-
* A parser that produces a "tree" of Expression objects (OO API)
|
11
|
-
* Runs on Ruby 1.9, 2.x, and JRuby (1.9 mode) runtimes.
|
12
|
-
* Recognizes Ruby 1.8, 1.9, and 2.x regular expressions. [See Supported Syntax](#supported-syntax)
|
13
|
-
|
14
|
-
|
15
|
-
_For examples of regexp_parser in use, see [Example Projects](#example-projects)._
|
16
|
-
|
17
|
-
|
18
|
-
---
|
19
|
-
## Requirements
|
20
|
-
|
21
|
-
* Ruby >= 1.9
|
22
|
-
* Ragel >= 6.0, but only if you want to build the gem or work on the scanner.
|
23
|
-
|
24
|
-
|
25
|
-
_Note: See the .travis.yml file for covered versions._
|
26
|
-
|
27
|
-
|
28
|
-
---
|
29
|
-
## Install
|
30
|
-
|
31
|
-
Install the gem with:
|
32
|
-
|
33
|
-
`gem install regexp_parser`
|
34
|
-
|
35
|
-
Or, add it to your project's `Gemfile`:
|
36
|
-
|
37
|
-
```gem 'regexp_parser', '~> X.Y.Z'```
|
38
|
-
|
39
|
-
See RubyGems for the [latest version number](https://rubygems.org/gems/regexp_parser).
|
40
|
-
|
41
|
-
|
42
|
-
---
|
43
|
-
## Usage
|
44
|
-
|
45
|
-
The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
|
46
|
-
provides a single method that takes a regular expression (as a Regexp object or
|
47
|
-
a string) and returns its results. The **Lexer** and the **Parser** accept an
|
48
|
-
optional second argument that specifies the syntax version, like 'ruby/2.0',
|
49
|
-
which defaults to the host Ruby version (using RUBY_VERSION).
|
50
|
-
|
51
|
-
Here are the basic usage examples:
|
52
|
-
|
53
|
-
```ruby
|
54
|
-
require 'regexp_parser'
|
55
|
-
|
56
|
-
Regexp::Scanner.scan(regexp)
|
57
|
-
|
58
|
-
Regexp::Lexer.lex(regexp)
|
59
|
-
|
60
|
-
Regexp::Parser.parse(regexp)
|
61
|
-
```
|
62
|
-
|
63
|
-
All three methods accept a block as the last argument, which, if given, gets
|
64
|
-
called with the results as follows:
|
65
|
-
|
66
|
-
* **Scanner**: the block gets passed the results as they are scanned. See the
|
67
|
-
example in the next section for details.
|
68
|
-
|
69
|
-
* **Lexer**: after completion, the block gets passed the tokens one by one.
|
70
|
-
_The result of the block is returned._
|
71
|
-
|
72
|
-
* **Parser**: after completion, the block gets passed the root expression.
|
73
|
-
_The result of the block is returned._
|
74
|
-
|
75
|
-
|
76
|
-
---
|
77
|
-
## Components
|
78
|
-
|
79
|
-
### Scanner
|
80
|
-
A Ragel-generated scanner that recognizes the cumulative syntax of all
|
81
|
-
supported syntax versions. It breaks a given expression's text into the
|
82
|
-
smallest parts, and identifies their type, token, text, and start/end
|
83
|
-
offsets within the pattern.
|
84
|
-
|
85
|
-
|
86
|
-
#### Example
|
87
|
-
The following scans the given pattern and prints out the type, token, text and
|
88
|
-
start/end offsets for each token found.
|
89
|
-
|
90
|
-
```ruby
|
91
|
-
require 'regexp_parser'
|
92
|
-
|
93
|
-
Regexp::Scanner.scan /(ab?(cd)*[e-h]+)/ do |type, token, text, ts, te|
|
94
|
-
  puts "type: #{type}, token: #{token}, text: '#{text}' [#{ts}..#{te}]"
|
95
|
-
end
|
96
|
-
|
97
|
-
# output
|
98
|
-
# type: group, token: capture, text: '(' [0..1]
|
99
|
-
# type: literal, token: literal, text: 'ab' [1..3]
|
100
|
-
# type: quantifier, token: zero_or_one, text: '?' [3..4]
|
101
|
-
# type: group, token: capture, text: '(' [4..5]
|
102
|
-
# type: literal, token: literal, text: 'cd' [5..7]
|
103
|
-
# type: group, token: close, text: ')' [7..8]
|
104
|
-
# type: quantifier, token: zero_or_more, text: '*' [8..9]
|
105
|
-
# type: set, token: open, text: '[' [9..10]
|
106
|
-
# type: set, token: range, text: 'e-h' [10..13]
|
107
|
-
# type: set, token: close, text: ']' [13..14]
|
108
|
-
# type: quantifier, token: one_or_more, text: '+' [14..15]
|
109
|
-
# type: group, token: close, text: ')' [15..16]
|
110
|
-
```
|
111
|
-
|
112
|
-
A one-liner that uses map on the result of the scan to return the textual
|
113
|
-
parts of the pattern:
|
114
|
-
|
115
|
-
```ruby
|
116
|
-
Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]}
|
117
|
-
#=> ["(", "cat", "?", "(", "[", "b", "h", "m", "]", "at", ")", ")", "{3,5}"]
|
118
|
-
```
|
119
|
-
|
120
|
-
|
121
|
-
#### Notes
|
122
|
-
* The scanner performs basic syntax error checking, like detecting missing
|
123
|
-
balancing punctuation and premature end of pattern. Flavor validity checks
|
124
|
-
are performed in the lexer, which uses a syntax object.
|
125
|
-
|
126
|
-
* If the input is a Ruby **Regexp** object, the scanner calls #source on it to
|
127
|
-
get its string representation. #source does not include the options of
|
128
|
-
the expression (m, i, and x). To include the options in the scan, #to_s
|
129
|
-
should be called on the **Regexp** before passing it to the scanner or the
|
130
|
-
lexer. For the parser, however, this is not necessary. It automatically
|
131
|
-
exposes the options of a passed **Regexp** in the returned root expression.
|
132
|
-
|
133
|
-
* To keep the scanner simple(r) and fairly reusable for other purposes, it
|
134
|
-
does not perform lexical analysis on the tokens, sticking to the task
|
135
|
-
of identifying the smallest possible tokens and leaving lexical analysis
|
136
|
-
to the lexer.
|
137
|
-
|
138
|
-
* The MRI implementation may accept expressions that either conflict with
|
139
|
-
the documentation or are undocumented. The scanner does not support such
|
140
|
-
implementation quirks.
|
141
|
-
_(See issues [#3](https://github.com/ammar/regexp_parser/issues/3) and
|
142
|
-
[#15](https://github.com/ammar/regexp_parser/issues/15) for examples)_
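
The difference between `#source` and `#to_s` mentioned in the notes above can be checked with plain Ruby, without the gem. This is a small illustration only; the variable names are arbitrary.

```ruby
# Plain Ruby, no gem required: compare what #source and #to_s expose.
regexp = /ab+c/mi

source_text = regexp.source # pattern text only, the options are dropped
full_text   = regexp.to_s   # pattern wrapped in an options group

puts source_text # prints: ab+c
puts full_text   # prints: (?mi-x:ab+c)
```

Passing `full_text` rather than `regexp` to the scanner or the lexer is what makes the `m` and `i` options visible to them as an options group.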
|
143
|
-
|
144
|
-
|
145
|
-
---
|
146
|
-
### Syntax
|
147
|
-
Defines the supported tokens for a specific engine implementation (aka a
|
148
|
-
flavor). Syntax classes act as lookup tables, and are layered to create
|
149
|
-
flavor variations. Syntax only comes into play in the lexer.
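
As a rough illustration of what "layered lookup tables" can mean, here is a minimal sketch in plain Ruby. It is not the gem's implementation: the `SketchSyntax` classes and the `implements` helper are hypothetical names, and the token lists are limited to the few tokens used in the example that follows.

```ruby
# Hypothetical sketch: every "version" class starts with a copy of its
# parent's table and then adds the tokens that version introduced.
class SketchSyntax
  # Per-class lookup table: token type => list of tokens.
  def self.features
    @features ||= {}
  end

  # Add tokens to this class's table.
  def self.implements(type, tokens)
    features[type] = features.fetch(type, []) | tokens
  end

  # Copy the parent's table into each new subclass (the layering step).
  def self.inherited(subclass)
    super
    features.each { |type, tokens| subclass.features[type] = tokens.dup }
  end

  def self.implements?(type, token)
    features.fetch(type, []).include?(token)
  end
end

class SketchRuby18 < SketchSyntax
  implements :quantifier, %i[zero_or_one zero_or_one_reluctant]
end

class SketchRuby19 < SketchRuby18
  implements :quantifier, %i[zero_or_one_possessive]
end

class SketchRuby20 < SketchRuby19
  implements :conditional, %i[condition]
end

puts SketchRuby18.implements?(:quantifier, :zero_or_one_possessive) # false
puts SketchRuby19.implements?(:quantifier, :zero_or_one_possessive) # true
puts SketchRuby19.implements?(:conditional, :condition)             # false
puts SketchRuby20.implements?(:conditional, :condition)             # true
```

Copying the table at subclass creation keeps each layer independent, so a later version can add tokens without changing what an earlier version reports.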
|
150
|
-
|
151
|
-
#### Example
|
152
|
-
The following instantiates syntax objects for Ruby 2.0, 1.9, and 1.8, and
|
153
|
-
checks a few of their implementation features.
|
154
|
-
|
155
|
-
```ruby
|
156
|
-
require 'regexp_parser'
|
157
|
-
|
158
|
-
ruby_20 = Regexp::Syntax.new 'ruby/2.0'
|
159
|
-
ruby_20.implements? :quantifier, :zero_or_one # => true
|
160
|
-
ruby_20.implements? :quantifier, :zero_or_one_reluctant # => true
|
161
|
-
ruby_20.implements? :quantifier, :zero_or_one_possessive # => true
|
162
|
-
ruby_20.implements? :conditional, :condition # => true
|
163
|
-
|
164
|
-
ruby_19 = Regexp::Syntax.new 'ruby/1.9'
|
165
|
-
ruby_19.implements? :quantifier, :zero_or_one # => true
|
166
|
-
ruby_19.implements? :quantifier, :zero_or_one_reluctant # => true
|
167
|
-
ruby_19.implements? :quantifier, :zero_or_one_possessive # => true
|
168
|
-
ruby_19.implements? :conditional, :condition # => false
|
169
|
-
|
170
|
-
ruby_18 = Regexp::Syntax.new 'ruby/1.8'
|
171
|
-
ruby_18.implements? :quantifier, :zero_or_one # => true
|
172
|
-
ruby_18.implements? :quantifier, :zero_or_one_reluctant # => true
|
173
|
-
ruby_18.implements? :quantifier, :zero_or_one_possessive # => false
|
174
|
-
ruby_18.implements? :conditional, :condition # => false
|
175
|
-
```
|
176
|
-
|
177
|
-
|
178
|
-
#### Notes
|
179
|
-
* Variations on a token, for example a named group with angle brackets (< and >)
|
180
|
-
vs one with a pair of single quotes, are specified with an underscore followed
|
181
|
-
by two characters appended to the base token. In the previous named group example,
|
182
|
-
the tokens would be :named_ab (angle brackets) and :named_sq (single quotes).
|
183
|
-
These variations are normalized by the syntax to :named.
|
184
|
-
|
185
|
-
|
186
|
-
---
|
187
|
-
### Lexer
|
188
|
-
Sits on top of the scanner and performs lexical analysis on the tokens that
|
189
|
-
it emits. Among its tasks are: breaking quantified literal runs, collecting the
|
190
|
-
emitted token attributes into Token objects, calculating their nesting depth,
|
191
|
-
normalizing tokens for the parser, and checking if the tokens are implemented by
|
192
|
-
the given syntax version.
|
193
|
-
|
194
|
-
See the [Token Objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
|
195
|
-
wiki page for more information on Token objects.
|
196
|
-
|
197
|
-
|
198
|
-
#### Example
|
199
|
-
The following example lexes the given pattern, checks it against the Ruby 1.9
|
200
|
-
syntax, and prints the token objects' text indented to their level.
|
201
|
-
|
202
|
-
```ruby
|
203
|
-
require 'regexp_parser'
|
204
|
-
|
205
|
-
Regexp::Lexer.lex /a?(b(c))*[d]+/, 'ruby/1.9' do |token|
|
206
|
-
  puts "#{' ' * token.level}#{token.text}"
|
207
|
-
end
|
208
|
-
|
209
|
-
# output
|
210
|
-
# a
|
211
|
-
# ?
|
212
|
-
# (
|
213
|
-
#  b
|
214
|
-
#  (
|
215
|
-
#   c
|
216
|
-
#  )
|
217
|
-
# )
|
218
|
-
# *
|
219
|
-
# [
|
220
|
-
#  d
|
221
|
-
# ]
|
222
|
-
# +
|
223
|
-
```
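
The nesting level used for the indentation above can be sketched in plain Ruby. This is an assumption about the general approach, not the lexer's code: `with_levels`, `OPENERS`, and `CLOSERS` are hypothetical names, and the sketch works on bare token texts instead of Token objects.

```ruby
# Hypothetical sketch: derive a nesting level for each token text.
OPENERS = ['(', '['].freeze
CLOSERS = [')', ']'].freeze

def with_levels(texts)
  level = 0
  texts.map do |text|
    level -= 1 if CLOSERS.include?(text) # a closer sits at its opener's level
    pair = [text, level]
    level += 1 if OPENERS.include?(text) # contents sit one level deeper
    pair
  end
end

texts = ['a', '?', '(', 'b', '(', 'c', ')', ')', '*', '[', 'd', ']', '+']
with_levels(texts).each { |text, level| puts "#{' ' * level}#{text}" }
```

Each opening token is reported at the current level before the counter is raised, so an opener and its matching closer end up at the same depth.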
|
224
|
-
|
225
|
-
A one-liner that returns an array of the textual parts of the given pattern.
|
226
|
-
Compare the output with that of the one-liner example of the **Scanner**; notably
|
227
|
-
how the sequence 'cat' is treated. The 't' is separated because it's followed
|
228
|
-
by a quantifier that only applies to it.
|
229
|
-
|
230
|
-
```ruby
|
231
|
-
Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text}
|
232
|
-
#=> ["(", "ca", "t", "?", "(", "[", "b", "]", "at", ")", ")", "{3,5}"]
|
233
|
-
```
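
The splitting of 'cat' into 'ca' and 't' can be pictured with a small plain-Ruby sketch. The helper name `break_literal` is hypothetical and the real lexer may handle this differently; the sketch only shows the idea that a quantifier binds to the last character of a literal run.

```ruby
# Hypothetical helper: split a literal run so that a following quantifier
# applies only to its last character.
def break_literal(text, quantified)
  return [text] unless quantified && text.length > 1

  [text[0..-2], text[-1]]
end

p break_literal('cat', true) # => ["ca", "t"]
p break_literal('at', false) # => ["at"]
```

A run that is not followed by a quantifier, such as 'at' in the pattern above, is left as a single token.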
|
234
|
-
|
235
|
-
#### Notes
|
236
|
-
* The syntax argument is optional. It defaults to the version of the Ruby
|
237
|
-
interpreter in use, as returned by RUBY_VERSION.
|
238
|
-
|
239
|
-
* The lexer normalizes some tokens, as noted in the Syntax section above.
|
240
|
-
|
241
|
-
|
242
|
-
---
|
243
|
-
### Parser
|
244
|
-
Sits on top of the lexer and transforms the "stream" of Token objects emitted
|
245
|
-
by it into a tree of Expression objects represented by an instance of the
|
246
|
-
Expression::Root class.
|
247
|
-
|
248
|
-
See the [Expression Objects](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
|
249
|
-
wiki page for attributes and methods.
|
250
|
-
|
251
|
-
|
252
|
-
#### Example
|
253
|
-
|
254
|
-
```ruby
|
255
|
-
require 'regexp_parser'
|
256
|
-
|
257
|
-
regex = /a?(b+(c)d)*(?<name>[0-9]+)/
|
258
|
-
|
259
|
-
tree = Regexp::Parser.parse( regex, 'ruby/2.1' )
|
260
|
-
|
261
|
-
tree.traverse do |event, exp|
|
262
|
-
  puts "#{event}: #{exp.type} `#{exp.to_s}`"
|
263
|
-
end
|
264
|
-
|
265
|
-
# Output
|
266
|
-
# visit: literal `a?`
|
267
|
-
# enter: group `(b+(c)d)*`
|
268
|
-
# visit: literal `b+`
|
269
|
-
# enter: group `(c)`
|
270
|
-
# visit: literal `c`
|
271
|
-
# exit: group `(c)`
|
272
|
-
# visit: literal `d`
|
273
|
-
# exit: group `(b+(c)d)*`
|
274
|
-
# enter: group `(?<name>[0-9]+)`
|
275
|
-
# visit: set `[0-9]+`
|
276
|
-
# exit: group `(?<name>[0-9]+)`
|
277
|
-
```
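
The enter/visit/exit events in the output above follow a common depth-first pattern, sketched here in plain Ruby. `SketchNode` and `walk` are hypothetical names and this is not the gem's `traverse` implementation; the tree is built by hand to mirror the first part of the pattern.

```ruby
# Hypothetical sketch of an event-based tree walk.
SketchNode = Struct.new(:label, :children) do
  def terminal?
    children.empty?
  end
end

# Yields :visit for leaves, and :enter / :exit around nodes with children.
def walk(node, &block)
  node.children.each do |child|
    if child.terminal?
      block.call(:visit, child)
    else
      block.call(:enter, child)
      walk(child, &block)
      block.call(:exit, child)
    end
  end
end

tree = SketchNode.new('root', [
  SketchNode.new('a?', []),
  SketchNode.new('(b+(c)d)*', [
    SketchNode.new('b+', []),
    SketchNode.new('(c)', [SketchNode.new('c', [])]),
    SketchNode.new('d', [])
  ])
])

events = []
walk(tree) { |event, node| events << "#{event}: #{node.label}" }
puts events
```

As in the README output, the root itself produces no event; only its descendants do.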
|
278
|
-
|
279
|
-
Another example, using each_expression and strfregexp to print the object tree.
|
280
|
-
_See the traverse.rb and strfregexp.rb files under `lib/regexp_parser/expression/methods`
|
281
|
-
for more information on these methods._
|
282
|
-
|
283
|
-
```ruby
|
284
|
-
include_root = true
|
285
|
-
indent_offset = include_root ? 1 : 0
|
286
|
-
|
287
|
-
tree.each_expression(include_root) do |exp, level_index|
|
288
|
-
  puts exp.strfregexp("%>> %c", indent_offset)
|
289
|
-
end
|
290
|
-
|
291
|
-
# Output
|
292
|
-
# > Regexp::Expression::Root
|
293
|
-
# > Regexp::Expression::Literal
|
294
|
-
# > Regexp::Expression::Group::Capture
|
295
|
-
# > Regexp::Expression::Literal
|
296
|
-
# > Regexp::Expression::Group::Capture
|
297
|
-
# > Regexp::Expression::Literal
|
298
|
-
# > Regexp::Expression::Literal
|
299
|
-
# > Regexp::Expression::Group::Named
|
300
|
-
# > Regexp::Expression::CharacterSet
|
301
|
-
```
|
302
|
-
|
303
|
-
_Note: quantifiers do not appear in the output because they are members of the
|
304
|
-
Expression class. See the next section for details._
|
305
|
-
|
306
|
-
|
307
|
-
---
|
308
|
-
|
309
|
-
|
310
|
-
## Supported Syntax
|
311
|
-
The three modules support all the regular expression syntax features of Ruby 1.8,
|
312
|
-
1.9, and 2.x:
|
313
|
-
|
314
|
-
_Note that not all of these are available in all versions of Ruby_
|
315
|
-
|
316
|
-
|
317
|
-
| Syntax Feature | Examples | ⋯ |
|
318
|
-
| ------------------------------------- | ------------------------------------------------------- |:--------:|
|
319
|
-
| **Alternation** | `a\|b\|c` | ✓ |
|
320
|
-
| **Anchors** | `\A`, `^`, `\b` | ✓ |
|
321
|
-
| **Character Classes** | `[abc]`, `[^\\]`, `[a-d&&aeiou]`, `[a=e=b]` | ✓ |
|
322
|
-
| **Character Types** | `\d`, `\H`, `\s` | ✓ |
|
323
|
-
| **Cluster Types** | `\R`, `\X` | ✓ |
|
324
|
-
| **Conditional Exps.** | `(?(cond)yes-subexp)`, `(?(cond)yes-subexp\|no-subexp)` | ✓ |
|
325
|
-
| **Escape Sequences** | `\t`, `\\+`, `\?` | ✓ |
|
326
|
-
| **Free Space** | whitespace and `# Comments` _(x modifier)_ | ✓ |
|
327
|
-
| **Grouped Exps.** | | ⋱ |
|
328
|
-
|   _**Assertions**_ | | ⋱ |
|
329
|
-
|   _Lookahead_ | `(?=abc)` | ✓ |
|
330
|
-
|   _Negative Lookahead_ | `(?!abc)` | ✓ |
|
331
|
-
|   _Lookbehind_ | `(?<=abc)` | ✓ |
|
332
|
-
|   _Negative Lookbehind_ | `(?<!abc)` | ✓ |
|
333
|
-
|   _**Atomic**_ | `(?>abc)` | ✓ |
|
334
|
-
|   _**Absence**_ | `(?~abc)` | ✓ |
|
335
|
-
|   _**Back-references**_ | | ⋱ |
|
336
|
-
|   _Named_ | `\k<name>` | ✓ |
|
337
|
-
|   _Nest Level_ | `\k<n-1>` | ✓ |
|
338
|
-
|   _Numbered_ | `\k<1>` | ✓ |
|
339
|
-
|   _Relative_ | `\k<-2>` | ✓ |
|
340
|
-
|   _Traditional_ | `\1` thru `\9` | ✓ |
|
341
|
-
|   _**Capturing**_ | `(abc)` | ✓ |
|
342
|
-
|   _**Comments**_ | `(?# comment text)` | ✓ |
|
343
|
-
|   _**Named**_ | `(?<name>abc)`, `(?'name'abc)` | ✓ |
|
344
|
-
|   _**Options**_ | `(?mi-x:abc)`, `(?a:\s\w+)`, `(?i)` | ✓ |
|
345
|
-
|   _**Passive**_ | `(?:abc)` | ✓ |
|
346
|
-
|   _**Subexp. Calls**_ | `\g<name>`, `\g<1>` | ✓ |
|
347
|
-
| **Keep** | `\K`, `(ab\Kc\|d\Ke)f` | ✓ |
|
348
|
-
| **Literals** _(utf-8)_ | `Ruby`, `ルビー`, `روبي` | ✓ |
|
349
|
-
| **POSIX Classes** | `[:alpha:]`, `[:^digit:]` | ✓ |
|
350
|
-
| **Quantifiers** | | ⋱ |
|
351
|
-
|   _**Greedy**_ | `?`, `*`, `+`, `{m,M}` | ✓ |
|
352
|
-
|   _**Reluctant** (Lazy)_ | `??`, `*?`, `+?`, `{m,M}?` | ✓ |
|
353
|
-
|   _**Possessive**_ | `?+`, `*+`, `++`, `{m,M}+` | ✓ |
|
354
|
-
| **String Escapes** | | ⋱ |
|
355
|
-
|   _**Control**_ | `\C-C`, `\cD` | ✓ |
|
356
|
-
|   _**Hex**_ | `\x20`, `\x{701230}` | ✓ |
|
357
|
-
|   _**Meta**_ | `\M-c`, `\M-\C-C`, `\M-\cC`, `\C-\M-C`, `\c\M-C` | ✓ |
|
358
|
-
|   _**Octal**_ | `\0`, `\01`, `\012` | ✓ |
|
359
|
-
|   _**Unicode**_ | `\uHHHH`, `\u{H+ H+}` | ✓ |
|
360
|
-
| **Unicode Properties** | _<sub>([Unicode 11.0.0](http://www.unicode.org/versions/Unicode11.0.0/))</sub>_ | ⋱ |
|
361
|
-
|   _**Age**_ | `\p{Age=5.2}`, `\P{age=7.0}`, `\p{^age=8.0}` | ✓ |
|
362
|
-
|   _**Blocks**_ | `\p{InArmenian}`, `\P{InKhmer}`, `\p{^InThai}` | ✓ |
|
363
|
-
|   _**Classes**_ | `\p{Alpha}`, `\P{Space}`, `\p{^Alnum}` | ✓ |
|
364
|
-
|   _**Derived**_ | `\p{Math}`, `\P{Lowercase}`, `\p{^Cased}` | ✓ |
|
365
|
-
|   _**General Categories**_ | `\p{Lu}`, `\P{Cs}`, `\p{^sc}` | ✓ |
|
366
|
-
|   _**Scripts**_ | `\p{Arabic}`, `\P{Hiragana}`, `\p{^Greek}` | ✓ |
|
367
|
-
|   _**Simple**_ | `\p{Dash}`, `\p{Extender}`, `\p{^Hyphen}` | ✓ |
|
368
|
-
|
369
|
-
##### Inapplicable Features
|
370
|
-
|
371
|
-
Some modifiers, like `o` and `s`, apply to the **Regexp** object itself and do not
|
372
|
-
appear in its source. Other such modifiers include the encoding modifiers `e` and `n`
|
373
|
-
(see the [Regexp encoding documentation](http://www.ruby-doc.org/core-2.5.0/Regexp.html#class-Regexp-label-Encoding)).
|
374
|
-
These are not seen by the scanner.
|
375
|
-
|
376
|
-
The following features are not currently enabled for Ruby by its regular
|
377
|
-
expressions library (Onigmo). They are not supported by the scanner.
|
378
|
-
|
379
|
-
- **Quotes**: `\Q...\E` _[[See]](https://github.com/k-takata/Onigmo/blob/7911409/doc/RE#L499)_
|
380
|
-
- **Capture History**: `(?@...)`, `(?@<name>...)` _[[See]](https://github.com/k-takata/Onigmo/blob/7911409/doc/RE#L550)_
|
381
|
-
|
382
|
-
|
383
|
-
See something missing? Please submit an [issue](https://github.com/ammar/regexp_parser/issues).
|
384
|
-
|
385
|
-
_**Note**: Attempting to process expressions with unsupported syntax features can raise an error,
|
386
|
-
or incorrectly return tokens/objects as literals._
|
387
|
-
|
388
|
-
|
389
|
-
## Testing
|
390
|
-
To run the tests, run `rake` from the root directory; `test` is the default task.
|
391
|
-
|
392
|
-
It generates the scanner's code from the Ragel source files and runs all the tests, so it requires Ragel to be installed.
|
393
|
-
|
394
|
-
The tests use RSpec. They can also be run with the test runner that whitelists some warnings:
|
395
|
-
|
396
|
-
```
|
397
|
-
bin/test
|
398
|
-
```
|
399
|
-
|
400
|
-
You can run a specific test like so:
|
401
|
-
|
402
|
-
```
|
403
|
-
bin/test spec/scanner/properties_spec.rb
|
404
|
-
```
|
405
|
-
|
406
|
-
Note that changes to Ragel files will not be reflected when running `rspec` or `bin/test`, so you might want to run:
|
407
|
-
|
408
|
-
```
|
409
|
-
rake ragel:rb && bin/test spec/scanner/properties_spec.rb
|
410
|
-
```
|
411
|
-
|
412
|
-
## Building
|
413
|
-
Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/) to be
|
414
|
-
installed. The build tasks will automatically invoke the 'ragel:rb' task to generate the
|
415
|
-
Ruby scanner code.
|
416
|
-
|
417
|
-
|
418
|
-
The project uses the standard RubyGems package tasks.
|
419
|
-
|
420
|
-
|
421
|
-
To build the gem, run:
|
422
|
-
```
|
423
|
-
rake build
|
424
|
-
```
|
425
|
-
|
426
|
-
To install the gem from the cloned project, run:
|
427
|
-
```
|
428
|
-
rake install
|
429
|
-
```
|
430
|
-
|
431
|
-
|
432
|
-
## Example Projects
|
433
|
-
Projects using regexp_parser.
|
434
|
-
|
435
|
-
- [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor with alias support.
|
436
|
-
|
437
|
-
- [mutant](https://github.com/mbj/mutant) (before v0.9.0) manipulates your regular expressions (amongst others) to see if your tests cover their behavior.
|
438
|
-
|
439
|
-
- [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) uses regexp_parser to generate examples of postal codes.
|
440
|
-
|
441
|
-
- [js_regex](https://github.com/janosch-x/js_regex) converts Ruby regular expressions to JavaScript-compatible regular expressions.
|
442
|
-
|
443
|
-
|
444
|
-
## References
|
445
|
-
Documentation and books used while working on this project.
|
446
|
-
|
447
|
-
|
448
|
-
#### Ruby Flavors
|
449
|
-
* Oniguruma Regular Expressions (Ruby 1.9.x) [link](https://github.com/kkos/oniguruma/blob/master/doc/RE)
|
450
|
-
* Onigmo Regular Expressions (Ruby >= 2.0) [link](https://github.com/k-takata/Onigmo/blob/master/doc/RE)
|
451
|
-
|
452
|
-
|
453
|
-
#### Regular Expressions
|
454
|
-
* Mastering Regular Expressions, By Jeffrey E.F. Friedl (2nd Edition) [book](http://oreilly.com/catalog/9781565922570/)
|
455
|
-
* Regular Expression Flavor Comparison [link](http://www.regular-expressions.info/refflavors.html)
|
456
|
-
* Enumerating the strings of regular languages [link](http://www.cs.dartmouth.edu/~doug/nfa.ps.gz)
|
457
|
-
* Stack Overflow Regular Expressions FAQ [link](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075)
|
458
|
-
|
459
|
-
|
460
|
-
#### Unicode
|
461
|
-
* Unicode Explained, By Jukka K. Korpela. [book](http://oreilly.com/catalog/9780596101213)
|
462
|
-
* Unicode Derived Properties [link](http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt)
|
463
|
-
* Unicode Property Aliases [link](http://www.unicode.org/Public/UNIDATA/PropertyAliases.txt)
|
464
|
-
* Unicode Regular Expressions [link](http://www.unicode.org/reports/tr18/)
|
465
|
-
* Unicode Standard Annex #44 [link](http://www.unicode.org/reports/tr44/)
|
466
|
-
|
467
|
-
|
468
|
-
---
|
469
|
-
##### Copyright
|
470
|
-
_Copyright (c) 2010-2019 Ammar Ali. See LICENSE file for details._
|