regexp_parser 1.7.0 → 2.9.0

Files changed (166)
  1. checksums.yaml +4 -4
  2. data/Gemfile +9 -3
  3. data/LICENSE +1 -1
  4. data/Rakefile +6 -70
  5. data/lib/regexp_parser/error.rb +4 -0
  6. data/lib/regexp_parser/expression/base.rb +76 -0
  7. data/lib/regexp_parser/expression/classes/alternation.rb +1 -1
  8. data/lib/regexp_parser/expression/classes/anchor.rb +0 -2
  9. data/lib/regexp_parser/expression/classes/{backref.rb → backreference.rb} +22 -2
  10. data/lib/regexp_parser/expression/classes/{set → character_set}/range.rb +4 -8
  11. data/lib/regexp_parser/expression/classes/{set.rb → character_set.rb} +4 -8
  12. data/lib/regexp_parser/expression/classes/{type.rb → character_type.rb} +0 -2
  13. data/lib/regexp_parser/expression/classes/conditional.rb +11 -5
  14. data/lib/regexp_parser/expression/classes/{escape.rb → escape_sequence.rb} +15 -7
  15. data/lib/regexp_parser/expression/classes/free_space.rb +5 -5
  16. data/lib/regexp_parser/expression/classes/group.rb +28 -15
  17. data/lib/regexp_parser/expression/classes/keep.rb +2 -0
  18. data/lib/regexp_parser/expression/classes/literal.rb +1 -5
  19. data/lib/regexp_parser/expression/classes/posix_class.rb +5 -5
  20. data/lib/regexp_parser/expression/classes/root.rb +4 -19
  21. data/lib/regexp_parser/expression/classes/{property.rb → unicode_property.rb} +11 -12
  22. data/lib/regexp_parser/expression/methods/construct.rb +41 -0
  23. data/lib/regexp_parser/expression/methods/human_name.rb +43 -0
  24. data/lib/regexp_parser/expression/methods/match_length.rb +11 -7
  25. data/lib/regexp_parser/expression/methods/negative.rb +20 -0
  26. data/lib/regexp_parser/expression/methods/parts.rb +23 -0
  27. data/lib/regexp_parser/expression/methods/printing.rb +26 -0
  28. data/lib/regexp_parser/expression/methods/strfregexp.rb +1 -1
  29. data/lib/regexp_parser/expression/methods/tests.rb +47 -1
  30. data/lib/regexp_parser/expression/methods/traverse.rb +34 -18
  31. data/lib/regexp_parser/expression/quantifier.rb +57 -17
  32. data/lib/regexp_parser/expression/sequence.rb +11 -47
  33. data/lib/regexp_parser/expression/sequence_operation.rb +4 -9
  34. data/lib/regexp_parser/expression/shared.rb +111 -0
  35. data/lib/regexp_parser/expression/subexpression.rb +27 -19
  36. data/lib/regexp_parser/expression.rb +15 -141
  37. data/lib/regexp_parser/lexer.rb +83 -41
  38. data/lib/regexp_parser/parser.rb +372 -429
  39. data/lib/regexp_parser/scanner/char_type.rl +11 -11
  40. data/lib/regexp_parser/scanner/errors/premature_end_error.rb +8 -0
  41. data/lib/regexp_parser/scanner/errors/scanner_error.rb +6 -0
  42. data/lib/regexp_parser/scanner/errors/validation_error.rb +63 -0
  43. data/lib/regexp_parser/scanner/properties/long.csv +651 -0
  44. data/lib/regexp_parser/scanner/properties/short.csv +249 -0
  45. data/lib/regexp_parser/scanner/property.rl +4 -4
  46. data/lib/regexp_parser/scanner/scanner.rl +303 -368
  47. data/lib/regexp_parser/scanner.rb +1423 -1674
  48. data/lib/regexp_parser/syntax/any.rb +2 -7
  49. data/lib/regexp_parser/syntax/base.rb +92 -67
  50. data/lib/regexp_parser/syntax/token/anchor.rb +15 -0
  51. data/lib/regexp_parser/syntax/{tokens → token}/assertion.rb +2 -2
  52. data/lib/regexp_parser/syntax/token/backreference.rb +33 -0
  53. data/lib/regexp_parser/syntax/token/character_set.rb +16 -0
  54. data/lib/regexp_parser/syntax/{tokens → token}/character_type.rb +3 -3
  55. data/lib/regexp_parser/syntax/{tokens → token}/conditional.rb +3 -3
  56. data/lib/regexp_parser/syntax/token/escape.rb +33 -0
  57. data/lib/regexp_parser/syntax/{tokens → token}/group.rb +7 -7
  58. data/lib/regexp_parser/syntax/{tokens → token}/keep.rb +1 -1
  59. data/lib/regexp_parser/syntax/token/meta.rb +20 -0
  60. data/lib/regexp_parser/syntax/{tokens → token}/posix_class.rb +3 -3
  61. data/lib/regexp_parser/syntax/token/quantifier.rb +35 -0
  62. data/lib/regexp_parser/syntax/token/unicode_property.rb +751 -0
  63. data/lib/regexp_parser/syntax/token/virtual.rb +11 -0
  64. data/lib/regexp_parser/syntax/token.rb +45 -0
  65. data/lib/regexp_parser/syntax/version_lookup.rb +19 -36
  66. data/lib/regexp_parser/syntax/versions/1.8.6.rb +13 -20
  67. data/lib/regexp_parser/syntax/versions/1.9.1.rb +10 -17
  68. data/lib/regexp_parser/syntax/versions/1.9.3.rb +3 -10
  69. data/lib/regexp_parser/syntax/versions/2.0.0.rb +8 -15
  70. data/lib/regexp_parser/syntax/versions/2.2.0.rb +3 -9
  71. data/lib/regexp_parser/syntax/versions/2.3.0.rb +3 -9
  72. data/lib/regexp_parser/syntax/versions/2.4.0.rb +3 -9
  73. data/lib/regexp_parser/syntax/versions/2.4.1.rb +2 -8
  74. data/lib/regexp_parser/syntax/versions/2.5.0.rb +3 -9
  75. data/lib/regexp_parser/syntax/versions/2.6.0.rb +3 -9
  76. data/lib/regexp_parser/syntax/versions/2.6.2.rb +3 -9
  77. data/lib/regexp_parser/syntax/versions/2.6.3.rb +3 -9
  78. data/lib/regexp_parser/syntax/versions/3.1.0.rb +4 -0
  79. data/lib/regexp_parser/syntax/versions/3.2.0.rb +4 -0
  80. data/lib/regexp_parser/syntax/versions.rb +3 -1
  81. data/lib/regexp_parser/syntax.rb +8 -6
  82. data/lib/regexp_parser/token.rb +9 -20
  83. data/lib/regexp_parser/version.rb +1 -1
  84. data/lib/regexp_parser.rb +0 -2
  85. data/regexp_parser.gemspec +19 -23
  86. metadata +53 -171
  87. data/CHANGELOG.md +0 -349
  88. data/README.md +0 -470
  89. data/lib/regexp_parser/scanner/properties/long.yml +0 -594
  90. data/lib/regexp_parser/scanner/properties/short.yml +0 -237
  91. data/lib/regexp_parser/syntax/tokens/anchor.rb +0 -15
  92. data/lib/regexp_parser/syntax/tokens/backref.rb +0 -24
  93. data/lib/regexp_parser/syntax/tokens/character_set.rb +0 -13
  94. data/lib/regexp_parser/syntax/tokens/escape.rb +0 -30
  95. data/lib/regexp_parser/syntax/tokens/meta.rb +0 -13
  96. data/lib/regexp_parser/syntax/tokens/quantifier.rb +0 -35
  97. data/lib/regexp_parser/syntax/tokens/unicode_property.rb +0 -675
  98. data/lib/regexp_parser/syntax/tokens.rb +0 -45
  99. data/spec/expression/base_spec.rb +0 -94
  100. data/spec/expression/clone_spec.rb +0 -120
  101. data/spec/expression/conditional_spec.rb +0 -89
  102. data/spec/expression/free_space_spec.rb +0 -27
  103. data/spec/expression/methods/match_length_spec.rb +0 -161
  104. data/spec/expression/methods/match_spec.rb +0 -25
  105. data/spec/expression/methods/strfregexp_spec.rb +0 -224
  106. data/spec/expression/methods/tests_spec.rb +0 -99
  107. data/spec/expression/methods/traverse_spec.rb +0 -161
  108. data/spec/expression/options_spec.rb +0 -128
  109. data/spec/expression/root_spec.rb +0 -9
  110. data/spec/expression/sequence_spec.rb +0 -9
  111. data/spec/expression/subexpression_spec.rb +0 -50
  112. data/spec/expression/to_h_spec.rb +0 -26
  113. data/spec/expression/to_s_spec.rb +0 -100
  114. data/spec/lexer/all_spec.rb +0 -22
  115. data/spec/lexer/conditionals_spec.rb +0 -53
  116. data/spec/lexer/escapes_spec.rb +0 -14
  117. data/spec/lexer/keep_spec.rb +0 -10
  118. data/spec/lexer/literals_spec.rb +0 -89
  119. data/spec/lexer/nesting_spec.rb +0 -99
  120. data/spec/lexer/refcalls_spec.rb +0 -55
  121. data/spec/parser/all_spec.rb +0 -43
  122. data/spec/parser/alternation_spec.rb +0 -88
  123. data/spec/parser/anchors_spec.rb +0 -17
  124. data/spec/parser/conditionals_spec.rb +0 -179
  125. data/spec/parser/errors_spec.rb +0 -30
  126. data/spec/parser/escapes_spec.rb +0 -121
  127. data/spec/parser/free_space_spec.rb +0 -130
  128. data/spec/parser/groups_spec.rb +0 -108
  129. data/spec/parser/keep_spec.rb +0 -6
  130. data/spec/parser/posix_classes_spec.rb +0 -8
  131. data/spec/parser/properties_spec.rb +0 -115
  132. data/spec/parser/quantifiers_spec.rb +0 -51
  133. data/spec/parser/refcalls_spec.rb +0 -112
  134. data/spec/parser/set/intersections_spec.rb +0 -127
  135. data/spec/parser/set/ranges_spec.rb +0 -111
  136. data/spec/parser/sets_spec.rb +0 -178
  137. data/spec/parser/types_spec.rb +0 -18
  138. data/spec/scanner/all_spec.rb +0 -18
  139. data/spec/scanner/anchors_spec.rb +0 -21
  140. data/spec/scanner/conditionals_spec.rb +0 -128
  141. data/spec/scanner/errors_spec.rb +0 -68
  142. data/spec/scanner/escapes_spec.rb +0 -53
  143. data/spec/scanner/free_space_spec.rb +0 -133
  144. data/spec/scanner/groups_spec.rb +0 -52
  145. data/spec/scanner/keep_spec.rb +0 -10
  146. data/spec/scanner/literals_spec.rb +0 -49
  147. data/spec/scanner/meta_spec.rb +0 -18
  148. data/spec/scanner/properties_spec.rb +0 -64
  149. data/spec/scanner/quantifiers_spec.rb +0 -20
  150. data/spec/scanner/refcalls_spec.rb +0 -36
  151. data/spec/scanner/sets_spec.rb +0 -102
  152. data/spec/scanner/types_spec.rb +0 -14
  153. data/spec/spec_helper.rb +0 -15
  154. data/spec/support/runner.rb +0 -42
  155. data/spec/support/shared_examples.rb +0 -77
  156. data/spec/support/warning_extractor.rb +0 -60
  157. data/spec/syntax/syntax_spec.rb +0 -48
  158. data/spec/syntax/syntax_token_map_spec.rb +0 -23
  159. data/spec/syntax/versions/1.8.6_spec.rb +0 -17
  160. data/spec/syntax/versions/1.9.1_spec.rb +0 -10
  161. data/spec/syntax/versions/1.9.3_spec.rb +0 -9
  162. data/spec/syntax/versions/2.0.0_spec.rb +0 -13
  163. data/spec/syntax/versions/2.2.0_spec.rb +0 -9
  164. data/spec/syntax/versions/aliases_spec.rb +0 -37
  165. data/spec/token/token_spec.rb +0 -85
  166. /data/lib/regexp_parser/expression/classes/{set → character_set}/intersection.rb +0 -0
data/README.md DELETED
@@ -1,470 +0,0 @@
- # Regexp::Parser
-
- [![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://secure.travis-ci.org/ammar/regexp_parser.svg?branch=master)](http://travis-ci.org/ammar/regexp_parser) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
-
- A Ruby gem for tokenizing, parsing, and transforming regular expressions.
-
- * Multilayered
-   * A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
-   * A lexer that produces a "stream" of token objects
-   * A parser that produces a "tree" of Expression objects (OO API)
- * Runs on Ruby 1.9, 2.x, and JRuby (1.9 mode) runtimes.
- * Recognizes Ruby 1.8, 1.9, and 2.x regular expressions. [See Supported Syntax](#supported-syntax)
-
-
- _For examples of regexp_parser in use, see [Example Projects](#example-projects)._
-
-
- ---
- ## Requirements
-
- * Ruby >= 1.9
- * Ragel >= 6.0, but only if you want to build the gem or work on the scanner.
-
-
- _Note: See the .travis.yml file for covered versions._
-
-
- ---
- ## Install
-
- Install the gem with:
-
- `gem install regexp_parser`
-
- Or, add it to your project's `Gemfile`:
-
- `gem 'regexp_parser', '~> X.Y.Z'`
-
- See rubygems for the [latest version number](https://rubygems.org/gems/regexp_parser).
-
-
- ---
- ## Usage
-
- The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
- provides a single method that takes a regular expression (as a Regexp object or
- a string) and returns its results. The **Lexer** and the **Parser** accept an
- optional second argument that specifies the syntax version, like 'ruby/2.0',
- which defaults to the host Ruby version (using RUBY_VERSION).
-
- Here are the basic usage examples:
-
- ```ruby
- require 'regexp_parser'
-
- Regexp::Scanner.scan(regexp)
-
- Regexp::Lexer.lex(regexp)
-
- Regexp::Parser.parse(regexp)
- ```
-
- All three methods accept a block as the last argument, which, if given, gets
- called with the results as follows:
-
- * **Scanner**: the block gets passed the results as they are scanned. See the
-   example in the next section for details.
-
- * **Lexer**: after completion, the block gets passed the tokens one by one.
-   _The result of the block is returned._
-
- * **Parser**: after completion, the block gets passed the root expression.
-   _The result of the block is returned._
-
-
- ---
- ## Components
-
- ### Scanner
- A Ragel-generated scanner that recognizes the cumulative syntax of all
- supported syntax versions. It breaks a given expression's text into the
- smallest parts, and identifies their type, token, text, and start/end
- offsets within the pattern.
-
-
- #### Example
- The following scans the given pattern and prints out the type, token, text, and
- start/end offsets for each token found.
-
- ```ruby
- require 'regexp_parser'
-
- Regexp::Scanner.scan(/(ab?(cd)*[e-h]+)/) do |type, token, text, ts, te|
-   puts "type: #{type}, token: #{token}, text: '#{text}' [#{ts}..#{te}]"
- end
-
- # output
- # type: group, token: capture, text: '(' [0..1]
- # type: literal, token: literal, text: 'ab' [1..3]
- # type: quantifier, token: zero_or_one, text: '?' [3..4]
- # type: group, token: capture, text: '(' [4..5]
- # type: literal, token: literal, text: 'cd' [5..7]
- # type: group, token: close, text: ')' [7..8]
- # type: quantifier, token: zero_or_more, text: '*' [8..9]
- # type: set, token: open, text: '[' [9..10]
- # type: set, token: range, text: 'e-h' [10..13]
- # type: set, token: close, text: ']' [13..14]
- # type: quantifier, token: one_or_more, text: '+' [14..15]
- # type: group, token: close, text: ')' [15..16]
- ```
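The `ts`/`te` offsets in the output above read as a half-open range: each token's text is the slice of the pattern from `ts` up to, but not including, `te`, and consecutive tokens tile the whole pattern. The plain-Ruby sketch below checks that reading against the tuples listed above. It does not call the gem; `SOURCE`, `TOKENS`, and the two helper methods are illustrative names only.

```ruby
# Pattern source and the [type, token, text, ts, te] tuples copied from the
# scanner output listed above.
SOURCE = '(ab?(cd)*[e-h]+)'

TOKENS = [
  [:group,      :capture,      '(',    0,  1],
  [:literal,    :literal,      'ab',   1,  3],
  [:quantifier, :zero_or_one,  '?',    3,  4],
  [:group,      :capture,      '(',    4,  5],
  [:literal,    :literal,      'cd',   5,  7],
  [:group,      :close,        ')',    7,  8],
  [:quantifier, :zero_or_more, '*',    8,  9],
  [:set,        :open,         '[',    9, 10],
  [:set,        :range,        'e-h', 10, 13],
  [:set,        :close,        ']',   13, 14],
  [:quantifier, :one_or_more,  '+',   14, 15],
  [:group,      :close,        ')',   15, 16]
].freeze

# Each token's text should equal the half-open slice source[ts...te].
def offsets_consistent?(source, tokens)
  tokens.all? { |_type, _token, text, ts, te| source[ts...te] == text }
end

# The token texts should rebuild the source, with each token starting
# exactly where the previous one ended (no gaps, no overlaps).
def tokens_cover_source?(source, tokens)
  tokens.map { |t| t[2] }.join == source &&
    tokens.each_cons(2).all? { |a, b| a[4] == b[3] }
end

puts offsets_consistent?(SOURCE, TOKENS)   # prints true
puts tokens_cover_source?(SOURCE, TOKENS)  # prints true
```

Because `te` is exclusive, `te - ts` is the token's length and no off-by-one adjustment is needed when slicing.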
-
- A one-liner that uses map on the result of the scan to return the textual
- parts of the pattern:
-
- ```ruby
- Regexp::Scanner.scan(/(cat?([bhm]at)){3,5}/).map { |token| token[2] }
- #=> ["(", "cat", "?", "(", "[", "b", "h", "m", "]", "at", ")", ")", "{3,5}"]
- ```
-
-
- #### Notes
- * The scanner performs basic syntax error checking, like detecting missing
-   balancing punctuation and premature end of pattern. Flavor validity checks
-   are performed in the lexer, which uses a syntax object.
-
- * If the input is a Ruby **Regexp** object, the scanner calls `#source` on it to
-   get its string representation. `#source` does not include the options of
-   the expression (m, i, and x). To include the options in the scan, `#to_s`
-   should be called on the **Regexp** before passing it to the scanner or the
-   lexer. For the parser, however, this is not necessary. It automatically
-   exposes the options of a passed **Regexp** in the returned root expression.
-
- * To keep the scanner simple(r) and fairly reusable for other purposes, it
-   does not perform lexical analysis on the tokens, sticking to the task
-   of identifying the smallest possible tokens and leaving lexical analysis
-   to the lexer.
-
- * The MRI implementation may accept expressions that either conflict with
-   the documentation or are undocumented. The scanner does not support such
-   implementation quirks.
-   _(See issues [#3](https://github.com/ammar/regexp_parser/issues/3) and
-   [#15](https://github.com/ammar/regexp_parser/issues/15) for examples)_
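The `#source` versus `#to_s` note can be checked with core Ruby alone. This sketch does not call regexp_parser. It shows that `#source` drops the `i` and `x` flags while `#to_s` wraps them into an inline options group, which a scanner would then see as ordinary pattern text. `scan_input` is a hypothetical helper name, not part of the gem.

```ruby
# Core Ruby only: compare what #source and #to_s expose for a Regexp
# compiled with the i (ignore case) and x (extended) options.
re = /cat/ix

p re.source   # prints "cat" (options are not part of the source)
p re.to_s     # prints "(?ix-m:cat)" (options carried in an inline group)

# Hypothetical helper: pick the string to hand to a scanner, depending on
# whether the regexp's options should be visible in the scanned text.
def scan_input(regexp, include_options:)
  include_options ? regexp.to_s : regexp.source
end

# Recompiling the #to_s form keeps case-insensitivity; the #source form loses it.
with_options    = Regexp.new(scan_input(re, include_options: true))
without_options = Regexp.new(scan_input(re, include_options: false))

p with_options.match?('CAT')     # prints true
p without_options.match?('CAT')  # prints false
```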
-
-
- ---
- ### Syntax
- Defines the supported tokens for a specific engine implementation (aka a
- flavor). Syntax classes act as lookup tables, and are layered to create
- flavor variations. Syntax only comes into play in the lexer.
-
- #### Example
- The following instantiates syntax objects for Ruby 2.0, 1.9, and 1.8, and
- checks a few of their implementation features.
-
- ```ruby
- require 'regexp_parser'
-
- ruby_20 = Regexp::Syntax.new 'ruby/2.0'
- ruby_20.implements? :quantifier,  :zero_or_one             # => true
- ruby_20.implements? :quantifier,  :zero_or_one_reluctant   # => true
- ruby_20.implements? :quantifier,  :zero_or_one_possessive  # => true
- ruby_20.implements? :conditional, :condition               # => true
-
- ruby_19 = Regexp::Syntax.new 'ruby/1.9'
- ruby_19.implements? :quantifier,  :zero_or_one             # => true
- ruby_19.implements? :quantifier,  :zero_or_one_reluctant   # => true
- ruby_19.implements? :quantifier,  :zero_or_one_possessive  # => true
- ruby_19.implements? :conditional, :condition               # => false
-
- ruby_18 = Regexp::Syntax.new 'ruby/1.8'
- ruby_18.implements? :quantifier,  :zero_or_one             # => true
- ruby_18.implements? :quantifier,  :zero_or_one_reluctant   # => true
- ruby_18.implements? :quantifier,  :zero_or_one_possessive  # => false
- ruby_18.implements? :conditional, :condition               # => false
- ```
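The description of syntax classes as layered lookup tables can be modeled in a few lines of plain Ruby. In this sketch each version is a subclass that inherits its parent's feature table and adds to it. The class names and the `FEATURES` layout are hypothetical and are not regexp_parser's internals; the feature values only mirror the example results above.

```ruby
# Hypothetical model of layered syntax lookup tables (not the gem's classes).
# Each layer inherits its parent's features and adds its own.
class SyntaxLayer
  FEATURES = {}.freeze

  # Merge this class's FEATURES over everything inherited from ancestors,
  # unioning token lists when the same type appears in several layers.
  def self.features
    inherited = superclass.respond_to?(:features) ? superclass.features : {}
    inherited.merge(self::FEATURES) { |_type, parent, own| parent | own }
  end

  def implements?(type, token)
    self.class.features.fetch(type, []).include?(token)
  end
end

class Ruby18 < SyntaxLayer
  FEATURES = { quantifier: %i[zero_or_one zero_or_one_reluctant] }.freeze
end

class Ruby19 < Ruby18
  FEATURES = { quantifier: %i[zero_or_one_possessive] }.freeze
end

class Ruby20 < Ruby19
  FEATURES = { conditional: %i[condition] }.freeze
end

p Ruby18.new.implements?(:quantifier, :zero_or_one_possessive)  # prints false
p Ruby19.new.implements?(:quantifier, :zero_or_one_possessive)  # prints true
p Ruby19.new.implements?(:conditional, :condition)              # prints false
p Ruby20.new.implements?(:conditional, :condition)              # prints true
```

Layering keeps each version's definition down to its differences from the previous one, which is what lets a new flavor variation be added without restating every token.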
-
-
- #### Notes
- * Variations on a token, for example a named group with angle brackets (< and >)
-   vs one with a pair of single quotes, are specified with an underscore followed
-   by two characters appended to the base token. In this named group example,
-   the tokens would be :named_ab (angle brackets) and :named_sq (single quotes).
-   These variations are normalized by the syntax to :named.
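A minimal sketch of that normalization, assuming the variant suffixes come from a known list such as `ab` and `sq`. The `VARIANT_SUFFIXES` constant and `normalize_token` method are hypothetical names, not the gem's API.

```ruby
# Hypothetical normalizer: strip a known two-character variant suffix
# (_ab for angle brackets, _sq for single quotes) from a token name.
VARIANT_SUFFIXES = %w[ab sq].freeze
VARIANT_PATTERN  = /_(?:#{VARIANT_SUFFIXES.join('|')})\z/

def normalize_token(token)
  token.to_s.sub(VARIANT_PATTERN, '').to_sym
end

p normalize_token(:named_ab)     # prints :named
p normalize_token(:named_sq)     # prints :named
p normalize_token(:zero_or_one)  # prints :zero_or_one (no variant suffix)
```

Matching only listed suffixes, rather than any underscore followed by two letters, avoids mangling a token whose name merely happens to end that way.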
-
-
- ---
- ### Lexer
- Sits on top of the scanner and performs lexical analysis on the tokens that
- it emits. Among its tasks are: breaking quantified literal runs, collecting the
- emitted token attributes into Token objects, calculating their nesting depth,
- normalizing tokens for the parser, and checking if the tokens are implemented by
- the given syntax version.
-
- See the [Token Objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
- wiki page for more information on Token objects.
-
-
- #### Example
- The following example lexes the given pattern, checks it against the Ruby 1.9
- syntax, and prints the token objects' text indented to their level.
-
- ```ruby
- require 'regexp_parser'
-
- Regexp::Lexer.lex(/a?(b(c))*[d]+/, 'ruby/1.9') do |token|
-   puts "#{' ' * token.level}#{token.text}"
- end
-
- # output
- # a
- # ?
- # (
- # b
- # (
- # c
- # )
- # )
- # *
- # [
- # d
- # ]
- # +
- ```
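The example prints each token indented by its nesting level. The sketch below shows one way such a level can be derived from a flat token stream, under a simplified model: only group parentheses change the depth, an opening token keeps the current level and raises it for what follows, and a closing token lowers the level before it is recorded. `assign_levels` is a hypothetical helper, and its numbers are not claimed to match the gem's `token.level` in every case (character sets, for instance, are ignored here).

```ruby
# Hypothetical sketch: derive a nesting level for each token text in a
# flat stream, counting only group parentheses.
def assign_levels(texts)
  level = 0
  texts.map do |text|
    level -= 1 if text == ')'  # a closer sits at its opener's level
    entry = [text, level]
    level += 1 if text == '('  # tokens after an opener sit one level deeper
    entry
  end
end

# Token texts for /a?(b(c))*[d]+/, as listed in the output above.
texts = ['a', '?', '(', 'b', '(', 'c', ')', ')', '*', '[', 'd', ']', '+']

assign_levels(texts).each do |text, level|
  puts "#{'  ' * level}#{text}"
end
```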
-
- A one-liner that returns an array of the textual parts of the given pattern.
- Compare the output with that of the **Scanner** one-liner, notably how the
- sequence 'cat' is treated: the 't' is split off because it is followed by a
- quantifier that applies only to it.
-
- ```ruby
- Regexp::Lexer.lex(/(cat?([b]at)){3,5}/).map { |token| token.text }
- #=> ["(", "ca", "t", "?", "(", "[", "b", "]", "at", ")", ")", "{3,5}"]
- ```
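The splitting rule described above can be sketched in plain Ruby: when a multi-character literal run is immediately followed by a quantifier, only its last character is quantified, so the run is split into a prefix and that final character. `split_quantified_run` is a hypothetical helper, and the sketch assumes the run is plain text without escape sequences.

```ruby
# Hypothetical sketch of splitting a literal run that precedes a quantifier.
# In /cat?/ the '?' applies only to 't', so "cat" becomes "ca" and "t".
def split_quantified_run(run, followed_by_quantifier:)
  chars = run.chars  # String#chars splits by character, so multibyte text is safe
  return [run] unless followed_by_quantifier && chars.length > 1

  [chars[0...-1].join, chars.last]
end

p split_quantified_run('cat', followed_by_quantifier: true)   # prints ["ca", "t"]
p split_quantified_run('at',  followed_by_quantifier: false)  # prints ["at"]
p split_quantified_run('c',   followed_by_quantifier: true)   # prints ["c"]
```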
-
- #### Notes
- * The syntax argument is optional. It defaults to the version of the Ruby
-   interpreter in use, as returned by RUBY_VERSION.
-
- * The lexer normalizes some tokens, as noted in the Syntax section above.
-
-
- ---
- ### Parser
- Sits on top of the lexer and transforms the "stream" of Token objects emitted
- by it into a tree of Expression objects represented by an instance of the
- Expression::Root class.
-
- See the [Expression Objects](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
- wiki page for attributes and methods.
-
-
- #### Example
-
- ```ruby
- require 'regexp_parser'
-
- regex = /a?(b+(c)d)*(?<name>[0-9]+)/
-
- tree = Regexp::Parser.parse(regex, 'ruby/2.1')
-
- tree.traverse do |event, exp|
-   puts "#{event}: #{exp.type} `#{exp.to_s}`"
- end
-
- # Output
- # visit: literal `a?`
- # enter: group `(b+(c)d)*`
- # visit: literal `b+`
- # enter: group `(c)`
- # visit: literal `c`
- # exit: group `(c)`
- # visit: literal `d`
- # exit: group `(b+(c)d)*`
- # enter: group `(?<name>[0-9]+)`
- # visit: set `[0-9]+`
- # exit: group `(?<name>[0-9]+)`
- ```
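The `enter`/`visit`/`exit` events above follow a depth-first walk: an expression with children produces `enter` before its children and `exit` after them, while a leaf produces a single `visit`. The sketch below reproduces that event order with a hypothetical `Node` struct standing in for the gem's Expression classes. It is not the gem's `traverse` implementation, and it simply keeps each node's quantifier inside its display text.

```ruby
# Hypothetical tree node standing in for an Expression object.
# Leaves have no children array (nil); groups have one, possibly empty.
Node = Struct.new(:type, :text, :children) do
  def leaf?
    children.nil?
  end
end

# Depth-first walk yielding :enter/:exit around nodes with children
# and :visit for leaves, mirroring the event order shown above.
def walk(nodes, &block)
  nodes.each do |node|
    if node.leaf?
      block.call(:visit, node)
    else
      block.call(:enter, node)
      walk(node.children, &block)
      block.call(:exit, node)
    end
  end
end

# Children of the root for /a?(b+(c)d)*(?<name>[0-9]+)/, built by hand.
tree = [
  Node.new(:literal, 'a?'),
  Node.new(:group, '(b+(c)d)*', [
    Node.new(:literal, 'b+'),
    Node.new(:group, '(c)', [Node.new(:literal, 'c')]),
    Node.new(:literal, 'd')
  ]),
  Node.new(:group, '(?<name>[0-9]+)', [Node.new(:set, '[0-9]+')])
]

walk(tree) { |event, node| puts "#{event}: #{node.type} `#{node.text}`" }
```

Printed with the same format string as the example, this yields the same eleven lines as the output listed above.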
-
- Another example, using each_expression and strfregexp to print the object tree.
- _See the traverse.rb and strfregexp.rb files under `lib/regexp_parser/expression/methods`
- for more information on these methods._
-
- ```ruby
- include_root = true
- indent_offset = include_root ? 1 : 0
-
- tree.each_expression(include_root) do |exp, level_index|
-   puts exp.strfregexp("%>> %c", indent_offset)
- end
-
- # Output
- # > Regexp::Expression::Root
- # > Regexp::Expression::Literal
- # > Regexp::Expression::Group::Capture
- # > Regexp::Expression::Literal
- # > Regexp::Expression::Group::Capture
- # > Regexp::Expression::Literal
- # > Regexp::Expression::Literal
- # > Regexp::Expression::Group::Named
- # > Regexp::Expression::CharacterSet
- ```
-
- _Note: quantifiers do not appear in the output because they are not separate
- nodes in the tree; they are stored as attributes of the Expression they apply to._
-
-
- ---
-
-
- ## Supported Syntax
- The three modules support all the regular expression syntax features of Ruby 1.8,
- 1.9, and 2.x:
-
- _Note that not all of these are available in all versions of Ruby._
-
-
- | Syntax Feature | Examples | &#x22ef; |
- | ------------------------------------- | ------------------------------------------------------- |:--------:|
- | **Alternation** | `a\|b\|c` | &#x2713; |
- | **Anchors** | `\A`, `^`, `\b` | &#x2713; |
- | **Character Classes** | `[abc]`, `[^\\]`, `[a-d&&aeiou]`, `[a=e=b]` | &#x2713; |
- | **Character Types** | `\d`, `\H`, `\s` | &#x2713; |
- | **Cluster Types** | `\R`, `\X` | &#x2713; |
- | **Conditional Exps.** | `(?(cond)yes-subexp)`, `(?(cond)yes-subexp\|no-subexp)` | &#x2713; |
- | **Escape Sequences** | `\t`, `\\+`, `\?` | &#x2713; |
- | **Free Space** | whitespace and `# Comments` _(x modifier)_ | &#x2713; |
- | **Grouped Exps.** | | &#x22f1; |
- | &emsp;&nbsp;_**Assertions**_ | | &#x22f1; |
- | &emsp;&emsp;_Lookahead_ | `(?=abc)` | &#x2713; |
- | &emsp;&emsp;_Negative Lookahead_ | `(?!abc)` | &#x2713; |
- | &emsp;&emsp;_Lookbehind_ | `(?<=abc)` | &#x2713; |
- | &emsp;&emsp;_Negative Lookbehind_ | `(?<!abc)` | &#x2713; |
- | &emsp;&nbsp;_**Atomic**_ | `(?>abc)` | &#x2713; |
- | &emsp;&nbsp;_**Absence**_ | `(?~abc)` | &#x2713; |
- | &emsp;&nbsp;_**Back-references**_ | | &#x22f1; |
- | &emsp;&emsp;_Named_ | `\k<name>` | &#x2713; |
- | &emsp;&emsp;_Nest Level_ | `\k<n-1>` | &#x2713; |
- | &emsp;&emsp;_Numbered_ | `\k<1>` | &#x2713; |
- | &emsp;&emsp;_Relative_ | `\k<-2>` | &#x2713; |
- | &emsp;&emsp;_Traditional_ | `\1` thru `\9` | &#x2713; |
- | &emsp;&nbsp;_**Capturing**_ | `(abc)` | &#x2713; |
- | &emsp;&nbsp;_**Comments**_ | `(?# comment text)` | &#x2713; |
- | &emsp;&nbsp;_**Named**_ | `(?<name>abc)`, `(?'name'abc)` | &#x2713; |
- | &emsp;&nbsp;_**Options**_ | `(?mi-x:abc)`, `(?a:\s\w+)`, `(?i)` | &#x2713; |
- | &emsp;&nbsp;_**Passive**_ | `(?:abc)` | &#x2713; |
- | &emsp;&nbsp;_**Subexp. Calls**_ | `\g<name>`, `\g<1>` | &#x2713; |
- | **Keep** | `\K`, `(ab\Kc\|d\Ke)f` | &#x2713; |
- | **Literals** _(utf-8)_ | `Ruby`, `ルビー`, `روبي` | &#x2713; |
- | **POSIX Classes** | `[:alpha:]`, `[:^digit:]` | &#x2713; |
- | **Quantifiers** | | &#x22f1; |
- | &emsp;&nbsp;_**Greedy**_ | `?`, `*`, `+`, `{m,M}` | &#x2713; |
- | &emsp;&nbsp;_**Reluctant** (Lazy)_ | `??`, `*?`, `+?`, `{m,M}?` | &#x2713; |
- | &emsp;&nbsp;_**Possessive**_ | `?+`, `*+`, `++`, `{m,M}+` | &#x2713; |
- | **String Escapes** | | &#x22f1; |
- | &emsp;&nbsp;_**Control**_ | `\C-C`, `\cD` | &#x2713; |
- | &emsp;&nbsp;_**Hex**_ | `\x20`, `\x{701230}` | &#x2713; |
- | &emsp;&nbsp;_**Meta**_ | `\M-c`, `\M-\C-C`, `\M-\cC`, `\C-\M-C`, `\c\M-C` | &#x2713; |
- | &emsp;&nbsp;_**Octal**_ | `\0`, `\01`, `\012` | &#x2713; |
- | &emsp;&nbsp;_**Unicode**_ | `\uHHHH`, `\u{H+ H+}` | &#x2713; |
- | **Unicode Properties** | _<sub>([Unicode 11.0.0](http://www.unicode.org/versions/Unicode11.0.0/))</sub>_ | &#x22f1; |
- | &emsp;&nbsp;_**Age**_ | `\p{Age=5.2}`, `\P{age=7.0}`, `\p{^age=8.0}` | &#x2713; |
- | &emsp;&nbsp;_**Blocks**_ | `\p{InArmenian}`, `\P{InKhmer}`, `\p{^InThai}` | &#x2713; |
- | &emsp;&nbsp;_**Classes**_ | `\p{Alpha}`, `\P{Space}`, `\p{^Alnum}` | &#x2713; |
- | &emsp;&nbsp;_**Derived**_ | `\p{Math}`, `\P{Lowercase}`, `\p{^Cased}` | &#x2713; |
- | &emsp;&nbsp;_**General Categories**_ | `\p{Lu}`, `\P{Cs}`, `\p{^sc}` | &#x2713; |
- | &emsp;&nbsp;_**Scripts**_ | `\p{Arabic}`, `\P{Hiragana}`, `\p{^Greek}` | &#x2713; |
- | &emsp;&nbsp;_**Simple**_ | `\p{Dash}`, `\p{Extender}`, `\p{^Hyphen}` | &#x2713; |
-
- ##### Inapplicable Features
-
- Some modifiers, like `o` and `s`, apply to the **Regexp** object itself and do not
- appear in its source. Other such modifiers include the encoding modifiers `e` and `n`
- (see [Encoding](http://www.ruby-doc.org/core-2.5.0/Regexp.html#class-Regexp-label-Encoding)).
- These are not seen by the scanner.
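Core Ruby can confirm that these modifiers never reach a scanner that reads `#source`. They change the Regexp object's options or encoding, but not its source text, and `#to_s` only reflects the `m`, `i`, and `x` flags. The sketch covers `n`, `u`, and `o`; `e` and `s` select other encodings in the same way but are not exercised here.

```ruby
# Core Ruby only: these modifiers affect the Regexp object,
# not the pattern text a scanner would read.
no_encoding = /abc/n   # n: no encoding (binary semantics)
fixed_utf8  = /abc/u   # u: fixed UTF-8 encoding
once        = /abc/o   # o: interpolation performed only once

p no_encoding.source          # prints "abc"
p no_encoding.to_s            # prints "(?-mix:abc)" (only m, i, x appear)
p fixed_utf8.source           # prints "abc"
p fixed_utf8.fixed_encoding?  # prints true
p fixed_utf8.encoding         # prints #<Encoding:UTF-8>
p once.source                 # prints "abc"
```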
-
- The following features are not currently enabled for Ruby by its regular
- expressions library (Onigmo). They are not supported by the scanner.
-
- - **Quotes**: `\Q...\E` _[[See]](https://github.com/k-takata/Onigmo/blob/7911409/doc/RE#L499)_
- - **Capture History**: `(?@...)`, `(?@<name>...)` _[[See]](https://github.com/k-takata/Onigmo/blob/7911409/doc/RE#L550)_
-
-
- See something missing? Please submit an [issue](https://github.com/ammar/regexp_parser/issues).
-
- _**Note**: Attempting to process expressions with unsupported syntax features can raise an error,
- or incorrectly return tokens/objects as literals._
-
-
- ## Testing
- To run the tests, simply run `rake` from the root directory, as `test` is the default task.
-
- It generates the scanner's code from the Ragel source files and runs all the tests, so it requires Ragel to be installed.
-
- The tests use RSpec. They can also be run with the test runner that whitelists some warnings:
-
- ```
- bin/test
- ```
-
- You can run a specific test like so:
-
- ```
- bin/test spec/scanner/properties_spec.rb
- ```
-
- Note that changes to Ragel files will not be reflected when running `rspec` or `bin/test`, so you might want to run:
-
- ```
- rake ragel:rb && bin/test spec/scanner/properties_spec.rb
- ```
-
- ## Building
- Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/) to be
- installed. The build tasks will automatically invoke the 'ragel:rb' task to generate the
- Ruby scanner code.
-
-
- The project uses the standard rubygems package tasks, so:
-
-
- To build the gem, run:
- ```
- rake build
- ```
-
- To install the gem from the cloned project, run:
- ```
- rake install
- ```
-
-
- ## Example Projects
- Projects using regexp_parser.
-
- - [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor with alias support.
-
- - [mutant](https://github.com/mbj/mutant) (before v0.9.0) manipulates your regular expressions (amongst others) to see if your tests cover their behavior.
-
- - [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) uses regexp_parser to generate examples of postal codes.
-
- - [js_regex](https://github.com/janosch-x/js_regex) converts Ruby regular expressions to JavaScript-compatible regular expressions.
-
-
- ## References
- Documentation and books used while working on this project.
-
-
- #### Ruby Flavors
- * Oniguruma Regular Expressions (Ruby 1.9.x) [link](https://github.com/kkos/oniguruma/blob/master/doc/RE)
- * Onigmo Regular Expressions (Ruby >= 2.0) [link](https://github.com/k-takata/Onigmo/blob/master/doc/RE)
-
-
- #### Regular Expressions
- * Mastering Regular Expressions, by Jeffrey E.F. Friedl (2nd Edition) [book](http://oreilly.com/catalog/9781565922570/)
- * Regular Expression Flavor Comparison [link](http://www.regular-expressions.info/refflavors.html)
- * Enumerating the strings of regular languages [link](http://www.cs.dartmouth.edu/~doug/nfa.ps.gz)
- * Stack Overflow Regular Expressions FAQ [link](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075)
-
-
- #### Unicode
- * Unicode Explained, by Jukka K. Korpela [book](http://oreilly.com/catalog/9780596101213)
- * Unicode Derived Properties [link](http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt)
- * Unicode Property Aliases [link](http://www.unicode.org/Public/UNIDATA/PropertyAliases.txt)
- * Unicode Regular Expressions [link](http://www.unicode.org/reports/tr18/)
- * Unicode Standard Annex #44 [link](http://www.unicode.org/reports/tr44/)
-
-
- ---
- ##### Copyright
- _Copyright (c) 2010-2019 Ammar Ali. See LICENSE file for details._