regexp_parser 2.1.1 → 2.9.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (167)
  1. checksums.yaml +4 -4
  2. data/Gemfile +6 -5
  3. data/LICENSE +1 -1
  4. data/Rakefile +6 -70
  5. data/lib/regexp_parser/error.rb +1 -1
  6. data/lib/regexp_parser/expression/base.rb +76 -0
  7. data/lib/regexp_parser/expression/classes/alternation.rb +1 -1
  8. data/lib/regexp_parser/expression/classes/anchor.rb +0 -2
  9. data/lib/regexp_parser/expression/classes/{backref.rb → backreference.rb} +18 -3
  10. data/lib/regexp_parser/expression/classes/{set → character_set}/range.rb +2 -7
  11. data/lib/regexp_parser/expression/classes/{set.rb → character_set.rb} +4 -8
  12. data/lib/regexp_parser/expression/classes/{type.rb → character_type.rb} +0 -2
  13. data/lib/regexp_parser/expression/classes/conditional.rb +2 -6
  14. data/lib/regexp_parser/expression/classes/{escape.rb → escape_sequence.rb} +15 -7
  15. data/lib/regexp_parser/expression/classes/free_space.rb +4 -4
  16. data/lib/regexp_parser/expression/classes/group.rb +10 -22
  17. data/lib/regexp_parser/expression/classes/keep.rb +2 -0
  18. data/lib/regexp_parser/expression/classes/literal.rb +1 -5
  19. data/lib/regexp_parser/expression/classes/posix_class.rb +5 -5
  20. data/lib/regexp_parser/expression/classes/root.rb +3 -6
  21. data/lib/regexp_parser/expression/classes/{property.rb → unicode_property.rb} +10 -11
  22. data/lib/regexp_parser/expression/methods/construct.rb +41 -0
  23. data/lib/regexp_parser/expression/methods/human_name.rb +43 -0
  24. data/lib/regexp_parser/expression/methods/match_length.rb +9 -5
  25. data/lib/regexp_parser/expression/methods/negative.rb +20 -0
  26. data/lib/regexp_parser/expression/methods/parts.rb +23 -0
  27. data/lib/regexp_parser/expression/methods/printing.rb +26 -0
  28. data/lib/regexp_parser/expression/methods/strfregexp.rb +1 -1
  29. data/lib/regexp_parser/expression/methods/tests.rb +47 -1
  30. data/lib/regexp_parser/expression/methods/traverse.rb +35 -19
  31. data/lib/regexp_parser/expression/quantifier.rb +55 -24
  32. data/lib/regexp_parser/expression/sequence.rb +11 -31
  33. data/lib/regexp_parser/expression/sequence_operation.rb +4 -9
  34. data/lib/regexp_parser/expression/shared.rb +111 -0
  35. data/lib/regexp_parser/expression/subexpression.rb +26 -18
  36. data/lib/regexp_parser/expression.rb +37 -155
  37. data/lib/regexp_parser/lexer.rb +81 -39
  38. data/lib/regexp_parser/parser.rb +135 -173
  39. data/lib/regexp_parser/scanner/errors/premature_end_error.rb +8 -0
  40. data/lib/regexp_parser/scanner/errors/scanner_error.rb +6 -0
  41. data/lib/regexp_parser/scanner/errors/validation_error.rb +63 -0
  42. data/lib/regexp_parser/scanner/properties/long.csv +651 -0
  43. data/lib/regexp_parser/scanner/properties/short.csv +249 -0
  44. data/lib/regexp_parser/scanner/property.rl +2 -2
  45. data/lib/regexp_parser/scanner/scanner.rl +127 -185
  46. data/lib/regexp_parser/scanner.rb +1185 -1402
  47. data/lib/regexp_parser/syntax/any.rb +2 -7
  48. data/lib/regexp_parser/syntax/base.rb +91 -66
  49. data/lib/regexp_parser/syntax/token/anchor.rb +15 -0
  50. data/lib/regexp_parser/syntax/{tokens → token}/assertion.rb +2 -2
  51. data/lib/regexp_parser/syntax/token/backreference.rb +33 -0
  52. data/lib/regexp_parser/syntax/token/character_set.rb +16 -0
  53. data/lib/regexp_parser/syntax/{tokens → token}/character_type.rb +3 -3
  54. data/lib/regexp_parser/syntax/{tokens → token}/conditional.rb +3 -3
  55. data/lib/regexp_parser/syntax/token/escape.rb +33 -0
  56. data/lib/regexp_parser/syntax/{tokens → token}/group.rb +7 -7
  57. data/lib/regexp_parser/syntax/{tokens → token}/keep.rb +1 -1
  58. data/lib/regexp_parser/syntax/token/meta.rb +20 -0
  59. data/lib/regexp_parser/syntax/{tokens → token}/posix_class.rb +3 -3
  60. data/lib/regexp_parser/syntax/token/quantifier.rb +35 -0
  61. data/lib/regexp_parser/syntax/token/unicode_property.rb +751 -0
  62. data/lib/regexp_parser/syntax/token/virtual.rb +11 -0
  63. data/lib/regexp_parser/syntax/token.rb +45 -0
  64. data/lib/regexp_parser/syntax/version_lookup.rb +17 -34
  65. data/lib/regexp_parser/syntax/versions/1.8.6.rb +13 -20
  66. data/lib/regexp_parser/syntax/versions/1.9.1.rb +10 -17
  67. data/lib/regexp_parser/syntax/versions/1.9.3.rb +3 -10
  68. data/lib/regexp_parser/syntax/versions/2.0.0.rb +8 -15
  69. data/lib/regexp_parser/syntax/versions/2.2.0.rb +3 -9
  70. data/lib/regexp_parser/syntax/versions/2.3.0.rb +3 -9
  71. data/lib/regexp_parser/syntax/versions/2.4.0.rb +3 -9
  72. data/lib/regexp_parser/syntax/versions/2.4.1.rb +2 -8
  73. data/lib/regexp_parser/syntax/versions/2.5.0.rb +3 -9
  74. data/lib/regexp_parser/syntax/versions/2.6.0.rb +3 -9
  75. data/lib/regexp_parser/syntax/versions/2.6.2.rb +3 -9
  76. data/lib/regexp_parser/syntax/versions/2.6.3.rb +3 -9
  77. data/lib/regexp_parser/syntax/versions/3.1.0.rb +4 -0
  78. data/lib/regexp_parser/syntax/versions/3.2.0.rb +4 -0
  79. data/lib/regexp_parser/syntax/versions.rb +4 -2
  80. data/lib/regexp_parser/syntax.rb +2 -2
  81. data/lib/regexp_parser/token.rb +9 -20
  82. data/lib/regexp_parser/version.rb +1 -1
  83. data/lib/regexp_parser.rb +6 -8
  84. data/regexp_parser.gemspec +20 -22
  85. metadata +49 -171
  86. data/CHANGELOG.md +0 -494
  87. data/README.md +0 -479
  88. data/lib/regexp_parser/scanner/properties/long.yml +0 -594
  89. data/lib/regexp_parser/scanner/properties/short.yml +0 -237
  90. data/lib/regexp_parser/syntax/tokens/anchor.rb +0 -15
  91. data/lib/regexp_parser/syntax/tokens/backref.rb +0 -24
  92. data/lib/regexp_parser/syntax/tokens/character_set.rb +0 -13
  93. data/lib/regexp_parser/syntax/tokens/escape.rb +0 -30
  94. data/lib/regexp_parser/syntax/tokens/meta.rb +0 -13
  95. data/lib/regexp_parser/syntax/tokens/quantifier.rb +0 -35
  96. data/lib/regexp_parser/syntax/tokens/unicode_property.rb +0 -675
  97. data/lib/regexp_parser/syntax/tokens.rb +0 -45
  98. data/spec/expression/base_spec.rb +0 -104
  99. data/spec/expression/clone_spec.rb +0 -152
  100. data/spec/expression/conditional_spec.rb +0 -89
  101. data/spec/expression/free_space_spec.rb +0 -27
  102. data/spec/expression/methods/match_length_spec.rb +0 -161
  103. data/spec/expression/methods/match_spec.rb +0 -25
  104. data/spec/expression/methods/strfregexp_spec.rb +0 -224
  105. data/spec/expression/methods/tests_spec.rb +0 -99
  106. data/spec/expression/methods/traverse_spec.rb +0 -161
  107. data/spec/expression/options_spec.rb +0 -128
  108. data/spec/expression/subexpression_spec.rb +0 -50
  109. data/spec/expression/to_h_spec.rb +0 -26
  110. data/spec/expression/to_s_spec.rb +0 -108
  111. data/spec/lexer/all_spec.rb +0 -22
  112. data/spec/lexer/conditionals_spec.rb +0 -53
  113. data/spec/lexer/delimiters_spec.rb +0 -68
  114. data/spec/lexer/escapes_spec.rb +0 -14
  115. data/spec/lexer/keep_spec.rb +0 -10
  116. data/spec/lexer/literals_spec.rb +0 -64
  117. data/spec/lexer/nesting_spec.rb +0 -99
  118. data/spec/lexer/refcalls_spec.rb +0 -60
  119. data/spec/parser/all_spec.rb +0 -43
  120. data/spec/parser/alternation_spec.rb +0 -88
  121. data/spec/parser/anchors_spec.rb +0 -17
  122. data/spec/parser/conditionals_spec.rb +0 -179
  123. data/spec/parser/errors_spec.rb +0 -30
  124. data/spec/parser/escapes_spec.rb +0 -121
  125. data/spec/parser/free_space_spec.rb +0 -130
  126. data/spec/parser/groups_spec.rb +0 -108
  127. data/spec/parser/keep_spec.rb +0 -6
  128. data/spec/parser/options_spec.rb +0 -28
  129. data/spec/parser/posix_classes_spec.rb +0 -8
  130. data/spec/parser/properties_spec.rb +0 -115
  131. data/spec/parser/quantifiers_spec.rb +0 -68
  132. data/spec/parser/refcalls_spec.rb +0 -117
  133. data/spec/parser/set/intersections_spec.rb +0 -127
  134. data/spec/parser/set/ranges_spec.rb +0 -111
  135. data/spec/parser/sets_spec.rb +0 -178
  136. data/spec/parser/types_spec.rb +0 -18
  137. data/spec/scanner/all_spec.rb +0 -18
  138. data/spec/scanner/anchors_spec.rb +0 -21
  139. data/spec/scanner/conditionals_spec.rb +0 -128
  140. data/spec/scanner/delimiters_spec.rb +0 -52
  141. data/spec/scanner/errors_spec.rb +0 -67
  142. data/spec/scanner/escapes_spec.rb +0 -64
  143. data/spec/scanner/free_space_spec.rb +0 -165
  144. data/spec/scanner/groups_spec.rb +0 -61
  145. data/spec/scanner/keep_spec.rb +0 -10
  146. data/spec/scanner/literals_spec.rb +0 -39
  147. data/spec/scanner/meta_spec.rb +0 -18
  148. data/spec/scanner/options_spec.rb +0 -36
  149. data/spec/scanner/properties_spec.rb +0 -64
  150. data/spec/scanner/quantifiers_spec.rb +0 -25
  151. data/spec/scanner/refcalls_spec.rb +0 -55
  152. data/spec/scanner/sets_spec.rb +0 -151
  153. data/spec/scanner/types_spec.rb +0 -14
  154. data/spec/spec_helper.rb +0 -16
  155. data/spec/support/runner.rb +0 -42
  156. data/spec/support/shared_examples.rb +0 -77
  157. data/spec/support/warning_extractor.rb +0 -60
  158. data/spec/syntax/syntax_spec.rb +0 -48
  159. data/spec/syntax/syntax_token_map_spec.rb +0 -23
  160. data/spec/syntax/versions/1.8.6_spec.rb +0 -17
  161. data/spec/syntax/versions/1.9.1_spec.rb +0 -10
  162. data/spec/syntax/versions/1.9.3_spec.rb +0 -9
  163. data/spec/syntax/versions/2.0.0_spec.rb +0 -13
  164. data/spec/syntax/versions/2.2.0_spec.rb +0 -9
  165. data/spec/syntax/versions/aliases_spec.rb +0 -37
  166. data/spec/token/token_spec.rb +0 -85
  167. data/lib/regexp_parser/expression/classes/{set → character_set}/intersection.rb +0 -0
data/README.md DELETED
@@ -1,479 +0,0 @@
1
- # Regexp::Parser
2
-
3
- [![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://github.com/ammar/regexp_parser/workflows/tests/badge.svg)](https://github.com/ammar/regexp_parser/actions) [![Build Status](https://github.com/ammar/regexp_parser/workflows/gouteur/badge.svg)](https://github.com/ammar/regexp_parser/actions) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
4
-
5
- A Ruby gem for tokenizing, parsing, and transforming regular expressions.
6
-
7
- * Multilayered
8
- * A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
9
- * A lexer that produces a "stream" of token objects.
10
- * A parser that produces a "tree" of Expression objects (OO API)
11
- * Runs on Ruby 2.x, 3.x and JRuby runtimes
12
- * Recognizes Ruby 1.8, 1.9, 2.x and 3.x regular expressions [See Supported Syntax](#supported-syntax)
13
-
14
-
15
- _For examples of regexp_parser in use, see [Example Projects](#example-projects)._
16
-
17
-
18
- ---
19
- ## Requirements
20
-
21
- * Ruby >= 2.0
22
- * Ragel >= 6.0, but only if you want to build the gem or work on the scanner.
23
-
24
-
25
- ---
26
- ## Install
27
-
28
- Install the gem with:
29
-
30
- `gem install regexp_parser`
31
-
32
- Or, add it to your project's `Gemfile`:
33
-
34
- ```gem 'regexp_parser', '~> X.Y.Z'```
35
-
36
- See rubygems for the [latest version number](https://rubygems.org/gems/regexp_parser).
37
-
38
-
39
- ---
40
- ## Usage
41
-
42
- The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
43
- provides a single method that takes a regular expression (as a RegExp object or
44
- a string) and returns its results. The **Lexer** and the **Parser** accept an
45
- optional second argument that specifies the syntax version, like 'ruby/2.0',
46
- which defaults to the host Ruby version (using RUBY_VERSION).
47
-
48
- Here are the basic usage examples:
49
-
50
- ```ruby
51
- require 'regexp_parser'
52
-
53
- Regexp::Scanner.scan(regexp)
54
-
55
- Regexp::Lexer.lex(regexp)
56
-
57
- Regexp::Parser.parse(regexp)
58
- ```
59
-
60
- All three methods accept a block as the last argument, which, if given, gets
61
- called with the results as follows:
62
-
63
- * **Scanner**: the block gets passed the results as they are scanned. See the
64
- example in the next section for details.
65
-
66
- * **Lexer**: after completion, the block gets passed the tokens one by one.
67
- _The result of the block is returned._
68
-
69
- * **Parser**: after completion, the block gets passed the root expression.
70
- _The result of the block is returned._
71
-
72
- All three methods accept either a `Regexp` or a `String` containing the pattern.
73
- If a `String` is passed, `options` can be supplied:
74
-
75
- ```ruby
76
- require 'regexp_parser'
77
-
78
- Regexp::Parser.parse(
79
- "a+ # Recognises a and A...",
80
- options: ::Regexp::EXTENDED | ::Regexp::IGNORECASE
81
- )
82
- ```
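A minimal sketch of what the `options` value in the removed example is, using only core Ruby rather than the gem: the flags are integer bitmasks combined with `|`, and the same string and flags also build a `Regexp` directly. `compile_with_options` is a hypothetical helper name, not part of regexp_parser.

```ruby
# Core Ruby only; no regexp_parser calls are made here.
# `compile_with_options` is a hypothetical helper used for illustration.
def compile_with_options(pattern, flags)
  Regexp.new(pattern, flags)
end

flags  = Regexp::EXTENDED | Regexp::IGNORECASE
regexp = compile_with_options("a+ # Recognises a and A...", flags)

# EXTENDED makes the space and the trailing comment insignificant;
# IGNORECASE lets the pattern match upper-case input.
matched   = regexp.match?("AAA")
flags_set = (regexp.options & flags) == flags

puts matched
puts flags_set
```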
83
-
84
- ---
85
- ## Components
86
-
87
- ### Scanner
88
- A Ragel-generated scanner that recognizes the cumulative syntax of all
89
- supported syntax versions. It breaks a given expression's text into the
90
- smallest parts, and identifies their type, token, text, and start/end
91
- offsets within the pattern.
92
-
93
-
94
- #### Example
95
- The following scans the given pattern and prints out the type, token, text and
96
- start/end offsets for each token found.
97
-
98
- ```ruby
99
- require 'regexp_parser'
100
-
101
- Regexp::Scanner.scan /(ab?(cd)*[e-h]+)/ do |type, token, text, ts, te|
102
- puts "type: #{type}, token: #{token}, text: '#{text}' [#{ts}..#{te}]"
103
- end
104
-
105
- # output
106
- # type: group, token: capture, text: '(' [0..1]
107
- # type: literal, token: literal, text: 'ab' [1..3]
108
- # type: quantifier, token: zero_or_one, text: '?' [3..4]
109
- # type: group, token: capture, text: '(' [4..5]
110
- # type: literal, token: literal, text: 'cd' [5..7]
111
- # type: group, token: close, text: ')' [7..8]
112
- # type: quantifier, token: zero_or_more, text: '*' [8..9]
113
- # type: set, token: open, text: '[' [9..10]
114
- # type: set, token: range, text: 'e-h' [10..13]
115
- # type: set, token: close, text: ']' [13..14]
116
- # type: quantifier, token: one_or_more, text: '+' [14..15]
117
- # type: group, token: close, text: ')' [15..16]
118
- ```
119
-
120
- A one-liner that uses map on the result of the scan to return the textual
121
- parts of the pattern:
122
-
123
- ```ruby
124
- Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]}
125
- #=> ["(", "cat", "?", "(", "[", "b", "h", "m", "]", "at", ")", ")", "{3,5}"]
126
- ```
127
-
128
-
129
- #### Notes
130
- * The scanner performs basic syntax error checking, like detecting missing
131
- balancing punctuation and premature end of pattern. Flavor validity checks
132
- are performed in the lexer, which uses a syntax object.
133
-
134
- * If the input is a Ruby **Regexp** object, the scanner calls #source on it to
135
- get its string representation. #source does not include the options of
136
- the expression (m, i, and x). To include the options in the scan, #to_s
137
- should be called on the **Regexp** before passing it to the scanner or the
138
- lexer. For the parser, however, this is not necessary. It automatically
139
- exposes the options of a passed **Regexp** in the returned root expression.
140
-
141
- * To keep the scanner simple(r) and fairly reusable for other purposes, it
142
- does not perform lexical analysis on the tokens, sticking to the task
143
- of identifying the smallest possible tokens and leaving lexical analysis
144
- to the lexer.
145
-
146
- * The MRI implementation may accept expressions that either conflict with
147
- the documentation or are undocumented, like `{}` and `]` _(unescaped)_.
148
- The scanner will try to support as many of these cases as possible.
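The difference between `#source` and `#to_s` described in the second note can be seen with core Ruby alone. This sketch does not call the gem, and `string_forms` is a hypothetical helper name.

```ruby
# Core Ruby only: compare the two string forms of a Regexp.
# `string_forms` is a hypothetical helper used for illustration.
def string_forms(regexp)
  { source: regexp.source, to_s: regexp.to_s }
end

forms = string_forms(/a+ b/ix)
puts forms[:source]  # the pattern text only, without the i and x options
puts forms[:to_s]    # the pattern wrapped in an options group
```

Passing the `#to_s` form to the scanner or lexer means the options arrive as part of the pattern text, in the form of an options group.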
149
-
150
- ---
151
- ### Syntax
152
- Defines the supported tokens for a specific engine implementation (aka a
153
- flavor). Syntax classes act as lookup tables, and are layered to create
154
- flavor variations. Syntax only comes into play in the lexer.
155
-
156
- #### Example
157
- The following instantiates syntax objects for Ruby 2.0, 1.9, 1.8, and
158
- checks a few of their implementation features.
159
-
160
- ```ruby
161
- require 'regexp_parser'
162
-
163
- ruby_20 = Regexp::Syntax.new 'ruby/2.0'
164
- ruby_20.implements? :quantifier, :zero_or_one # => true
165
- ruby_20.implements? :quantifier, :zero_or_one_reluctant # => true
166
- ruby_20.implements? :quantifier, :zero_or_one_possessive # => true
167
- ruby_20.implements? :conditional, :condition # => true
168
-
169
- ruby_19 = Regexp::Syntax.new 'ruby/1.9'
170
- ruby_19.implements? :quantifier, :zero_or_one # => true
171
- ruby_19.implements? :quantifier, :zero_or_one_reluctant # => true
172
- ruby_19.implements? :quantifier, :zero_or_one_possessive # => true
173
- ruby_19.implements? :conditional, :condition # => false
174
-
175
- ruby_18 = Regexp::Syntax.new 'ruby/1.8'
176
- ruby_18.implements? :quantifier, :zero_or_one # => true
177
- ruby_18.implements? :quantifier, :zero_or_one_reluctant # => true
178
- ruby_18.implements? :quantifier, :zero_or_one_possessive # => false
179
- ruby_18.implements? :conditional, :condition # => false
180
- ```
181
-
182
-
183
- #### Notes
184
- * Variations on a token, for example a named group with angle brackets (< and >)
185
- vs one with a pair of single quotes, are specified with an underscore followed
186
- by two characters appended to the base token. In the previous named group example,
187
- the tokens would be :named_ab (angle brackets) and :named_sq (single quotes).
188
- These variations are normalized by the syntax to :named.
189
-
190
-
191
- ---
192
- ### Lexer
193
- Sits on top of the scanner and performs lexical analysis on the tokens that
194
- it emits. Among its tasks are: breaking quantified literal runs, collecting the
195
- emitted token attributes into Token objects, calculating their nesting depth,
196
- normalizing tokens for the parser, and checking if the tokens are implemented by
197
- the given syntax version.
198
-
199
- See the [Token Objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
200
- wiki page for more information on Token objects.
201
-
202
-
203
- #### Example
204
- The following example lexes the given pattern, checks it against the Ruby 1.9
205
- syntax, and prints the token objects' text indented to their level.
206
-
207
- ```ruby
208
- require 'regexp_parser'
209
-
210
- Regexp::Lexer.lex /a?(b(c))*[d]+/, 'ruby/1.9' do |token|
211
-   puts "#{'  ' * token.level}#{token.text}"
212
- end
213
-
214
- # output
215
- # a
216
- # ?
217
- # (
218
- #   b
219
- #   (
220
- #     c
221
- #   )
222
- # )
223
- # *
224
- # [
225
- #   d
226
- # ]
227
- # +
228
- ```
229
-
230
- A one-liner that returns an array of the textual parts of the given pattern.
231
- Compare the output with that of the one-liner example of the **Scanner**; notably
232
- how the sequence 'cat' is treated. The 't' is separated because it's followed
233
- by a quantifier that only applies to it.
234
-
235
- ```ruby
236
- Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text}
237
- #=> ["(", "ca", "t", "?", "(", "[", "b", "]", "at", ")", ")", "{3,5}"]
238
- ```
239
-
240
- #### Notes
241
- * The syntax argument is optional. It defaults to the version of the Ruby
242
- interpreter in use, as returned by RUBY_VERSION.
243
-
244
- * The lexer normalizes some tokens, as noted in the Syntax section above.
245
-
246
-
247
- ---
248
- ### Parser
249
- Sits on top of the lexer and transforms the "stream" of Token objects emitted
250
- by it into a tree of Expression objects represented by an instance of the
251
- Expression::Root class.
252
-
253
- See the [Expression Objects](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
254
- wiki page for attributes and methods.
255
-
256
-
257
- #### Example
258
-
259
- ```ruby
260
- require 'regexp_parser'
261
-
262
- regex = /a?(b+(c)d)*(?<name>[0-9]+)/
263
-
264
- tree = Regexp::Parser.parse( regex, 'ruby/2.1' )
265
-
266
- tree.traverse do |event, exp|
267
- puts "#{event}: #{exp.type} `#{exp.to_s}`"
268
- end
269
-
270
- # Output
271
- # visit: literal `a?`
272
- # enter: group `(b+(c)d)*`
273
- # visit: literal `b+`
274
- # enter: group `(c)`
275
- # visit: literal `c`
276
- # exit: group `(c)`
277
- # visit: literal `d`
278
- # exit: group `(b+(c)d)*`
279
- # enter: group `(?<name>[0-9]+)`
280
- # visit: set `[0-9]+`
281
- # exit: group `(?<name>[0-9]+)`
282
- ```
283
-
284
- Another example, using each_expression and strfregexp to print the object tree.
285
- _See the traverse.rb and strfregexp.rb files under `lib/regexp_parser/expression/methods`
286
- for more information on these methods._
287
-
288
- ```ruby
289
- include_root = true
290
- indent_offset = include_root ? 1 : 0
291
-
292
- tree.each_expression(include_root) do |exp, level_index|
293
- puts exp.strfregexp("%>> %c", indent_offset)
294
- end
295
-
296
- # Output
297
- # > Regexp::Expression::Root
298
- #   > Regexp::Expression::Literal
299
- #   > Regexp::Expression::Group::Capture
300
- #     > Regexp::Expression::Literal
301
- #     > Regexp::Expression::Group::Capture
302
- #       > Regexp::Expression::Literal
303
- #     > Regexp::Expression::Literal
304
- #   > Regexp::Expression::Group::Named
305
- #     > Regexp::Expression::CharacterSet
306
- ```
307
-
308
- _Note: quantifiers do not appear in the output because they are members of the
309
- Expression class. See the next section for details._
310
-
311
-
312
- ---
313
-
314
-
315
- ## Supported Syntax
316
- The three modules support all the regular expression syntax features of Ruby 1.8,
317
- 1.9, 2.x and 3.x:
318
-
319
- _Note that not all of these are available in all versions of Ruby_
320
-
321
-
322
- | Syntax Feature | Examples | &#x22ef; |
323
- | ------------------------------------- | ------------------------------------------------------- |:--------:|
324
- | **Alternation** | `a\|b\|c` | &#x2713; |
325
- | **Anchors** | `\A`, `^`, `\b` | &#x2713; |
326
- | **Character Classes** | `[abc]`, `[^\\]`, `[a-d&&aeiou]`, `[a=e=b]` | &#x2713; |
327
- | **Character Types** | `\d`, `\H`, `\s` | &#x2713; |
328
- | **Cluster Types** | `\R`, `\X` | &#x2713; |
329
- | **Conditional Exps.** | `(?(cond)yes-subexp)`, `(?(cond)yes-subexp\|no-subexp)` | &#x2713; |
330
- | **Escape Sequences** | `\t`, `\\+`, `\?` | &#x2713; |
331
- | **Free Space** | whitespace and `# Comments` _(x modifier)_ | &#x2713; |
332
- | **Grouped Exps.** | | &#x22f1; |
333
- | &emsp;&nbsp;_**Assertions**_ | | &#x22f1; |
334
- | &emsp;&emsp;_Lookahead_ | `(?=abc)` | &#x2713; |
335
- | &emsp;&emsp;_Negative Lookahead_ | `(?!abc)` | &#x2713; |
336
- | &emsp;&emsp;_Lookbehind_ | `(?<=abc)` | &#x2713; |
337
- | &emsp;&emsp;_Negative Lookbehind_ | `(?<!abc)` | &#x2713; |
338
- | &emsp;&nbsp;_**Atomic**_ | `(?>abc)` | &#x2713; |
339
- | &emsp;&nbsp;_**Absence**_ | `(?~abc)` | &#x2713; |
340
- | &emsp;&nbsp;_**Back-references**_ | | &#x22f1; |
341
- | &emsp;&emsp;_Named_ | `\k<name>` | &#x2713; |
342
- | &emsp;&emsp;_Nest Level_ | `\k<n-1>` | &#x2713; |
343
- | &emsp;&emsp;_Numbered_ | `\k<1>` | &#x2713; |
344
- | &emsp;&emsp;_Relative_ | `\k<-2>` | &#x2713; |
345
- | &emsp;&emsp;_Traditional_ | `\1` thru `\9` | &#x2713; |
346
- | &emsp;&nbsp;_**Capturing**_ | `(abc)` | &#x2713; |
347
- | &emsp;&nbsp;_**Comments**_ | `(?# comment text)` | &#x2713; |
348
- | &emsp;&nbsp;_**Named**_ | `(?<name>abc)`, `(?'name'abc)` | &#x2713; |
349
- | &emsp;&nbsp;_**Options**_ | `(?mi-x:abc)`, `(?a:\s\w+)`, `(?i)` | &#x2713; |
350
- | &emsp;&nbsp;_**Passive**_ | `(?:abc)` | &#x2713; |
351
- | &emsp;&nbsp;_**Subexp. Calls**_ | `\g<name>`, `\g<1>` | &#x2713; |
352
- | **Keep** | `\K`, `(ab\Kc\|d\Ke)f` | &#x2713; |
353
- | **Literals** _(utf-8)_ | `Ruby`, `ルビー`, `روبي` | &#x2713; |
354
- | **POSIX Classes** | `[:alpha:]`, `[:^digit:]` | &#x2713; |
355
- | **Quantifiers** | | &#x22f1; |
356
- | &emsp;&nbsp;_**Greedy**_ | `?`, `*`, `+`, `{m,M}` | &#x2713; |
357
- | &emsp;&nbsp;_**Reluctant** (Lazy)_ | `??`, `*?`, `+?`, `{m,M}?` | &#x2713; |
358
- | &emsp;&nbsp;_**Possessive**_ | `?+`, `*+`, `++`, `{m,M}+` | &#x2713; |
359
- | **String Escapes** | | &#x22f1; |
360
- | &emsp;&nbsp;_**Control**_ | `\C-C`, `\cD` | &#x2713; |
361
- | &emsp;&nbsp;_**Hex**_ | `\x20`, `\x{701230}` | &#x2713; |
362
- | &emsp;&nbsp;_**Meta**_ | `\M-c`, `\M-\C-C`, `\M-\cC`, `\C-\M-C`, `\c\M-C` | &#x2713; |
363
- | &emsp;&nbsp;_**Octal**_ | `\0`, `\01`, `\012` | &#x2713; |
364
- | &emsp;&nbsp;_**Unicode**_ | `\uHHHH`, `\u{H+ H+}` | &#x2713; |
365
- | **Unicode Properties** | _<sub>([Unicode 11.0.0](http://www.unicode.org/versions/Unicode11.0.0/))</sub>_ | &#x22f1; |
366
- | &emsp;&nbsp;_**Age**_ | `\p{Age=5.2}`, `\P{age=7.0}`, `\p{^age=8.0}` | &#x2713; |
367
- | &emsp;&nbsp;_**Blocks**_ | `\p{InArmenian}`, `\P{InKhmer}`, `\p{^InThai}` | &#x2713; |
368
- | &emsp;&nbsp;_**Classes**_ | `\p{Alpha}`, `\P{Space}`, `\p{^Alnum}` | &#x2713; |
369
- | &emsp;&nbsp;_**Derived**_ | `\p{Math}`, `\P{Lowercase}`, `\p{^Cased}` | &#x2713; |
370
- | &emsp;&nbsp;_**General Categories**_ | `\p{Lu}`, `\P{Cs}`, `\p{^sc}` | &#x2713; |
371
- | &emsp;&nbsp;_**Scripts**_ | `\p{Arabic}`, `\P{Hiragana}`, `\p{^Greek}` | &#x2713; |
372
- | &emsp;&nbsp;_**Simple**_ | `\p{Dash}`, `\p{Extender}`, `\p{^Hyphen}` | &#x2713; |
373
-
374
- ##### Inapplicable Features
375
-
376
- Some modifiers, like `o` and `s`, apply to the **Regexp** object itself and do not
377
- appear in its source. Other such modifiers include the encoding modifiers `e` and `n`
378
- [See](http://www.ruby-doc.org/core-2.5.0/Regexp.html#class-Regexp-label-Encoding).
379
- These are not seen by the scanner.
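Why these modifiers are invisible to the scanner can be checked with core Ruby alone. In this sketch, which does not call the gem, the `n` and `o` modifiers leave both string forms of the pattern unchanged. `visible_forms` is a hypothetical helper name.

```ruby
# Core Ruby only: the `n` and `o` modifiers leave no trace in the pattern text.
# `visible_forms` is a hypothetical helper used for illustration.
def visible_forms(regexp)
  [regexp.source, regexp.to_s]
end

p visible_forms(/a/)
p visible_forms(/a/n)
p visible_forms(/a/o)

# The `n` modifier is still recorded on the Regexp object itself.
puts((/a/n.options & Regexp::NOENCODING) != 0)
```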
380
-
381
- The following features are not currently enabled for Ruby by its regular
382
- expressions library (Onigmo). They are not supported by the scanner.
383
-
384
- - **Quotes**: `\Q...\E` _[[See]](https://github.com/k-takata/Onigmo/blob/7911409/doc/RE#L499)_
385
- - **Capture History**: `(?@...)`, `(?@<name>...)` _[[See]](https://github.com/k-takata/Onigmo/blob/7911409/doc/RE#L550)_
386
-
387
-
388
- See something missing? Please submit an [issue](https://github.com/ammar/regexp_parser/issues)
389
-
390
- _**Note**: Attempting to process expressions with unsupported syntax features can raise an error,
391
- or incorrectly return tokens/objects as literals._
392
-
393
-
394
- ## Testing
395
- To run the tests simply run rake from the root directory, as 'test' is the default task.
396
-
397
- It generates the scanner's code from the Ragel source files and runs all the tests, thus it requires Ragel to be installed.
398
-
399
- The tests use RSpec. They can also be run with the test runner that whitelists some warnings:
400
-
401
- ```
402
- bin/test
403
- ```
404
-
405
- You can run a specific test like so:
406
-
407
- ```
408
- bin/test spec/scanner/properties_spec.rb
409
- ```
410
-
411
- Note that changes to Ragel files will not be reflected when running `rspec` or `bin/test`, so you might want to run:
412
-
413
- ```
414
- rake ragel:rb && bin/test spec/scanner/properties_spec.rb
415
- ```
416
-
417
- ## Building
418
- Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/) to be
419
- installed. The build tasks will automatically invoke the 'ragel:rb' task to generate the
420
- Ruby scanner code.
421
-
422
-
423
- The project uses the standard rubygems package tasks, so:
424
-
425
-
426
- To build the gem, run:
427
- ```
428
- rake build
429
- ```
430
-
431
- To install the gem from the cloned project, run:
432
- ```
433
- rake install
434
- ```
435
-
436
-
437
- ## Example Projects
438
- Projects using regexp_parser.
439
-
440
- - [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool that uses regexp_parser to convert Regexps to css/xpath selectors.
441
-
442
- - [js_regex](https://github.com/janosch-x/js_regex) converts Ruby regular expressions to JavaScript-compatible regular expressions.
443
-
444
- - [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor with alias support.
445
-
446
- - [mutant](https://github.com/mbj/mutant) (before v0.9.0) manipulates your regular expressions (amongst others) to see if your tests cover their behavior.
447
-
448
- - [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that uses regexp_parser to lint Regexps.
449
-
450
- - [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper that uses regexp_parser to generate examples of postal codes.
451
-
452
-
453
- ## References
454
- Documentation and books used while working on this project.
455
-
456
-
457
- #### Ruby Flavors
458
- * Oniguruma Regular Expressions (Ruby 1.9.x) [link](https://github.com/kkos/oniguruma/blob/master/doc/RE)
459
- * Onigmo Regular Expressions (Ruby >= 2.0) [link](https://github.com/k-takata/Onigmo/blob/master/doc/RE)
460
-
461
-
462
- #### Regular Expressions
463
- * Mastering Regular Expressions, By Jeffrey E.F. Friedl (2nd Edition) [book](http://oreilly.com/catalog/9781565922570/)
464
- * Regular Expression Flavor Comparison [link](http://www.regular-expressions.info/refflavors.html)
465
- * Enumerating the strings of regular languages [link](http://www.cs.dartmouth.edu/~doug/nfa.ps.gz)
466
- * Stack Overflow Regular Expressions FAQ [link](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075)
467
-
468
-
469
- #### Unicode
470
- * Unicode Explained, By Jukka K. Korpela. [book](http://oreilly.com/catalog/9780596101213)
471
- * Unicode Derived Properties [link](http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt)
472
- * Unicode Property Aliases [link](http://www.unicode.org/Public/UNIDATA/PropertyAliases.txt)
473
- * Unicode Regular Expressions [link](http://www.unicode.org/reports/tr18/)
474
- * Unicode Standard Annex #44 [link](http://www.unicode.org/reports/tr44/)
475
-
476
-
477
- ---
478
- ##### Copyright
479
- _Copyright (c) 2010-2020 Ammar Ali. See LICENSE file for details._