regexp_parser 2.7.0 → 2.9.2

Files changed (52)
  1. checksums.yaml +4 -4
  2. data/Gemfile +5 -5
  3. data/LICENSE +1 -1
  4. data/lib/regexp_parser/expression/base.rb +0 -7
  5. data/lib/regexp_parser/expression/classes/alternation.rb +1 -1
  6. data/lib/regexp_parser/expression/classes/backreference.rb +4 -6
  7. data/lib/regexp_parser/expression/classes/character_set/range.rb +2 -7
  8. data/lib/regexp_parser/expression/classes/character_set.rb +4 -8
  9. data/lib/regexp_parser/expression/classes/conditional.rb +2 -14
  10. data/lib/regexp_parser/expression/classes/escape_sequence.rb +3 -1
  11. data/lib/regexp_parser/expression/classes/free_space.rb +3 -1
  12. data/lib/regexp_parser/expression/classes/group.rb +0 -22
  13. data/lib/regexp_parser/expression/classes/keep.rb +1 -1
  14. data/lib/regexp_parser/expression/classes/posix_class.rb +5 -5
  15. data/lib/regexp_parser/expression/classes/unicode_property.rb +11 -11
  16. data/lib/regexp_parser/expression/methods/construct.rb +2 -4
  17. data/lib/regexp_parser/expression/methods/negative.rb +20 -0
  18. data/lib/regexp_parser/expression/methods/parts.rb +23 -0
  19. data/lib/regexp_parser/expression/methods/printing.rb +26 -0
  20. data/lib/regexp_parser/expression/methods/tests.rb +40 -3
  21. data/lib/regexp_parser/expression/methods/traverse.rb +33 -20
  22. data/lib/regexp_parser/expression/quantifier.rb +30 -17
  23. data/lib/regexp_parser/expression/sequence.rb +5 -9
  24. data/lib/regexp_parser/expression/sequence_operation.rb +4 -9
  25. data/lib/regexp_parser/expression/shared.rb +37 -24
  26. data/lib/regexp_parser/expression/subexpression.rb +20 -18
  27. data/lib/regexp_parser/expression.rb +34 -31
  28. data/lib/regexp_parser/lexer.rb +15 -7
  29. data/lib/regexp_parser/parser.rb +91 -91
  30. data/lib/regexp_parser/scanner/errors/premature_end_error.rb +8 -0
  31. data/lib/regexp_parser/scanner/errors/scanner_error.rb +6 -0
  32. data/lib/regexp_parser/scanner/errors/validation_error.rb +63 -0
  33. data/lib/regexp_parser/scanner/properties/long.csv +29 -0
  34. data/lib/regexp_parser/scanner/properties/short.csv +3 -0
  35. data/lib/regexp_parser/scanner/property.rl +1 -1
  36. data/lib/regexp_parser/scanner/scanner.rl +44 -130
  37. data/lib/regexp_parser/scanner.rb +1096 -1297
  38. data/lib/regexp_parser/syntax/token/backreference.rb +3 -0
  39. data/lib/regexp_parser/syntax/token/character_set.rb +3 -0
  40. data/lib/regexp_parser/syntax/token/escape.rb +3 -1
  41. data/lib/regexp_parser/syntax/token/meta.rb +9 -2
  42. data/lib/regexp_parser/syntax/token/unicode_property.rb +35 -1
  43. data/lib/regexp_parser/syntax/token/virtual.rb +11 -0
  44. data/lib/regexp_parser/syntax/token.rb +13 -13
  45. data/lib/regexp_parser/syntax/versions.rb +1 -1
  46. data/lib/regexp_parser/syntax.rb +1 -1
  47. data/lib/regexp_parser/version.rb +1 -1
  48. data/lib/regexp_parser.rb +6 -6
  49. data/regexp_parser.gemspec +5 -5
  50. metadata +14 -8
  51. data/CHANGELOG.md +0 -632
  52. data/README.md +0 -503
data/README.md DELETED
@@ -1,503 +0,0 @@
- # Regexp::Parser
-
- [![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser)
- [![Build Status](https://github.com/ammar/regexp_parser/workflows/tests/badge.svg)](https://github.com/ammar/regexp_parser/actions)
- [![Build Status](https://github.com/ammar/regexp_parser/workflows/gouteur/badge.svg)](https://github.com/ammar/regexp_parser/actions)
- [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
-
- A Ruby gem for tokenizing, parsing, and transforming regular expressions.
-
- * Multilayered
-   * A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
-   * A lexer that produces a "stream" of [Token objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
-   * A parser that produces a "tree" of [Expression objects (OO API)](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
- * Runs on Ruby 2.x, 3.x and JRuby runtimes
- * Recognizes Ruby 1.8, 1.9, 2.x and 3.x regular expressions [See Supported Syntax](#supported-syntax)
-
-
- _For examples of regexp_parser in use, see [Example Projects](#example-projects)._
-
-
- ---
- ## Requirements
-
- * Ruby >= 2.0
- * Ragel >= 6.0, but only if you want to build the gem or work on the scanner.
-
-
- ---
- ## Install
-
- Install the gem with:
-
- `gem install regexp_parser`
-
- Or, add it to your project's `Gemfile`:
-
- ```gem 'regexp_parser', '~> X.Y.Z'```
-
- See the badge at the top of this README or [rubygems](https://rubygems.org/gems/regexp_parser)
- for the latest version number.
-
-
- ---
- ## Usage
-
- The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
- provides a single method that takes a regular expression (as a Regexp object or
- a string) and returns its results. The **Lexer** and the **Parser** accept an
- optional second argument that specifies the syntax version, like 'ruby/2.0',
- which defaults to the host Ruby version (using RUBY_VERSION).
-
- Here are the basic usage examples:
-
- ```ruby
- require 'regexp_parser'
-
- Regexp::Scanner.scan(regexp)
-
- Regexp::Lexer.lex(regexp)
-
- Regexp::Parser.parse(regexp)
- ```
-
- All three methods accept a block as the last argument, which, if given, gets
- called with the results as follows:
-
- * **Scanner**: the block gets passed the results as they are scanned. See the
-   example in the next section for details.
-
- * **Lexer**: after completion, the block gets passed the tokens one by one.
-   _The result of the block is returned._
-
- * **Parser**: after completion, the block gets passed the root expression.
-   _The result of the block is returned._
-
- All three methods accept either a `Regexp` or a `String` (containing the pattern).
- If a String is passed, `options` can be supplied as well:
-
- ```ruby
- require 'regexp_parser'
-
- Regexp::Parser.parse(
-   "a+ # Recognizes a and A...",
-   options: ::Regexp::EXTENDED | ::Regexp::IGNORECASE
- )
- ```
-
- ---
- ## Components
-
- ### Scanner
- A Ragel-generated scanner that recognizes the cumulative syntax of all
- supported syntax versions. It breaks a given expression's text into the
- smallest parts, and identifies their type, token, text, and start/end
- offsets within the pattern.
-
-
- #### Example
- The following scans the given pattern and prints out the type, token, text and
- start/end offsets for each token found.
-
- ```ruby
- require 'regexp_parser'
-
- Regexp::Scanner.scan(/(ab?(cd)*[e-h]+)/) do |type, token, text, ts, te|
-   puts "type: #{type}, token: #{token}, text: '#{text}' [#{ts}..#{te}]"
- end
-
- # output
- # type: group, token: capture, text: '(' [0..1]
- # type: literal, token: literal, text: 'ab' [1..3]
- # type: quantifier, token: zero_or_one, text: '?' [3..4]
- # type: group, token: capture, text: '(' [4..5]
- # type: literal, token: literal, text: 'cd' [5..7]
- # type: group, token: close, text: ')' [7..8]
- # type: quantifier, token: zero_or_more, text: '*' [8..9]
- # type: set, token: open, text: '[' [9..10]
- # type: set, token: range, text: 'e-h' [10..13]
- # type: set, token: close, text: ']' [13..14]
- # type: quantifier, token: one_or_more, text: '+' [14..15]
- # type: group, token: close, text: ')' [15..16]
- ```
-
- A one-liner that uses map on the result of the scan to return the textual
- parts of the pattern:
-
- ```ruby
- Regexp::Scanner.scan(/(cat?([bhm]at)){3,5}/).map { |token| token[2] }
- #=> ["(", "cat", "?", "(", "[", "b", "h", "m", "]", "at", ")", ")", "{3,5}"]
- ```
-
-
- #### Notes
- * The scanner performs basic syntax error checking, like detecting missing
-   balancing punctuation and premature end of pattern. Flavor validity checks
-   are performed in the lexer, which uses a syntax object.
-
- * If the input is a Ruby **Regexp** object, the scanner calls #source on it to
-   get its string representation. #source does not include the options of
-   the expression (m, i, and x). To include the options in the scan, #to_s
-   should be called on the **Regexp** before passing it to the scanner or the
-   lexer. For the parser, however, this is not necessary. It automatically
-   exposes the options of a passed **Regexp** in the returned root expression.
-
- * To keep the scanner simple(r) and fairly reusable for other purposes, it
-   does not perform lexical analysis on the tokens, sticking to the task
-   of identifying the smallest possible tokens and leaving lexical analysis
-   to the lexer.
-
- * The MRI implementation may accept expressions that either conflict with
-   the documentation or are undocumented, like `{}` and `]` _(unescaped)_.
-   The scanner will try to support as many of these cases as possible.
-
- ---
- ### Syntax
- Defines the supported tokens for a specific engine implementation (aka a
- flavor). Syntax classes act as lookup tables, and are layered to create
- flavor variations. Syntax only comes into play in the lexer.
-
- #### Example
- The following fetches syntax objects for Ruby 2.0, 1.9, 1.8, and
- checks a few of their implementation features.
-
- ```ruby
- require 'regexp_parser'
-
- ruby_20 = Regexp::Syntax.for 'ruby/2.0'
- ruby_20.implements? :quantifier, :zero_or_one # => true
- ruby_20.implements? :quantifier, :zero_or_one_reluctant # => true
- ruby_20.implements? :quantifier, :zero_or_one_possessive # => true
- ruby_20.implements? :conditional, :condition # => true
-
- ruby_19 = Regexp::Syntax.for 'ruby/1.9'
- ruby_19.implements? :quantifier, :zero_or_one # => true
- ruby_19.implements? :quantifier, :zero_or_one_reluctant # => true
- ruby_19.implements? :quantifier, :zero_or_one_possessive # => true
- ruby_19.implements? :conditional, :condition # => false
-
- ruby_18 = Regexp::Syntax.for 'ruby/1.8'
- ruby_18.implements? :quantifier, :zero_or_one # => true
- ruby_18.implements? :quantifier, :zero_or_one_reluctant # => true
- ruby_18.implements? :quantifier, :zero_or_one_possessive # => false
- ruby_18.implements? :conditional, :condition # => false
- ```
-
- Syntax objects can also be queried about their complete and relative feature sets.
-
- ```ruby
- require 'regexp_parser'
-
- ruby_20 = Regexp::Syntax.for 'ruby/2.0' # => Regexp::Syntax::V2_0_0
- ruby_20.added_features # => { conditional: [...], ... }
- ruby_20.removed_features # => { property: [:newline], ... }
- ruby_20.features # => { anchor: [...], ... }
- ```
-
- #### Notes
- * Variations on a token, for example a named group with angle brackets (< and >)
-   vs one with a pair of single quotes, are specified with an underscore followed
-   by two characters appended to the base token. In the previous named group example,
-   the tokens would be :named_ab (angle brackets) and :named_sq (single quotes).
-   These variations are normalized by the syntax to :named.
-
-
- ---
- ### Lexer
- Sits on top of the scanner and performs lexical analysis on the tokens that
- it emits. Among its tasks are: breaking quantified literal runs, collecting the
- emitted token attributes into Token objects, calculating their nesting depth,
- normalizing tokens for the parser, and checking if the tokens are implemented by
- the given syntax version.
-
- See the [Token Objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
- wiki page for more information on Token objects.
-
-
- #### Example
- The following example lexes the given pattern, checks it against the Ruby 1.9
- syntax, and prints the token objects' text indented to their level.
-
- ```ruby
- require 'regexp_parser'
-
- Regexp::Lexer.lex(/a?(b(c))*[d]+/, 'ruby/1.9') do |token|
-   puts "#{' ' * token.level}#{token.text}"
- end
-
- # output
- # a
- # ?
- # (
- # b
- # (
- # c
- # )
- # )
- # *
- # [
- # d
- # ]
- # +
- ```
-
- A one-liner that returns an array of the textual parts of the given pattern.
- Compare the output with that of the one-liner example of the **Scanner**; notably
- how the sequence 'cat' is treated. The 't' is separated because it's followed
- by a quantifier that only applies to it.
-
- ```ruby
- Regexp::Lexer.scan(/(cat?([b]at)){3,5}/).map { |token| token.text }
- #=> ["(", "ca", "t", "?", "(", "[", "b", "]", "at", ")", ")", "{3,5}"]
- ```
-
- #### Notes
- * The syntax argument is optional. It defaults to the version of the Ruby
-   interpreter in use, as returned by RUBY_VERSION.
-
- * The lexer normalizes some tokens, as noted in the Syntax section above.
-
-
- ---
- ### Parser
- Sits on top of the lexer and transforms the "stream" of Token objects emitted
- by it into a tree of Expression objects represented by an instance of the
- Expression::Root class.
-
- See the [Expression Objects](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
- wiki page for attributes and methods.
-
-
- #### Example
-
- ```ruby
- require 'regexp_parser'
-
- regex = /a?(b+(c)d)*(?<name>[0-9]+)/
-
- tree = Regexp::Parser.parse(regex, 'ruby/2.1')
-
- tree.traverse do |event, exp|
-   puts "#{event}: #{exp.type} `#{exp.to_s}`"
- end
-
- # Output
- # visit: literal `a?`
- # enter: group `(b+(c)d)*`
- # visit: literal `b+`
- # enter: group `(c)`
- # visit: literal `c`
- # exit: group `(c)`
- # visit: literal `d`
- # exit: group `(b+(c)d)*`
- # enter: group `(?<name>[0-9]+)`
- # visit: set `[0-9]+`
- # exit: group `(?<name>[0-9]+)`
- ```
-
- Another example, using each_expression and strfregexp to print the object tree.
- _See the traverse.rb and strfregexp.rb files under `lib/regexp_parser/expression/methods`
- for more information on these methods._
-
- ```ruby
- include_root = true
- indent_offset = include_root ? 1 : 0
-
- tree.each_expression(include_root) do |exp, level_index|
-   puts exp.strfregexp("%>> %c", indent_offset)
- end
-
- # Output
- # > Regexp::Expression::Root
- # > Regexp::Expression::Literal
- # > Regexp::Expression::Group::Capture
- # > Regexp::Expression::Literal
- # > Regexp::Expression::Group::Capture
- # > Regexp::Expression::Literal
- # > Regexp::Expression::Literal
- # > Regexp::Expression::Group::Named
- # > Regexp::Expression::CharacterSet
- ```
-
- _Note: quantifiers do not appear in the output because they are stored as attributes of
- the quantified expressions (accessible via their `#quantifier` method), not as separate nodes._
-
-
- ---
-
-
- ## Supported Syntax
- The three modules support all the regular expression syntax features of Ruby 1.8,
- 1.9, 2.x and 3.x:
-
- _Note that not all of these are available in all versions of Ruby._
-
-
- | Syntax Feature | Examples | &#x22ef; |
- | ------------------------------------- | ------------------------------------------------------- |:--------:|
- | **Alternation** | `a\|b\|c` | &#x2713; |
- | **Anchors** | `\A`, `^`, `\b` | &#x2713; |
- | **Character Classes** | `[abc]`, `[^\\]`, `[a-d&&aeiou]`, `[a=e=b]` | &#x2713; |
- | **Character Types** | `\d`, `\H`, `\s` | &#x2713; |
- | **Cluster Types** | `\R`, `\X` | &#x2713; |
- | **Conditional Exps.** | `(?(cond)yes-subexp)`, `(?(cond)yes-subexp\|no-subexp)` | &#x2713; |
- | **Escape Sequences** | `\t`, `\\+`, `\?` | &#x2713; |
- | **Free Space** | whitespace and `# Comments` _(x modifier)_ | &#x2713; |
- | **Grouped Exps.** | | &#x22f1; |
- | &emsp;&nbsp;_**Assertions**_ | | &#x22f1; |
- | &emsp;&emsp;_Lookahead_ | `(?=abc)` | &#x2713; |
- | &emsp;&emsp;_Negative Lookahead_ | `(?!abc)` | &#x2713; |
- | &emsp;&emsp;_Lookbehind_ | `(?<=abc)` | &#x2713; |
- | &emsp;&emsp;_Negative Lookbehind_ | `(?<!abc)` | &#x2713; |
- | &emsp;&nbsp;_**Atomic**_ | `(?>abc)` | &#x2713; |
- | &emsp;&nbsp;_**Absence**_ | `(?~abc)` | &#x2713; |
- | &emsp;&nbsp;_**Back-references**_ | | &#x22f1; |
- | &emsp;&emsp;_Named_ | `\k<name>` | &#x2713; |
- | &emsp;&emsp;_Nest Level_ | `\k<n-1>` | &#x2713; |
- | &emsp;&emsp;_Numbered_ | `\k<1>` | &#x2713; |
- | &emsp;&emsp;_Relative_ | `\k<-2>` | &#x2713; |
- | &emsp;&emsp;_Traditional_ | `\1` through `\9` | &#x2713; |
- | &emsp;&nbsp;_**Capturing**_ | `(abc)` | &#x2713; |
- | &emsp;&nbsp;_**Comments**_ | `(?# comment text)` | &#x2713; |
- | &emsp;&nbsp;_**Named**_ | `(?<name>abc)`, `(?'name'abc)` | &#x2713; |
- | &emsp;&nbsp;_**Options**_ | `(?mi-x:abc)`, `(?a:\s\w+)`, `(?i)` | &#x2713; |
- | &emsp;&nbsp;_**Passive**_ | `(?:abc)` | &#x2713; |
- | &emsp;&nbsp;_**Subexp. Calls**_ | `\g<name>`, `\g<1>` | &#x2713; |
- | **Keep** | `\K`, `(ab\Kc\|d\Ke)f` | &#x2713; |
- | **Literals** _(utf-8)_ | `Ruby`, `ルビー`, `روبي` | &#x2713; |
- | **POSIX Classes** | `[:alpha:]`, `[:^digit:]` | &#x2713; |
- | **Quantifiers** | | &#x22f1; |
- | &emsp;&nbsp;_**Greedy**_ | `?`, `*`, `+`, `{m,M}` | &#x2713; |
- | &emsp;&nbsp;_**Reluctant** (Lazy)_ | `??`, `*?`, `+?` \[1\] | &#x2713; |
- | &emsp;&nbsp;_**Possessive**_ | `?+`, `*+`, `++` \[1\] | &#x2713; |
- | **String Escapes** | | &#x22f1; |
- | &emsp;&nbsp;_**Control** \[2\]_ | `\C-C`, `\cD` | &#x2713; |
- | &emsp;&nbsp;_**Hex**_ | `\x20`, `\x{701230}` | &#x2713; |
- | &emsp;&nbsp;_**Meta** \[2\]_ | `\M-c`, `\M-\C-C`, `\M-\cC`, `\C-\M-C`, `\c\M-C` | &#x2713; |
- | &emsp;&nbsp;_**Octal**_ | `\0`, `\01`, `\012` | &#x2713; |
- | &emsp;&nbsp;_**Unicode**_ | `\uHHHH`, `\u{H+ H+}` | &#x2713; |
- | **Unicode Properties** | _<sub>([Unicode 13.0.0])</sub>_ | &#x22f1; |
- | &emsp;&nbsp;_**Age**_ | `\p{Age=5.2}`, `\P{age=7.0}`, `\p{^age=8.0}` | &#x2713; |
- | &emsp;&nbsp;_**Blocks**_ | `\p{InArmenian}`, `\P{InKhmer}`, `\p{^InThai}` | &#x2713; |
- | &emsp;&nbsp;_**Classes**_ | `\p{Alpha}`, `\P{Space}`, `\p{^Alnum}` | &#x2713; |
- | &emsp;&nbsp;_**Derived**_ | `\p{Math}`, `\P{Lowercase}`, `\p{^Cased}` | &#x2713; |
- | &emsp;&nbsp;_**General Categories**_ | `\p{Lu}`, `\P{Cs}`, `\p{^sc}` | &#x2713; |
- | &emsp;&nbsp;_**Scripts**_ | `\p{Arabic}`, `\P{Hiragana}`, `\p{^Greek}` | &#x2713; |
- | &emsp;&nbsp;_**Simple**_ | `\p{Dash}`, `\p{Extender}`, `\p{^Hyphen}` | &#x2713; |
-
- [Unicode 13.0.0]: https://www.unicode.org/versions/Unicode13.0.0/
-
- **\[1\]**: Ruby does not support lazy or possessive interval quantifiers.
- Any `+` or `?` that follows an interval quantifier will be treated as another,
- chained quantifier. See also [#3](https://github.com/ammar/regexp_parser/issues/3),
- [#69](https://github.com/ammar/regexp_parser/pull/69).
-
- **\[2\]**: As of Ruby 3.1, meta and control sequences are [pre-processed to hex
- escapes when used in Regexp literals](https://github.com/ruby/ruby/commit/11ae581),
- so they will only reach the scanner and will only be emitted if a String or a Regexp
- that has been built with the `::new` constructor is scanned.
-
- ##### Inapplicable Features
-
- Some modifiers, like `o` and `s`, apply to the **Regexp** object itself and do not
- appear in its source. Other such modifiers include the encoding modifiers `e` and `n`
- [See](http://www.ruby-doc.org/core-2.5.0/Regexp.html#class-Regexp-label-Encoding).
- These are not seen by the scanner.
-
- The following features are not currently enabled for Ruby by its regular
- expressions library (Onigmo). They are not supported by the scanner.
-
- - **Quotes**: `\Q...\E` _[[See]](https://github.com/k-takata/Onigmo/blob/7911409/doc/RE#L499)_
- - **Capture History**: `(?@...)`, `(?@<name>...)` _[[See]](https://github.com/k-takata/Onigmo/blob/7911409/doc/RE#L550)_
-
- See something missing? Please submit an [issue](https://github.com/ammar/regexp_parser/issues).
-
- _**Note**: Attempting to process expressions with unsupported syntax features can raise
- an error, or incorrectly return tokens/objects as literals._
-
-
- ## Testing
- To run the tests, simply run `rake` from the root directory.
-
- The default task generates the scanner's code from the Ragel source files and runs
- all the specs, thus it requires Ragel to be installed.
-
- Note that changes to Ragel files will not be reflected when running `rspec` on its own,
- so to run individual tests you might want to run:
-
- ```
- rake ragel:rb && rspec spec/scanner/properties_spec.rb
- ```
-
- ## Building
- Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/)
- to be installed. The build tasks will automatically invoke the 'ragel:rb' task to generate
- the Ruby scanner code.
-
-
- The project uses the standard rubygems package tasks, so:
-
-
- To build the gem, run:
- ```
- rake build
- ```
-
- To install the gem from the cloned project, run:
- ```
- rake install
- ```
-
-
- ## Example Projects
- Projects using regexp_parser.
-
- - [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool
-   that uses regexp_parser to convert Regexps to css/xpath selectors.
-
- - [js_regex](https://github.com/jaynetics/js_regex) converts Ruby regular expressions
-   to JavaScript-compatible regular expressions.
-
- - [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor
-   with alias support.
-
- - [mutant](https://github.com/mbj/mutant) manipulates your regular expressions
-   (amongst others) to see if your tests cover their behavior.
-
- - [repper](https://github.com/jaynetics/repper) is a regular expression
-   pretty-printer and formatter for Ruby.
-
- - [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that
-   uses regexp_parser to lint Regexps.
-
- - [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper
-   that uses regexp_parser to generate examples of postal codes.
-
-
- ## References
- Documentation and books used while working on this project.
-
-
- #### Ruby Flavors
- * Oniguruma Regular Expressions (Ruby 1.9.x) [link](https://github.com/kkos/oniguruma/blob/master/doc/RE)
- * Onigmo Regular Expressions (Ruby >= 2.0) [link](https://github.com/k-takata/Onigmo/blob/master/doc/RE)
-
-
- #### Regular Expressions
- * Mastering Regular Expressions, By Jeffrey E.F. Friedl (2nd Edition) [book](http://oreilly.com/catalog/9781565922570/)
- * Regular Expression Flavor Comparison [link](http://www.regular-expressions.info/refflavors.html)
- * Enumerating the strings of regular languages [link](http://www.cs.dartmouth.edu/~doug/nfa.ps.gz)
- * Stack Overflow Regular Expressions FAQ [link](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075)
-
-
- #### Unicode
- * Unicode Explained, By Jukka K. Korpela. [book](http://oreilly.com/catalog/9780596101213)
- * Unicode Derived Properties [link](http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt)
- * Unicode Property Aliases [link](http://www.unicode.org/Public/UNIDATA/PropertyAliases.txt)
- * Unicode Regular Expressions [link](http://www.unicode.org/reports/tr18/)
- * Unicode Standard Annex #44 [link](http://www.unicode.org/reports/tr44/)
-
-
- ---
- ##### Copyright
- _Copyright (c) 2010-2022 Ammar Ali. See LICENSE file for details._
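The Usage section of the deleted README passes `options: ::Regexp::EXTENDED | ::Regexp::IGNORECASE` together with a String pattern. It does this because a String carries no flags of its own. The sketch below uses only core Ruby and makes no regexp_parser calls, so it runs without the gem. It compiles that same string with and without those flags to show how much they change what the pattern means. The input `"xAAa"` is an arbitrary example chosen for illustration.

```ruby
# Core Ruby only: this sketch does not require regexp_parser.
pattern = "a+ # Recognizes a and A..."

# EXTENDED ignores unescaped whitespace and treats "# ..." as a comment;
# IGNORECASE lets "a" match "A" as well.
flagged = Regexp.new(pattern, Regexp::EXTENDED | Regexp::IGNORECASE)

# With no flags, the space, "#" and comment text are literal parts of the pattern.
plain = Regexp.new(pattern)

input = "xAAa"
flagged_match = flagged.match(input)&.to_s  # "AAa"
plain_match   = plain.match(input)&.to_s    # nil, no match at all

# The flags live on the Regexp object, not in its source text.
p flagged.source == pattern  # true
p flagged.options            # 3, i.e. IGNORECASE (1) | EXTENDED (2)
p flagged_match
p plain_match
```

Without `EXTENDED`, the plain version treats the space, the `#` and the comment text as literal characters. That is why it finds no match in the same input.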
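The Scanner notes in the deleted README point out that `Regexp#source` drops a pattern's `m`, `i` and `x` options, while `Regexp#to_s` keeps them as an inline option group. This core-Ruby sketch, which makes no regexp_parser calls, shows the difference. It also shows that only the `#to_s` string preserves case-insensitivity when it is compiled back into a `Regexp`.

```ruby
re = /ab+c/ix

source_text = re.source  # "ab+c": options dropped
to_s_text   = re.to_s    # "(?ix-m:ab+c)": options embedded as an inline group

# Rebuild a Regexp from each string and try an input whose case differs.
from_source = Regexp.new(source_text)
from_to_s   = Regexp.new(to_s_text)

puts source_text
puts to_s_text
p from_source.match?("ABBC")  # false: the i flag was lost
p from_to_s.match?("ABBC")    # true: (?ix-m:...) carried it over
```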
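Several rows of the Supported Syntax table in the deleted README cover Onigmo constructs that are uncommon in other flavors: Keep, Conditional Exps., Absence and Subexp. Calls. The sketch below exercises each one with plain core-Ruby matching rather than regexp_parser. It assumes a Ruby recent enough to support the absence operator `(?~...)` (2.4.1 or later). The sample patterns and strings are illustrative choices, not taken from the README.

```ruby
# \K (keep): text matched before \K is dropped from the reported match.
keep = "foobar"[/foo\Kbar/]  # "bar"

# Conditional: require a closing ">" only if the optional "<" (group 1) matched.
tag = /\A(<)?\w+(?(1)>)\z/
conditional = %w[<tag> tag <tag].map { |s| tag.match?(s) }  # [true, true, false]

# Absence operator: (?~\*/) matches the longest run that does not contain "*/".
comment = "/* a */ b */"[%r{/\*(?~\*/)\*/}]  # "/* a */"

# Subexpression call: \g<paren> recursively re-enters the named group,
# so this pattern accepts only balanced parentheses.
balanced = /\A(?<paren>\((?:[^()]|\g<paren>)*\))\z/
calls = ["(a(b)c)", "(a(b"].map { |s| balanced.match?(s) }  # [true, false]

p keep, conditional, comment, calls
```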