regexp_parser 2.8.0 → 2.8.2

data/README.md DELETED
@@ -1,506 +0,0 @@
1
- # Regexp::Parser
2
-
3
- [![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser)
4
- [![Build Status](https://github.com/ammar/regexp_parser/workflows/tests/badge.svg)](https://github.com/ammar/regexp_parser/actions)
5
- [![Build Status](https://github.com/ammar/regexp_parser/workflows/gouteur/badge.svg)](https://github.com/ammar/regexp_parser/actions)
6
- [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
7
-
8
- A Ruby gem for tokenizing, parsing, and transforming regular expressions.
9
-
10
- * Multilayered
11
- * A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
12
- * A lexer that produces a "stream" of [Token objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
13
- * A parser that produces a "tree" of [Expression objects (OO API)](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
14
- * Runs on Ruby 2.x, 3.x and JRuby runtimes
15
- * Recognizes Ruby 1.8, 1.9, 2.x and 3.x regular expressions (see [Supported Syntax](#supported-syntax))
16
-
17
-
18
- _For examples of regexp_parser in use, see [Example Projects](#example-projects)._
19
-
20
-
21
- ---
22
- ## Requirements
23
-
24
- * Ruby >= 2.0
25
- * Ragel >= 6.0, but only if you want to build the gem or work on the scanner.
26
-
27
-
28
- ---
29
- ## Install
30
-
31
- Install the gem with:
32
-
33
- `gem install regexp_parser`
34
-
35
- Or, add it to your project's `Gemfile`:
36
-
37
- ```gem 'regexp_parser', '~> X.Y.Z'```
38
-
39
- See the badge at the top of this README or [rubygems](https://rubygems.org/gems/regexp_parser)
40
- for the latest version number.
41
-
42
-
43
- ---
44
- ## Usage
45
-
46
- The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
47
- provides a single method that takes a regular expression (as a Regexp object or
48
- a string) and returns its results. The **Lexer** and the **Parser** accept an
49
- optional second argument that specifies the syntax version, like 'ruby/2.0',
50
- which defaults to the host Ruby version (using RUBY_VERSION).
51
-
52
- Here are the basic usage examples:
53
-
54
- ```ruby
55
- require 'regexp_parser'
56
-
57
- Regexp::Scanner.scan(regexp)
58
-
59
- Regexp::Lexer.lex(regexp)
60
-
61
- Regexp::Parser.parse(regexp)
62
- ```
63
-
64
- All three methods accept a block as the last argument, which, if given, gets
65
- called with the results as follows:
66
-
67
- * **Scanner**: the block gets passed the results as they are scanned. See the
68
- example in the next section for details.
69
-
70
- * **Lexer**: the block gets passed the tokens one by one as they are scanned.
71
- _The result of the block is returned._
72
-
73
- * **Parser**: after completion, the block gets passed the root expression.
74
- _The result of the block is returned._
75
-
76
- All three methods accept either a `Regexp` or `String` (containing the pattern)
77
- - if a String is passed, `options` can be supplied:
78
-
79
- ```ruby
80
- require 'regexp_parser'
81
-
82
- Regexp::Parser.parse(
83
- "a+ # Recognizes a and A...",
84
- options: ::Regexp::EXTENDED | ::Regexp::IGNORECASE
85
- )
86
- ```
87
-
88
- ---
89
- ## Components
90
-
91
- ### Scanner
92
- A Ragel-generated scanner that recognizes the cumulative syntax of all
93
- supported syntax versions. It breaks a given expression's text into the
94
- smallest parts, and identifies their type, token, text, and start/end
95
- offsets within the pattern.
96
-
97
-
98
- #### Example
99
- The following scans the given pattern and prints out the type, token, text and
100
- start/end offsets for each token found.
101
-
102
- ```ruby
103
- require 'regexp_parser'
104
-
105
- Regexp::Scanner.scan(/(ab?(cd)*[e-h]+)/) do |type, token, text, ts, te|
106
- puts "type: #{type}, token: #{token}, text: '#{text}' [#{ts}..#{te}]"
107
- end
108
-
109
- # output
110
- # type: group, token: capture, text: '(' [0..1]
111
- # type: literal, token: literal, text: 'ab' [1..3]
112
- # type: quantifier, token: zero_or_one, text: '?' [3..4]
113
- # type: group, token: capture, text: '(' [4..5]
114
- # type: literal, token: literal, text: 'cd' [5..7]
115
- # type: group, token: close, text: ')' [7..8]
116
- # type: quantifier, token: zero_or_more, text: '*' [8..9]
117
- # type: set, token: open, text: '[' [9..10]
118
- # type: set, token: range, text: 'e-h' [10..13]
119
- # type: set, token: close, text: ']' [13..14]
120
- # type: quantifier, token: one_or_more, text: '+' [14..15]
121
- # type: group, token: close, text: ')' [15..16]
122
- ```
123
-
124
- A one-liner that uses map on the result of the scan to return the textual
125
- parts of the pattern:
126
-
127
- ```ruby
128
- Regexp::Scanner.scan(/(cat?([bhm]at)){3,5}/).map { |token| token[2] }
129
- # => ["(", "cat", "?", "(", "[", "b", "h", "m", "]", "at", ")", ")", "{3,5}"]
130
- ```
131
-
132
-
133
- #### Notes
134
- * The scanner performs basic syntax error checking, like detecting missing
135
- balancing punctuation and premature end of pattern. Flavor validity checks
136
- are performed in the lexer, which uses a syntax object.
137
-
138
- * If the input is a Ruby **Regexp** object, the scanner calls #source on it to
139
- get its string representation. #source does not include the options of
140
- the expression (m, i, and x). To include the options in the scan, #to_s
141
- should be called on the **Regexp** before passing it to the scanner or the
142
- lexer. For the parser, however, this is not necessary. It automatically
143
- exposes the options of a passed **Regexp** in the returned root expression.
144
-
145
- * To keep the scanner simple(r) and fairly reusable for other purposes, it
146
- does not perform lexical analysis on the tokens, sticking to the task
147
- of identifying the smallest possible tokens and leaving lexical analysis
148
- to the lexer.
149
-
150
- * The MRI implementation may accept expressions that either conflict with
151
- the documentation or are undocumented, like `{}` and `]` _(unescaped)_.
152
- The scanner will try to support as many of these cases as possible.
153
-
154
- ---
155
- ### Syntax
156
- Defines the supported tokens for a specific engine implementation (aka a
157
- flavor). Syntax classes act as lookup tables, and are layered to create
158
- flavor variations. Syntax only comes into play in the lexer.
159
-
160
- #### Example
161
- The following fetches syntax objects for Ruby 2.0, 1.9, 1.8, and
162
- checks a few of their implementation features.
163
-
164
- ```ruby
165
- require 'regexp_parser'
166
-
167
- ruby_20 = Regexp::Syntax.for 'ruby/2.0'
168
- ruby_20.implements? :quantifier, :zero_or_one # => true
169
- ruby_20.implements? :quantifier, :zero_or_one_reluctant # => true
170
- ruby_20.implements? :quantifier, :zero_or_one_possessive # => true
171
- ruby_20.implements? :conditional, :condition # => true
172
-
173
- ruby_19 = Regexp::Syntax.for 'ruby/1.9'
174
- ruby_19.implements? :quantifier, :zero_or_one # => true
175
- ruby_19.implements? :quantifier, :zero_or_one_reluctant # => true
176
- ruby_19.implements? :quantifier, :zero_or_one_possessive # => true
177
- ruby_19.implements? :conditional, :condition # => false
178
-
179
- ruby_18 = Regexp::Syntax.for 'ruby/1.8'
180
- ruby_18.implements? :quantifier, :zero_or_one # => true
181
- ruby_18.implements? :quantifier, :zero_or_one_reluctant # => true
182
- ruby_18.implements? :quantifier, :zero_or_one_possessive # => false
183
- ruby_18.implements? :conditional, :condition # => false
184
- ```
185
-
186
- Syntax objects can also be queried about their complete and relative feature sets.
187
-
188
- ```ruby
189
- require 'regexp_parser'
190
-
191
- ruby_20 = Regexp::Syntax.for 'ruby/2.0' # => Regexp::Syntax::V2_0_0
192
- ruby_20.added_features # => { conditional: [...], ... }
193
- ruby_20.removed_features # => { property: [:newline], ... }
194
- ruby_20.features # => { anchor: [...], ... }
195
- ```
196
-
197
- #### Notes
198
- * Variations on a token, for example a named group with angle brackets (< and >)
199
- vs one with a pair of single quotes, are specified with an underscore followed
200
- by two characters appended to the base token. In the previous named group example,
201
- the tokens would be :named_ab (angle brackets) and :named_sq (single quotes).
202
- These variations are normalized by the syntax to :named.
203
-
204
-
205
- ---
206
- ### Lexer
207
- Sits on top of the scanner and performs lexical analysis on the tokens that
208
- it emits. Its tasks include breaking quantified literal runs, collecting the
209
- emitted token attributes into Token objects, calculating their nesting depth,
210
- normalizing tokens for the parser, and checking if the tokens are implemented by
211
- the given syntax version.
212
-
213
- See the [Token Objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
214
- wiki page for more information on Token objects.
215
-
216
-
217
- #### Example
218
- The following example lexes the given pattern, checks it against the Ruby 1.9
219
- syntax, and prints the token objects' text indented to their level.
220
-
221
- ```ruby
222
- require 'regexp_parser'
223
-
224
- Regexp::Lexer.lex(/a?(b(c))*[d]+/, 'ruby/1.9') do |token|
225
- puts "#{' ' * token.level}#{token.text}"
226
- end
227
-
228
- # output
229
- # a
230
- # ?
231
- # (
232
- # b
233
- # (
234
- # c
235
- # )
236
- # )
237
- # *
238
- # [
239
- # d
240
- # ]
241
- # +
242
- ```
243
-
244
- A one-liner that returns an array of the textual parts of the given pattern.
245
- Compare the output with that of the one-liner example of the **Scanner**; notably
246
- how the sequence 'cat' is treated. The 't' is separated because it's followed
247
- by a quantifier that only applies to it.
248
-
249
- ```ruby
250
- Regexp::Lexer.scan(/(cat?([b]at)){3,5}/).map { |token| token.text }
251
- # => ["(", "ca", "t", "?", "(", "[", "b", "]", "at", ")", ")", "{3,5}"]
252
- ```
253
-
254
- #### Notes
255
- * The syntax argument is optional. It defaults to the version of the Ruby
256
- interpreter in use, as returned by RUBY_VERSION.
257
-
258
- * The lexer normalizes some tokens, as noted in the Syntax section above.
259
-
260
-
261
- ---
262
- ### Parser
263
- Sits on top of the lexer and transforms the "stream" of Token objects emitted
264
- by it into a tree of Expression objects represented by an instance of the
265
- `Expression::Root` class.
266
-
267
- See the [Expression Objects](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
268
- wiki page for attributes and methods.
269
-
270
-
271
- #### Example
272
-
273
- This example uses the tree traversal method `#each_expression`
274
- and the method `#strfregexp` to print each object in the tree.
275
-
276
- ```ruby
277
- include_root = true
278
- indent_offset = include_root ? 1 : 0
279
-
280
- tree.each_expression(include_root) do |exp|
281
- puts exp.strfregexp("%>> %c", indent_offset)
282
- end
283
-
284
- # Output
285
- # > Regexp::Expression::Root
286
- # > Regexp::Expression::Literal
287
- # > Regexp::Expression::Group::Capture
288
- # > Regexp::Expression::Literal
289
- # > Regexp::Expression::Group::Capture
290
- # > Regexp::Expression::Literal
291
- # > Regexp::Expression::Literal
292
- # > Regexp::Expression::Group::Named
293
- # > Regexp::Expression::CharacterSet
294
- ```
295
-
296
- _Note: quantifiers do not appear in the output because they are members of the
297
- Expression class. See the next section for details._
298
-
299
- Another example, using `#traverse` for a more fine-grained tree traversal:
300
-
301
- ```ruby
302
- require 'regexp_parser'
303
-
304
- regex = /a?(b+(c)d)*(?<name>[0-9]+)/
305
-
306
- tree = Regexp::Parser.parse(regex, 'ruby/2.1')
307
-
308
- tree.traverse do |event, exp|
309
- puts "#{event}: #{exp.type} `#{exp.to_s}`"
310
- end
311
-
312
- # Output
313
- # visit: literal `a?`
314
- # enter: group `(b+(c)d)*`
315
- # visit: literal `b+`
316
- # enter: group `(c)`
317
- # visit: literal `c`
318
- # exit: group `(c)`
319
- # visit: literal `d`
320
- # exit: group `(b+(c)d)*`
321
- # enter: group `(?<name>[0-9]+)`
322
- # visit: set `[0-9]+`
323
- # exit: group `(?<name>[0-9]+)`
324
- ```
325
-
326
- _See the traverse.rb and strfregexp.rb files under `lib/regexp_parser/expression/methods`
327
- for more information on these methods._
328
-
329
- ---
330
-
331
-
332
- ## Supported Syntax
333
- The three modules support all the regular expression syntax features of Ruby 1.8,
334
- 1.9, 2.x and 3.x:
335
-
336
- _Note that not all of these are available in all versions of Ruby_
337
-
338
-
339
- | Syntax Feature | Examples | &#x22ef; |
340
- | ------------------------------------- | ------------------------------------------------------- |:--------:|
341
- | **Alternation** | `a\|b\|c` | &#x2713; |
342
- | **Anchors** | `\A`, `^`, `\b` | &#x2713; |
343
- | **Character Classes** | `[abc]`, `[^\\]`, `[a-d&&aeiou]`, `[a=e=b]` | &#x2713; |
344
- | **Character Types** | `\d`, `\H`, `\s` | &#x2713; |
345
- | **Cluster Types** | `\R`, `\X` | &#x2713; |
346
- | **Conditional Exps.** | `(?(cond)yes-subexp)`, `(?(cond)yes-subexp\|no-subexp)` | &#x2713; |
347
- | **Escape Sequences** | `\t`, `\\+`, `\?` | &#x2713; |
348
- | **Free Space** | whitespace and `# Comments` _(x modifier)_ | &#x2713; |
349
- | **Grouped Exps.** | | &#x22f1; |
350
- | &emsp;&nbsp;_**Assertions**_ | | &#x22f1; |
351
- | &emsp;&emsp;_Lookahead_ | `(?=abc)` | &#x2713; |
352
- | &emsp;&emsp;_Negative Lookahead_ | `(?!abc)` | &#x2713; |
353
- | &emsp;&emsp;_Lookbehind_ | `(?<=abc)` | &#x2713; |
354
- | &emsp;&emsp;_Negative Lookbehind_ | `(?<!abc)` | &#x2713; |
355
- | &emsp;&nbsp;_**Atomic**_ | `(?>abc)` | &#x2713; |
356
- | &emsp;&nbsp;_**Absence**_ | `(?~abc)` | &#x2713; |
357
- | &emsp;&nbsp;_**Back-references**_ | | &#x22f1; |
358
- | &emsp;&emsp;_Named_ | `\k<name>` | &#x2713; |
359
- | &emsp;&emsp;_Nest Level_ | `\k<n-1>` | &#x2713; |
360
- | &emsp;&emsp;_Numbered_ | `\k<1>` | &#x2713; |
361
- | &emsp;&emsp;_Relative_ | `\k<-2>` | &#x2713; |
362
- | &emsp;&emsp;_Traditional_ | `\1` through `\9` | &#x2713; |
363
- | &emsp;&nbsp;_**Capturing**_ | `(abc)` | &#x2713; |
364
- | &emsp;&nbsp;_**Comments**_ | `(?# comment text)` | &#x2713; |
365
- | &emsp;&nbsp;_**Named**_ | `(?<name>abc)`, `(?'name'abc)` | &#x2713; |
366
- | &emsp;&nbsp;_**Options**_ | `(?mi-x:abc)`, `(?a:\s\w+)`, `(?i)` | &#x2713; |
367
- | &emsp;&nbsp;_**Passive**_ | `(?:abc)` | &#x2713; |
368
- | &emsp;&nbsp;_**Subexp. Calls**_ | `\g<name>`, `\g<1>` | &#x2713; |
369
- | **Keep** | `\K`, `(ab\Kc\|d\Ke)f` | &#x2713; |
370
- | **Literals** _(utf-8)_ | `Ruby`, `ルビー`, `روبي` | &#x2713; |
371
- | **POSIX Classes** | `[:alpha:]`, `[:^digit:]` | &#x2713; |
372
- | **Quantifiers** | | &#x22f1; |
373
- | &emsp;&nbsp;_**Greedy**_ | `?`, `*`, `+`, `{m,M}` | &#x2713; |
374
- | &emsp;&nbsp;_**Reluctant** (Lazy)_ | `??`, `*?`, `+?` \[1\] | &#x2713; |
375
- | &emsp;&nbsp;_**Possessive**_ | `?+`, `*+`, `++` \[1\] | &#x2713; |
376
- | **String Escapes** | | &#x22f1; |
377
- | &emsp;&nbsp;_**Control** \[2\]_ | `\C-C`, `\cD` | &#x2713; |
378
- | &emsp;&nbsp;_**Hex**_ | `\x20`, `\x{701230}` | &#x2713; |
379
- | &emsp;&nbsp;_**Meta** \[2\]_ | `\M-c`, `\M-\C-C`, `\M-\cC`, `\C-\M-C`, `\c\M-C` | &#x2713; |
380
- | &emsp;&nbsp;_**Octal**_ | `\0`, `\01`, `\012` | &#x2713; |
381
- | &emsp;&nbsp;_**Unicode**_ | `\uHHHH`, `\u{H+ H+}` | &#x2713; |
382
- | **Unicode Properties** | _<sub>([Unicode 13.0.0])</sub>_ | &#x22f1; |
383
- | &emsp;&nbsp;_**Age**_ | `\p{Age=5.2}`, `\P{age=7.0}`, `\p{^age=8.0}` | &#x2713; |
384
- | &emsp;&nbsp;_**Blocks**_ | `\p{InArmenian}`, `\P{InKhmer}`, `\p{^InThai}` | &#x2713; |
385
- | &emsp;&nbsp;_**Classes**_ | `\p{Alpha}`, `\P{Space}`, `\p{^Alnum}` | &#x2713; |
386
- | &emsp;&nbsp;_**Derived**_ | `\p{Math}`, `\P{Lowercase}`, `\p{^Cased}` | &#x2713; |
387
- | &emsp;&nbsp;_**General Categories**_ | `\p{Lu}`, `\P{Cs}`, `\p{^sc}` | &#x2713; |
388
- | &emsp;&nbsp;_**Scripts**_ | `\p{Arabic}`, `\P{Hiragana}`, `\p{^Greek}` | &#x2713; |
389
- | &emsp;&nbsp;_**Simple**_ | `\p{Dash}`, `\p{Extender}`, `\p{^Hyphen}` | &#x2713; |
390
-
391
- [Unicode 13.0.0]: https://www.unicode.org/versions/Unicode13.0.0/
392
-
393
- **\[1\]**: Ruby does not support lazy or possessive interval quantifiers.
394
- Any `+` or `?` that follows an interval quantifier will be treated as another,
395
- chained quantifier. See also [#3](https://github.com/ammar/regexp_parser/issues/3),
396
- [#69](https://github.com/ammar/regexp_parser/pull/69).
397
-
398
- **\[2\]**: As of Ruby 3.1, meta and control sequences are [pre-processed to hex
399
- escapes when used in Regexp literals](https://github.com/ruby/ruby/commit/11ae581),
400
- so they will only reach the scanner and will only be emitted if a String or a Regexp
401
- that has been built with the `::new` constructor is scanned.
402
-
403
- ##### Inapplicable Features
404
-
405
- Some modifiers, like `o` and `s`, apply to the **Regexp** object itself and do not
406
- appear in its source. Other such modifiers include the encoding modifiers `e` and `n`
407
- (see [Encoding](http://www.ruby-doc.org/core-2.5.0/Regexp.html#class-Regexp-label-Encoding)).
408
- These are not seen by the scanner.
409
-
410
- The following features are not currently enabled for Ruby by its regular
411
- expressions library (Onigmo). They are not supported by the scanner.
412
-
413
- - **Quotes**: `\Q...\E` _[[See]](https://github.com/k-takata/Onigmo/blob/7911409/doc/RE#L499)_
414
- - **Capture History**: `(?@...)`, `(?@<name>...)` _[[See]](https://github.com/k-takata/Onigmo/blob/7911409/doc/RE#L550)_
415
-
416
- See something missing? Please submit an [issue](https://github.com/ammar/regexp_parser/issues).
417
-
418
- _**Note**: Attempting to process expressions with unsupported syntax features can raise
419
- an error, or incorrectly return tokens/objects as literals._
420
-
421
-
422
- ## Testing
423
- To run the tests, run `rake` from the root directory.
424
-
425
- The default task generates the scanner's code from the Ragel source files and runs
426
- all the specs, thus it requires Ragel to be installed.
427
-
428
- Note that changes to Ragel files will not be reflected when running `rspec` on its own,
429
- so to run individual tests you might want to run:
430
-
431
- ```
432
- rake ragel:rb && rspec spec/scanner/properties_spec.rb
433
- ```
434
-
435
- ## Building
436
- Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/)
437
- to be installed. The build tasks will automatically invoke the 'ragel:rb' task to generate
438
- the Ruby scanner code.
439
-
440
-
441
- The project uses the standard rubygems package tasks, so:
442
-
443
-
444
- To build the gem, run:
445
- ```
446
- rake build
447
- ```
448
-
449
- To install the gem from the cloned project, run:
450
- ```
451
- rake install
452
- ```
453
-
454
-
455
- ## Example Projects
456
- Projects using regexp_parser.
457
-
458
- - [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool
459
- that uses regexp_parser to convert Regexps to css/xpath selectors.
460
-
461
- - [js_regex](https://github.com/jaynetics/js_regex) converts Ruby regular expressions
462
- to JavaScript-compatible regular expressions.
463
-
464
- - [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor
465
- with alias support.
466
-
467
- - [mutant](https://github.com/mbj/mutant) manipulates your regular expressions
468
- (amongst others) to see if your tests cover their behavior.
469
-
470
- - [repper](https://github.com/jaynetics/repper) is a regular expression
471
- pretty-printer and formatter for Ruby.
472
-
473
- - [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that
474
- uses regexp_parser to lint Regexps.
475
-
476
- - [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper
477
- that uses regexp_parser to generate examples of postal codes.
478
-
479
-
480
- ## References
481
- Documentation and books used while working on this project.
482
-
483
-
484
- #### Ruby Flavors
485
- * Oniguruma Regular Expressions (Ruby 1.9.x) [link](https://github.com/kkos/oniguruma/blob/master/doc/RE)
486
- * Onigmo Regular Expressions (Ruby >= 2.0) [link](https://github.com/k-takata/Onigmo/blob/master/doc/RE)
487
-
488
-
489
- #### Regular Expressions
490
- * Mastering Regular Expressions, By Jeffrey E.F. Friedl (2nd Edition) [book](http://oreilly.com/catalog/9781565922570/)
491
- * Regular Expression Flavor Comparison [link](http://www.regular-expressions.info/refflavors.html)
492
- * Enumerating the strings of regular languages [link](http://www.cs.dartmouth.edu/~doug/nfa.ps.gz)
493
- * Stack Overflow Regular Expressions FAQ [link](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075)
494
-
495
-
496
- #### Unicode
497
- * Unicode Explained, By Jukka K. Korpela. [book](http://oreilly.com/catalog/9780596101213)
498
- * Unicode Derived Properties [link](http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt)
499
- * Unicode Property Aliases [link](http://www.unicode.org/Public/UNIDATA/PropertyAliases.txt)
500
- * Unicode Regular Expressions [link](http://www.unicode.org/reports/tr18/)
501
- * Unicode Standard Annex #44 [link](http://www.unicode.org/reports/tr44/)
502
-
503
-
504
- ---
505
- ##### Copyright
506
- _Copyright (c) 2010-2022 Ammar Ali. See LICENSE file for details._
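The Scanner notes in the deleted README above say that `#source` drops a Regexp's options while `#to_s` keeps them. A minimal sketch in plain Ruby illustrates the difference. It uses only core `Regexp` methods and makes no regexp_parser calls; the variable names are illustrative.

```ruby
# Plain Ruby, no gems required: compare Regexp#source and Regexp#to_s.
regexp = /ab+c/ix

source_text = regexp.source   # pattern text only; the i and x flags are dropped
to_s_text   = regexp.to_s     # pattern wrapped in a group that spells out the flags
option_bits = regexp.options  # integer bitmask of the flags

puts source_text  # ab+c
puts to_s_text    # (?ix-m:ab+c)
puts option_bits == (Regexp::IGNORECASE | Regexp::EXTENDED)  # true
```

This is why the README suggests calling `#to_s` before handing a Regexp to the scanner or lexer when the options matter, and why the parser, which reads the options from the Regexp object itself, does not need that step.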
@@ -1,89 +0,0 @@
1
- # mapping for simple cases with a 1:1 relation between text and token
2
- class Regexp::Scanner
3
- MAPPING = {
4
- anchor: {
5
- '\A' => :bos,
6
- '\B' => :nonword_boundary,
7
- '\G' => :match_start,
8
- '\Z' => :eos_ob_eol,
9
- '\b' => :word_boundary,
10
- '\z' => :eos,
11
- },
12
- assertion: {
13
- '(?=' => :lookahead,
14
- '(?!' => :nlookahead,
15
- '(?<=' => :lookbehind,
16
- '(?<!' => :nlookbehind,
17
- },
18
- conditional: {
19
- '(?' => :open,
20
- },
21
- escape: {
22
- '\.' => :dot,
23
- '\|' => :alternation,
24
- '\^' => :bol,
25
- '\$' => :eol,
26
- '\?' => :zero_or_one,
27
- '\*' => :zero_or_more,
28
- '\+' => :one_or_more,
29
- '\(' => :group_open,
30
- '\)' => :group_close,
31
- '\{' => :interval_open,
32
- '\}' => :interval_close,
33
- '\[' => :set_open,
34
- '\]' => :set_close,
35
- '\\\\' => :backslash,
36
- '\a' => :bell,
37
- '\b' => :backspace,
38
- '\e' => :escape,
39
- '\f' => :form_feed,
40
- '\n' => :newline,
41
- '\r' => :carriage,
42
- '\t' => :tab,
43
- '\v' => :vertical_tab,
44
- },
45
- group: {
46
- '(?:' => :passive,
47
- '(?>' => :atomic,
48
- '(?~' => :absence,
49
- },
50
- meta: {
51
- '|' => :alternation,
52
- '.' => :dot,
53
- },
54
- quantifier: {
55
- '?' => :zero_or_one,
56
- '??' => :zero_or_one_reluctant,
57
- '?+' => :zero_or_one_possessive,
58
- '*' => :zero_or_more,
59
- '*?' => :zero_or_more_reluctant,
60
- '*+' => :zero_or_more_possessive,
61
- '+' => :one_or_more,
62
- '+?' => :one_or_more_reluctant,
63
- '++' => :one_or_more_possessive,
64
- },
65
- set: {
66
- '[' => :character,
67
- '-' => :range,
68
- '&&' => :intersection,
69
- },
70
- type: {
71
- '\d' => :digit,
72
- '\D' => :nondigit,
73
- '\h' => :hex,
74
- '\H' => :nonhex,
75
- '\s' => :space,
76
- '\S' => :nonspace,
77
- '\w' => :word,
78
- '\W' => :nonword,
79
- '\R' => :linebreak,
80
- '\X' => :xgrapheme,
81
- }
82
- }
83
- ANCHOR_MAPPING = MAPPING[:anchor]
84
- ASSERTION_MAPPING = MAPPING[:assertion]
85
- ESCAPE_MAPPING = MAPPING[:escape]
86
- GROUP_MAPPING = MAPPING[:group]
87
- QUANTIFIER_MAPPING = MAPPING[:quantifier]
88
- TYPE_MAPPING = MAPPING[:type]
89
- end
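The deleted table above is keyed first by token type and then by the matched text. A short sketch may help show why the nesting matters: the text `\b` maps to a different token under `anchor` than under `escape`. The hash below is a hand-copied excerpt of a few entries, and `example_mapping` and `token_for` are hypothetical names used only for illustration; they are not part of the gem's API.

```ruby
# Hypothetical excerpt of a few entries from the mapping shown above.
example_mapping = {
  anchor:     { '\A' => :bos, '\b' => :word_boundary, '\z' => :eos },
  escape:     { '\b' => :backspace, '\t' => :tab },
  quantifier: { '?' => :zero_or_one, '*?' => :zero_or_more_reluctant }
}.freeze

# Hypothetical helper: look up the token for a text within a given type.
# Returns nil when the type or the text is not in the excerpt.
token_for = ->(type, text) { example_mapping.fetch(type, {})[text] }

p token_for.call(:anchor, '\b')      # :word_boundary
p token_for.call(:escape, '\b')      # :backspace
p token_for.call(:quantifier, '*?')  # :zero_or_more_reluctant
p token_for.call(:type, '\d')        # nil (type not included in this excerpt)
```

Note that the single-quoted Ruby strings keep the backslash literally, so `'\b'` is the two characters `\` and `b`, matching the pattern text rather than a backspace character.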