regexp_parser 1.1.0 → 1.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +13 -0
- data/README.md +22 -22
- data/lib/regexp_parser/expression.rb +2 -1
- data/lib/regexp_parser/expression/classes/conditional.rb +12 -10
- data/lib/regexp_parser/expression/subexpression.rb +4 -3
- data/lib/regexp_parser/version.rb +1 -1
- data/test/parser/test_conditionals.rb +2 -1
- metadata +2 -2
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 20ba21704667276107a1041b3bb5943bbbec0078f706cf0d7db85110631dfe8d
+  data.tar.gz: 87886f6cad480ebc62f3e1f243d9b61170097e5419fc8b3972cd3348e5d8d7e0
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '0678640973741b2ea63053c058809fa075b3b465756bddee9a1914f67f7181a3681d3592662d4eadf5a60e844c550950b371577239924c4d3ce7f07f9fdfefa6'
+  data.tar.gz: 3bf18d0d7989c1f9eef010d1579ac78537c6c083c9b7c7c2f0cda094c0f973e1fdcc17c5992ae35d823720d2cdb10a60424876e08bd4b2b60b125c8b107a62bf
```
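Each `checksums.yaml` entry records a digest of one of the two archives packed inside the `.gem` file, which is itself a tar archive. A minimal sketch of checking those digests, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` and read into memory as binary strings; `checksum_mismatches` is a hypothetical helper, not a RubyGems API:

```ruby
require "digest"
require "yaml"

# Hypothetical helper (not part of RubyGems): compares the digests recorded in
# a checksums.yaml document against the raw bytes of the archives it names.
# `archives` maps an archive name ("metadata.gz", "data.tar.gz") to its bytes.
def checksum_mismatches(checksums_yaml, archives)
  recorded = YAML.safe_load(checksums_yaml)
  algorithms = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }

  mismatches = []
  recorded.each do |algorithm, entries|
    digest_class = algorithms.fetch(algorithm)
    entries.each do |name, expected|
      actual = digest_class.hexdigest(archives.fetch(name))
      # Collect [algorithm, archive name] pairs whose digest does not match.
      mismatches << [algorithm, name] unless actual == expected
    end
  end
  mismatches
end
```

An empty result means every recorded digest matched; any tampered archive shows up once per algorithm.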
data/CHANGELOG.md
CHANGED
```diff
@@ -1,3 +1,16 @@
+## [1.2.0] - 2018-09-28 - [Janosch Müller](mailto:janosch84@gmail.com)
+
+### Added
+
+- `Subexpression` (branch node) includes `Enumerable`, allowing to `#select` children etc.
+
+### Fixed
+
+- Fixed missing quantifier in `Conditional::Expression` methods `#to_s`, `#to_re`
+- `Conditional::Condition` no longer lives outside the recursive `#expressions` tree
+  - it used to be the only expression stored in a custom ivar, complicating traversal
+  - its setter and getter (`#condition=`, `#condition`) still work as before
+
 ## [1.1.0] - 2018-09-17 - [Janosch Müller](mailto:janosch84@gmail.com)
 
 ### Added
```
data/README.md
CHANGED
````diff
@@ -2,14 +2,14 @@
 
 [](http://badge.fury.io/rb/regexp_parser) [](http://travis-ci.org/ammar/regexp_parser) [](https://codeclimate.com/github/ammar/regexp_parser/badges)
 
-A
+A Ruby gem for tokenizing, parsing, and transforming regular expressions.
 
 * Multilayered
-* A scanner/tokenizer based on [
+* A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
 * A lexer that produces a "stream" of token objects.
 * A parser that produces a "tree" of Expression objects (OO API)
-* Runs on
-* Recognizes
+* Runs on Ruby 1.9, 2.x, and JRuby (1.9 mode) runtimes.
+* Recognizes Ruby 1.8, 1.9, and 2.x regular expressions [See Supported Syntax](#supported-syntax)
 
 
 _For examples of regexp_parser in use, see [Example Projects](#example-projects)._
@@ -46,7 +46,7 @@ The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
 provides a single method that takes a regular expression (as a RegExp object or
 a string) and returns its results. The **Lexer** and the **Parser** accept an
 optional second argument that specifies the syntax version, like 'ruby/2.0',
-which defaults to the host
+which defaults to the host Ruby version (using RUBY_VERSION).
 
 Here are the basic usage examples:
 
@@ -77,7 +77,7 @@ called with the results as follows:
 ## Components
 
 ### Scanner
-A
+A Ragel-generated scanner that recognizes the cumulative syntax of all
 supported syntax versions. It breaks a given expression's text into the
 smallest parts, and identifies their type, token, text, and start/end
 offsets within the pattern.
@@ -123,7 +123,7 @@ Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]}
 balancing punctuation and premature end of pattern. Flavor validity checks
 are performed in the lexer, which uses a syntax object.
 
-* If the input is a
+* If the input is a Ruby **Regexp** object, the scanner calls #source on it to
 get its string representation. #source does not include the options of
 the expression (m, i, and x). To include the options in the scan, #to_s
 should be called on the **Regexp** before passing it to the scanner or the
@@ -188,7 +188,7 @@ ruby_18.implements? :conditional, :condition # => false
 Sits on top of the scanner and performs lexical analysis on the tokens that
 it emits. Among its tasks are; breaking quantified literal runs, collecting the
 emitted token attributes into Token objects, calculating their nesting depth,
-normalizing tokens for the parser, and
+normalizing tokens for the parser, and checking if the tokens are implemented by
 the given syntax version.
 
 See the [Token Objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
@@ -196,7 +196,7 @@ wiki page for more information on Token objects.
 
 
 #### Example
-The following example lexes the given pattern, checks it against the
+The following example lexes the given pattern, checks it against the Ruby 1.9
 syntax, and prints the token objects' text indented to their level.
 
 ```ruby
@@ -224,7 +224,7 @@ end
 
 A one-liner that returns an array of the textual parts of the given pattern.
 Compare the output with that of the one-liner example of the **Scanner**; notably
-how the sequence 'cat' is treated. The 't' is
+how the sequence 'cat' is treated. The 't' is separated because it's followed
 by a quantifier that only applies to it.
 
 ```ruby
@@ -233,7 +233,7 @@ Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text}
 ```
 
 #### Notes
-* The syntax argument is optional. It defaults to the version of the
+* The syntax argument is optional. It defaults to the version of the Ruby
 interpreter in use, as returned by RUBY_VERSION.
 
 * The lexer normalizes some tokens, as noted in the Syntax section above.
@@ -308,8 +308,8 @@ Expression class. See the next section for details._
 
 
 ## Supported Syntax
-The three modules support all the regular expression syntax features of Ruby 1.8
-
+The three modules support all the regular expression syntax features of Ruby 1.8,
+1.9, and 2.x:
 
 _Note that not all of these are available in all versions of Ruby_
 
@@ -318,7 +318,7 @@ _Note that not all of these are available in all versions of Ruby_
 | ------------------------------------- | ------------------------------------------------------- |:--------:|
 | **Alternation** | `a\|b\|c` | ✓ |
 | **Anchors** | `\A`, `^`, `\b` | ✓ |
-| **Character Classes** | `[abc]`, `[^\\]`, `[a-d&&
+| **Character Classes** | `[abc]`, `[^\\]`, `[a-d&&aeiou]`, `[a=e=b]` | ✓ |
 | **Character Types** | `\d`, `\H`, `\s` | ✓ |
 | **Cluster Types** | `\R`, `\X` | ✓ |
 | **Conditional Exps.** | `(?(cond)yes-subexp)`, `(?(cond)yes-subexp\|no-subexp)` | ✓ |
@@ -362,9 +362,9 @@ _Note that not all of these are available in all versions of Ruby_
 |   _**Blocks**_ | `\p{InArmenian}`, `\P{InKhmer}`, `\p{^InThai}` | ✓ |
 |   _**Classes**_ | `\p{Alpha}`, `\P{Space}`, `\p{^Alnum}` | ✓ |
 |   _**Derived**_ | `\p{Math}`, `\P{Lowercase}`, `\p{^Cased}` | ✓ |
-|   _**General Categories**_ | `\p{Lu}`, `\P{Cs}`,
-|   _**Scripts**_ | `\p{Arabic}`, `\P{Hiragana}`,
-|   _**Simple**_ | `\p{Dash}`, `\p{Extender}`,
+|   _**General Categories**_ | `\p{Lu}`, `\P{Cs}`, `\p{^sc}` | ✓ |
+|   _**Scripts**_ | `\p{Arabic}`, `\P{Hiragana}`, `\p{^Greek}` | ✓ |
+|   _**Simple**_ | `\p{Dash}`, `\p{Extender}`, `\p{^Hyphen}` | ✓ |
 
 ##### Inapplicable Features
 
@@ -389,9 +389,9 @@ or incorrectly return tokens/objects as literals._
 ## Testing
 To run the tests simply run rake from the root directory, as 'test' is the default task.
 
-It generates the scanner's code from the
+It generates the scanner's code from the Ragel source files and runs all the tests, thus it requires Ragel to be installed.
 
-The tests use
+The tests use Ruby's test/unit. They can also be run with:
 
 ```
 bin/test
@@ -409,16 +409,16 @@ It is sometimes helpful during development to focus on a specific test case, for
 bin/test test/expression/test_base.rb -n test_expression_to_re
 ```
 
-Note that changes to
+Note that changes to Ragel files will not be reflected when using `bin/test`, so you might want to run:
 
 ```
 rake ragel:rb && bin/test test/scanner/test_properties.rb
 ```
 
 ## Building
-Building the scanner and the gem requires [
+Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/) to be
 installed. The build tasks will automatically invoke the 'ragel:rb' task to generate the
-
+Ruby scanner code.
 
 
 The project uses the standard rubygems package tasks, so:
````
data/lib/regexp_parser/expression.rb
CHANGED
```diff
@@ -127,7 +127,7 @@ module Regexp::Expression
     end
     alias :=~ :match
 
-    def
+    def attributes
       {
         type: type,
         token: token,
@@ -141,6 +141,7 @@ module Regexp::Expression
         quantifier: quantified? ? quantifier.to_h : nil,
       }
     end
+    alias :to_h :attributes
   end
 
   def self.parsed(exp)
```
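Renaming the hash builder to `#attributes` and aliasing `to_h` to it lets a subclass override `to_h` while still reaching the base hash, which is what the 1.2.0 `Subexpression#to_h` does with `attributes.merge(...)`. A minimal sketch of the pattern with hypothetical `Node` and `BranchNode` classes (stand-ins, not the gem's own classes):

```ruby
# Hypothetical leaf class: #attributes builds the hash, to_h is an alias of it.
class Node
  attr_reader :type, :token

  def initialize(type, token)
    @type = type
    @token = token
  end

  def attributes
    { type: type, token: token }
  end
  alias :to_h :attributes
end

# Hypothetical branch class: overrides to_h, but the base hash is still
# reachable through #attributes, so the override does not need `super`.
class BranchNode < Node
  attr_reader :children

  def initialize(type, token, children)
    super(type, token)
    @children = children
  end

  def to_h
    attributes.merge(children: children.map(&:to_h))
  end
end
```

Because `alias` copies the method body at definition time, `Node#to_h` keeps returning the plain attributes hash even though `BranchNode` redefines `to_h`.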
data/lib/regexp_parser/expression/classes/conditional.rb
CHANGED
```diff
@@ -18,13 +18,6 @@ module Regexp::Expression
     class Branch < Regexp::Expression::Sequence; end
 
     class Expression < Regexp::Expression::Subexpression
-      attr_reader :condition
-
-      def condition=(exp)
-        @condition = exp
-        expressions << exp
-      end
-
       def <<(exp)
         expressions.last << exp
       end
@@ -35,16 +28,25 @@ module Regexp::Expression
       end
       alias :branch :add_sequence
 
+      def condition=(exp)
+        expressions.delete(condition)
+        expressions.unshift(exp)
+      end
+
+      def condition
+        find { |subexp| subexp.is_a?(Condition) }
+      end
+
       def branches
-
+        select { |subexp| subexp.is_a?(Sequence) }
       end
 
       def reference
         condition.reference
       end
 
-      def to_s(
-        text
+      def to_s(format = :full)
+        "#{text}#{condition}#{branches.join('|')})#{quantifier_affix(format)}"
       end
     end
   end
```
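With this change the condition is simply the first child in `expressions`, and `#condition` and `#branches` are derived by filtering the children through `Enumerable`. The sketch below mirrors that logic with hypothetical stand-in classes (`ConditionNode`, `SequenceNode`, `ConditionalNode`) so it runs without the gem:

```ruby
# Hypothetical stand-ins for Conditional::Condition and Sequence.
ConditionNode = Struct.new(:text, :reference) do
  def to_s
    text
  end
end

SequenceNode = Struct.new(:text) do
  def to_s
    text
  end
end

# Hypothetical container mirroring the 1.2.0 Conditional::Expression logic:
# no dedicated ivar, the condition lives in the ordinary child list.
class ConditionalNode
  include Enumerable

  attr_reader :expressions

  def initialize
    @expressions = []
  end

  def each(&block)
    expressions.each(&block)
  end

  # Drops any previous condition, then stores the new one as the first child.
  def condition=(exp)
    expressions.delete(condition)
    expressions.unshift(exp)
  end

  def condition
    find { |subexp| subexp.is_a?(ConditionNode) }
  end

  def branches
    select { |subexp| subexp.is_a?(SequenceNode) }
  end
end
```

Assigning a second condition replaces the first rather than adding a duplicate, and a plain traversal over the children now visits the condition too.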
data/lib/regexp_parser/expression/subexpression.rb
CHANGED
```diff
@@ -1,6 +1,8 @@
 module Regexp::Expression
 
   class Subexpression < Regexp::Expression::Base
+    include Enumerable
+
     attr_accessor :expressions
 
     def initialize(token, options = {})
@@ -24,8 +26,7 @@ module Regexp::Expression
       end
     end
 
-    %w[[]
-    fetch find first index join last length map values_at].each do |method|
+    %w[[] at each empty? fetch index join last length values_at].each do |method|
       class_eval <<-RUBY, __FILE__, __LINE__ + 1
         def #{method}(*args, &block)
           expressions.#{method}(*args, &block)
@@ -51,7 +52,7 @@ module Regexp::Expression
     end
 
     def to_h
-
+      attributes.merge({
         text: to_s(:base),
         expressions: expressions.map(&:to_h)
       })
```
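Including `Enumerable` works because `each` is among the methods delegated to the `expressions` array. That is presumably also why `find`, `first` and `map` could leave the delegation list, while `last` (which `Enumerable` does not provide) stays. A minimal sketch of the same delegation pattern using a hypothetical `Container` class rather than the gem's `Subexpression`:

```ruby
# Hypothetical container mirroring the 1.2.0 Subexpression pattern:
# a few Array methods are delegated explicitly, and the rest
# (select, map, find, first, count, ...) come from Enumerable via #each.
class Container
  include Enumerable

  attr_accessor :expressions

  def initialize(expressions = [])
    @expressions = expressions
  end

  %w[[] at each empty? fetch index join last length values_at].each do |method|
    class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def #{method}(*args, &block)
        expressions.#{method}(*args, &block)
      end
    RUBY
  end
end
```

For example, `Container.new(%w[a b c d]).select { |e| e > "b" }` returns `["c", "d"]` through `Enumerable#select`, while `#last` is answered by the delegated `Array#last`.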
data/test/parser/test_conditionals.rb
CHANGED
```diff
@@ -157,7 +157,8 @@ class TestParserConditionals < Test::Unit::TestCase
     conditional = root[1]
 
     assert conditional.quantified?
-    assert_equal '{42}',
+    assert_equal '{42}', conditional.quantifier.text
+    assert_equal '(?(1)\d|(\w)){42}', conditional.to_s
     refute conditional.branches.any?(&:quantified?)
   end
 
```
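The new assertion checks that `to_s` now ends with the quantifier. A minimal sketch of how the fixed `to_s` assembles that string, using a hypothetical `ConditionalSketch` struct; it assumes the opening token text is `(?`, the condition renders as `(1)`, and the gem's `quantifier_affix` appends the quantifier only when the `:base` format is not requested (that helper is not shown in this diff):

```ruby
# Hypothetical stand-in for Conditional::Expression#to_s in 1.2.0.
ConditionalSketch = Struct.new(:text, :condition, :branches, :quantifier) do
  # Assumption: mimics quantifier_affix, skipping the quantifier for :base.
  def quantifier_affix(format)
    (format == :base || quantifier.nil?) ? "" : quantifier
  end

  # Opening text, condition, branches joined by '|', closing paren, quantifier.
  def to_s(format = :full)
    "#{text}#{condition}#{branches.join('|')})#{quantifier_affix(format)}"
  end
end
```

With `ConditionalSketch.new("(?", "(1)", ['\d', '(\w)'], "{42}")`, `to_s` yields the same string the test expects, while `to_s(:base)` omits `{42}`.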
metadata
CHANGED
```diff
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: regexp_parser
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.2.0
 platform: ruby
 authors:
 - Ammar Ali
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-09-
+date: 2018-09-28 00:00:00.000000000 Z
 dependencies: []
 description: A library for tokenizing, lexing, and parsing Ruby regular expressions.
 email:
```