regexp_parser 1.1.0 → 1.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +13 -0
- data/README.md +22 -22
- data/lib/regexp_parser/expression.rb +2 -1
- data/lib/regexp_parser/expression/classes/conditional.rb +12 -10
- data/lib/regexp_parser/expression/subexpression.rb +4 -3
- data/lib/regexp_parser/version.rb +1 -1
- data/test/parser/test_conditionals.rb +2 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 20ba21704667276107a1041b3bb5943bbbec0078f706cf0d7db85110631dfe8d
+  data.tar.gz: 87886f6cad480ebc62f3e1f243d9b61170097e5419fc8b3972cd3348e5d8d7e0
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: '0678640973741b2ea63053c058809fa075b3b465756bddee9a1914f67f7181a3681d3592662d4eadf5a60e844c550950b371577239924c4d3ce7f07f9fdfefa6'
+  data.tar.gz: 3bf18d0d7989c1f9eef010d1579ac78537c6c083c9b7c7c2f0cda094c0f973e1fdcc17c5992ae35d823720d2cdb10a60424876e08bd4b2b60b125c8b107a62bf
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,16 @@
+## [1.2.0] - 2018-09-28 - [Janosch Müller](mailto:janosch84@gmail.com)
+
+### Added
+
+- `Subexpression` (branch node) includes `Enumerable`, allowing to `#select` children etc.
+
+### Fixed
+
+- Fixed missing quantifier in `Conditional::Expression` methods `#to_s`, `#to_re`
+- `Conditional::Condition` no longer lives outside the recursive `#expressions` tree
+  - it used to be the only expression stored in a custom ivar, complicating traversal
+  - its setter and getter (`#condition=`, `#condition`) still work as before
+
 ## [1.1.0] - 2018-09-17 - [Janosch Müller](mailto:janosch84@gmail.com)
 
 ### Added
data/README.md
CHANGED
@@ -2,14 +2,14 @@
 
 [![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://secure.travis-ci.org/ammar/regexp_parser.svg?branch=master)](http://travis-ci.org/ammar/regexp_parser) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
 
-A
+A Ruby gem for tokenizing, parsing, and transforming regular expressions.
 
 * Multilayered
-  * A scanner/tokenizer based on [
+  * A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
   * A lexer that produces a "stream" of token objects.
   * A parser that produces a "tree" of Expression objects (OO API)
-* Runs on
-* Recognizes
+* Runs on Ruby 1.9, 2.x, and JRuby (1.9 mode) runtimes.
+* Recognizes Ruby 1.8, 1.9, and 2.x regular expressions [See Supported Syntax](#supported-syntax)
 
 
 _For examples of regexp_parser in use, see [Example Projects](#example-projects)._
@@ -46,7 +46,7 @@ The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
 provides a single method that takes a regular expression (as a RegExp object or
 a string) and returns its results. The **Lexer** and the **Parser** accept an
 optional second argument that specifies the syntax version, like 'ruby/2.0',
-which defaults to the host
+which defaults to the host Ruby version (using RUBY_VERSION).
 
 Here are the basic usage examples:
 
@@ -77,7 +77,7 @@ called with the results as follows:
 ## Components
 
 ### Scanner
-A
+A Ragel-generated scanner that recognizes the cumulative syntax of all
 supported syntax versions. It breaks a given expression's text into the
 smallest parts, and identifies their type, token, text, and start/end
 offsets within the pattern.
@@ -123,7 +123,7 @@ Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]}
 balancing punctuation and premature end of pattern. Flavor validity checks
 are performed in the lexer, which uses a syntax object.
 
-* If the input is a
+* If the input is a Ruby **Regexp** object, the scanner calls #source on it to
 get its string representation. #source does not include the options of
 the expression (m, i, and x). To include the options in the scan, #to_s
 should be called on the **Regexp** before passing it to the scanner or the
@@ -188,7 +188,7 @@ ruby_18.implements? :conditional, :condition # => false
 Sits on top of the scanner and performs lexical analysis on the tokens that
 it emits. Among its tasks are; breaking quantified literal runs, collecting the
 emitted token attributes into Token objects, calculating their nesting depth,
-normalizing tokens for the parser, and
+normalizing tokens for the parser, and checking if the tokens are implemented by
 the given syntax version.
 
 See the [Token Objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
@@ -196,7 +196,7 @@ wiki page for more information on Token objects.
 
 
 #### Example
-The following example lexes the given pattern, checks it against the
+The following example lexes the given pattern, checks it against the Ruby 1.9
 syntax, and prints the token objects' text indented to their level.
 
 ```ruby
@@ -224,7 +224,7 @@ end
 
 A one-liner that returns an array of the textual parts of the given pattern.
 Compare the output with that of the one-liner example of the **Scanner**; notably
-how the sequence 'cat' is treated. The 't' is
+how the sequence 'cat' is treated. The 't' is separated because it's followed
 by a quantifier that only applies to it.
 
 ```ruby
@@ -233,7 +233,7 @@ Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text}
 ```
 
 #### Notes
-* The syntax argument is optional. It defaults to the version of the
+* The syntax argument is optional. It defaults to the version of the Ruby
 interpreter in use, as returned by RUBY_VERSION.
 
 * The lexer normalizes some tokens, as noted in the Syntax section above.
@@ -308,8 +308,8 @@ Expression class. See the next section for details._
 
 
 ## Supported Syntax
-The three modules support all the regular expression syntax features of Ruby 1.8
-
+The three modules support all the regular expression syntax features of Ruby 1.8,
+1.9, and 2.x:
 
 _Note that not all of these are available in all versions of Ruby_
 
@@ -318,7 +318,7 @@ _Note that not all of these are available in all versions of Ruby_
 | ------------------------------------- | ------------------------------------------------------- |:--------:|
 | **Alternation** | `a\|b\|c` | ✓ |
 | **Anchors** | `\A`, `^`, `\b` | ✓ |
-| **Character Classes** | `[abc]`, `[^\\]`, `[a-d&&
+| **Character Classes** | `[abc]`, `[^\\]`, `[a-d&&aeiou]`, `[a=e=b]` | ✓ |
 | **Character Types** | `\d`, `\H`, `\s` | ✓ |
 | **Cluster Types** | `\R`, `\X` | ✓ |
 | **Conditional Exps.** | `(?(cond)yes-subexp)`, `(?(cond)yes-subexp\|no-subexp)` | ✓ |
@@ -362,9 +362,9 @@ _Note that not all of these are available in all versions of Ruby_
 |   _**Blocks**_ | `\p{InArmenian}`, `\P{InKhmer}`, `\p{^InThai}` | ✓ |
 |   _**Classes**_ | `\p{Alpha}`, `\P{Space}`, `\p{^Alnum}` | ✓ |
 |   _**Derived**_ | `\p{Math}`, `\P{Lowercase}`, `\p{^Cased}` | ✓ |
-|   _**General Categories**_ | `\p{Lu}`, `\P{Cs}`,
-|   _**Scripts**_ | `\p{Arabic}`, `\P{Hiragana}`,
-|   _**Simple**_ | `\p{Dash}`, `\p{Extender}`,
+|   _**General Categories**_ | `\p{Lu}`, `\P{Cs}`, `\p{^sc}` | ✓ |
+|   _**Scripts**_ | `\p{Arabic}`, `\P{Hiragana}`, `\p{^Greek}` | ✓ |
+|   _**Simple**_ | `\p{Dash}`, `\p{Extender}`, `\p{^Hyphen}` | ✓ |
 
 ##### Inapplicable Features
 
@@ -389,9 +389,9 @@ or incorrectly return tokens/objects as literals._
 ## Testing
 To run the tests simply run rake from the root directory, as 'test' is the default task.
 
-It generates the scanner's code from the
+It generates the scanner's code from the Ragel source files and runs all the tests, thus it requires Ragel to be installed.
 
-The tests use
+The tests use Ruby's test/unit. They can also be run with:
 
 ```
 bin/test
@@ -409,16 +409,16 @@ It is sometimes helpful during development to focus on a specific test case, for
 bin/test test/expression/test_base.rb -n test_expression_to_re
 ```
 
-Note that changes to
+Note that changes to Ragel files will not be reflected when using `bin/test`, so you might want to run:
 
 ```
 rake ragel:rb && bin/test test/scanner/test_properties.rb
 ```
 
 ## Building
-Building the scanner and the gem requires [
+Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/) to be
 installed. The build tasks will automatically invoke the 'ragel:rb' task to generate the
-
+Ruby scanner code.
 
 
 The project uses the standard rubygems package tasks, so:
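The README hunk at line 126 notes that the scanner calls `#source` on a **Regexp**, which drops the options (m, i, and x), and that `#to_s` keeps them. A minimal sketch of that difference, using only core Ruby and no regexp_parser calls; the variable names are illustrative:

```ruby
# Core Ruby only: compare the two string forms of the same Regexp.
re = /a.b/mi

source_only  = re.source  # pattern text, options dropped
with_options = re.to_s    # pattern wrapped in an inline options group

puts source_only   # a.b
puts with_options  # (?mi-x:a.b)

# The #to_s form carries the options along when it is turned back into a Regexp.
rebuilt = Regexp.new(with_options)
puts rebuilt =~ "A\nB" ? 'options preserved' : 'options lost'
```

Passing the `#to_s` form to a scanner therefore exposes the options as an ordinary options group inside the pattern text.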
data/lib/regexp_parser/expression.rb
CHANGED
@@ -127,7 +127,7 @@ module Regexp::Expression
     end
     alias :=~ :match
 
-    def
+    def attributes
       {
         type: type,
         token: token,
@@ -141,6 +141,7 @@ module Regexp::Expression
         quantifier: quantified? ? quantifier.to_h : nil,
       }
     end
+    alias :to_h :attributes
   end
 
   def self.parsed(exp)
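The hunk above introduces `attributes` and aliases `to_h` to it. A minimal sketch of why that split can be useful, assuming hypothetical stand-in classes (`SketchBase` and `SketchBranch` are not part of the gem): a subclass can override `to_h` and still reach the plain attribute hash by its other name.

```ruby
# Hypothetical stand-ins, not the gem's classes.
class SketchBase
  def initialize(type, token, text)
    @type, @token, @text = type, token, text
  end

  # The flat description of a single node.
  def attributes
    { type: @type, token: @token, text: @text }
  end
  alias :to_h :attributes
end

class SketchBranch < SketchBase
  def initialize(type, token, text, children = [])
    super(type, token, text)
    @children = children
  end

  # Overrides to_h, but builds on the inherited attributes hash.
  def to_h
    attributes.merge(expressions: @children.map(&:to_h))
  end
end

leaf   = SketchBase.new(:literal, :literal, 'a')
branch = SketchBranch.new(:group, :capture, '(', [leaf])

p leaf.to_h
p branch.to_h
```

Here `branch.attributes` stays free of the nested `:expressions` key, while `branch.to_h` includes it.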
data/lib/regexp_parser/expression/classes/conditional.rb
CHANGED
@@ -18,13 +18,6 @@ module Regexp::Expression
     class Branch < Regexp::Expression::Sequence; end
 
     class Expression < Regexp::Expression::Subexpression
-      attr_reader :condition
-
-      def condition=(exp)
-        @condition = exp
-        expressions << exp
-      end
-
       def <<(exp)
         expressions.last << exp
       end
@@ -35,16 +28,25 @@ module Regexp::Expression
       end
       alias :branch :add_sequence
 
+      def condition=(exp)
+        expressions.delete(condition)
+        expressions.unshift(exp)
+      end
+
+      def condition
+        find { |subexp| subexp.is_a?(Condition) }
+      end
+
       def branches
-
+        select { |subexp| subexp.is_a?(Sequence) }
       end
 
       def reference
         condition.reference
       end
 
-      def to_s(
-        text
+      def to_s(format = :full)
+        "#{text}#{condition}#{branches.join('|')})#{quantifier_affix(format)}"
       end
     end
   end
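The new `condition=` and `condition` keep the condition as the first child of `expressions` instead of in a separate instance variable, and `to_s` now appends the quantifier. A minimal sketch of that arrangement, assuming hypothetical stand-in classes (`SketchCondition`, `SketchSequence`, `SketchConditional`) and a plain `quantifier_text` string in place of the gem's `quantifier_affix(format)`:

```ruby
# Hypothetical stand-ins, not the gem's classes.
class SketchCondition
  def initialize(text); @text = text; end
  def to_s; @text; end
end

class SketchSequence
  def initialize(text); @text = text; end
  def to_s; @text; end
end

class SketchConditional
  include Enumerable
  attr_reader :expressions
  attr_accessor :quantifier_text # assumption: stands in for quantifier_affix

  def initialize
    @expressions = []
  end

  def each(&block)
    expressions.each(&block)
  end

  # Replace any existing condition and keep the new one at the front.
  def condition=(exp)
    expressions.delete(condition)
    expressions.unshift(exp)
  end

  # The condition is looked up among the children, not stored separately.
  def condition
    find { |subexp| subexp.is_a?(SketchCondition) }
  end

  def branches
    select { |subexp| subexp.is_a?(SketchSequence) }
  end

  def to_s
    "(?#{condition}#{branches.join('|')})#{quantifier_text}"
  end
end

cond = SketchConditional.new
cond.expressions << SketchSequence.new('\d') << SketchSequence.new('(\w)')
cond.condition = SketchCondition.new('(9)')
cond.condition = SketchCondition.new('(1)') # replaces the earlier condition
cond.quantifier_text = '{42}'

puts cond.to_s
```

Because the condition is an ordinary child, a generic walk over `expressions` visits it without any special casing.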
data/lib/regexp_parser/expression/subexpression.rb
CHANGED
@@ -1,6 +1,8 @@
 module Regexp::Expression
 
   class Subexpression < Regexp::Expression::Base
+    include Enumerable
+
     attr_accessor :expressions
 
     def initialize(token, options = {})
@@ -24,8 +26,7 @@ module Regexp::Expression
       end
     end
 
-    %w[[]
-      fetch find first index join last length map values_at].each do |method|
+    %w[[] at each empty? fetch index join last length values_at].each do |method|
       class_eval <<-RUBY, __FILE__, __LINE__ + 1
         def #{method}(*args, &block)
           expressions.#{method}(*args, &block)
@@ -51,7 +52,7 @@ module Regexp::Expression
     end
 
     def to_h
-
+      attributes.merge({
         text: to_s(:base),
         expressions: expressions.map(&:to_h)
       })
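In the hunk above, `find`, `first`, and `map` leave the delegation list while `each` joins it, alongside `include Enumerable`. A minimal sketch of the pattern, assuming a hypothetical `SketchSubexpression` class and using `define_method` rather than the `class_eval` heredoc shown in the diff:

```ruby
# Hypothetical stand-in, not the gem's class.
class SketchSubexpression
  include Enumerable
  attr_accessor :expressions

  def initialize(expressions = [])
    @expressions = expressions
  end

  # Array-specific methods are forwarded to the children array.
  # `each` is the one method Enumerable needs in order to work.
  %w[[] at each empty? fetch index join last length values_at].each do |name|
    define_method(name) do |*args, &block|
      expressions.public_send(name, *args, &block)
    end
  end
end

node = SketchSubexpression.new(%w[a b? c])

# These come from Enumerable, built on the delegated #each.
quantified = node.select { |text| text.end_with?('?') }
p quantified          # ["b?"]
p node.map(&:upcase)  # ["A", "B?", "C"]
p node.first          # "a"

# These are forwarded directly to the array.
p node[1]             # "b?"
p node.length         # 3
```

Once `each` is available, Enumerable supplies `select`, `find`, `map`, `first`, and the rest, so only methods that Enumerable does not define need explicit delegation.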
data/test/parser/test_conditionals.rb
CHANGED
@@ -157,7 +157,8 @@ class TestParserConditionals < Test::Unit::TestCase
     conditional = root[1]
 
     assert conditional.quantified?
-    assert_equal '{42}',
+    assert_equal '{42}', conditional.quantifier.text
+    assert_equal '(?(1)\d|(\w)){42}', conditional.to_s
     refute conditional.branches.any?(&:quantified?)
   end
 
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: regexp_parser
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.2.0
 platform: ruby
 authors:
 - Ammar Ali
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-09-
+date: 2018-09-28 00:00:00.000000000 Z
 dependencies: []
 description: A library for tokenizing, lexing, and parsing Ruby regular expressions.
 email: