RubyGems - regexp_parser - Versions diffs - 2.5.0 → 2.6.1 - Mend

regexp_parser 2.5.0 → 2.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

checksums.yaml +4 -4
data/CHANGELOG.md +67 -39
data/README.md +45 -31
data/lib/regexp_parser/expression/base.rb +17 -9
data/lib/regexp_parser/expression/classes/backreference.rb +1 -1
data/lib/regexp_parser/expression/classes/escape_sequence.rb +1 -1
data/lib/regexp_parser/expression/classes/group.rb +10 -0
data/lib/regexp_parser/expression/classes/unicode_property.rb +1 -1
data/lib/regexp_parser/expression/methods/human_name.rb +43 -0
data/lib/regexp_parser/expression/shared.rb +11 -2
data/lib/regexp_parser/expression.rb +1 -0
data/lib/regexp_parser/parser.rb +16 -4
data/lib/regexp_parser/scanner/scanner.rl +2 -2
data/lib/regexp_parser/scanner.rb +582 -578
data/lib/regexp_parser/version.rb +1 -1
metadata +3 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: f871ec3cdea5a594f72f5386f1b344710e6204f7307ba40d966653197f526be8
-  data.tar.gz: dd93c880f29ec77531faa2379fbfc8e34a9b67680664c6a3477d38afeaa1809a
+  metadata.gz: a468f97c0fecc8b90781d4d6775f82423fd5e7f15561a419be849b1d24fe05d9
+  data.tar.gz: c5c78beabe6ebe360b4f7cdede3c62149f4eba3c1556fd55cf02e3300cdb38b7
 SHA512:
-  metadata.gz: 45e52ab0ce7bec3e4a275efa3828532778c49e8d36eec1ea82a43755a87abc9eee97e986027aa8f5c64fd604f15164d2ad4f37e5d6e22a5a1e3e9da6788271b9
-  data.tar.gz: 1f5514f3252294d9fe0877cff1d8b0db0400838c97ed78d15bbb794b94595c20d081681e4b1fe9bb6c89be7749514d8b2b8cf385360d002cd89e2a76ce6d2e63
+  metadata.gz: a3b86a8f66154804b49d227ad4653cb969f1c337d4dc90de09e116e39cd87f608a12d29cc0422e4b1b4201234bc2b5b6467b065d94c274674fb1c555a04518d8
+  data.tar.gz: fb26d224504f71645645013ee3dd5a07066b0323f9c97f8c0a716e75ea0d4fdffbf41c0526eafdf19c6d7fe1772d6616aec71541dc46d51123640cfc76b703f6

data/CHANGELOG.md CHANGED

@@ -1,37 +1,77 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
+## [2.6.1] - 2022-11-16 - [Janosch Müller](mailto:janosch84@gmail.com)
+### Fixed
+- fixed scanning of two negative lookbehind edge cases
+  * `(?<!x)y>` used to raise a ScannerError
+  * `(?<!x>)y` used to be misinterpreted as a named group
+  * thanks to [Sergio Medina](https://github.com/serch) for the report
+## [2.6.0] - 2022-09-26 - [Janosch Müller](mailto:janosch84@gmail.com)
+### Fixed
+- fixed `#referenced_expression` for `\g<0>` (was `nil`, is now the `Root` exp)
+- fixed `#reference`, `#referenced_expression` for recursion level backrefs
+  * e.g. `(a)(b)\k<-1+1>`
+  * `#referenced_expression` was `nil`, now it is the correct `Group` exp
+- detect and raise for two more syntax errors when parsing String input
+  * quantification of option switches (e.g. `(?i)+`)
+  * invalid references (e.g. `/\k<1>/`)
+  * these are a `SyntaxError` in Ruby, so could only be passed as a String
+### Added
+- `Regexp::Expression::Base#human_name`
+  * returns a nice, human-readable description of the expression
+- `Regexp::Expression::Base#optional?`
+  * returns `true` if the expression is quantified accordingly (e.g. with `*`, `{,n}`)
+- added a deprecation warning when calling `#to_re` on set members
+## [2.5.0] - 2022-05-27 - [Janosch Müller](mailto:janosch84@gmail.com)
 ### Added
 - `Regexp::Expression::Base.construct` and `.token_class` methods
+  * see the [wiki](https://github.com/ammar/regexp_parser/wiki) for details
 ## [2.4.0] - 2022-05-09 - [Janosch Müller](mailto:janosch84@gmail.com)
 ### Fixed
 - fixed interpretation of `+` and `?` after interval quantifiers (`{n,n}`)
-  - they used to be treated as reluctant or possessive mode indicators
-  - however, Ruby does not support these modes for interval quantifiers
-  - they are now treated as chained quantifiers instead, as Ruby does it
-  - c.f. [#3](https://github.com/ammar/regexp_parser/issues/3)
+  * they used to be treated as reluctant or possessive mode indicators
+  * however, Ruby does not support these modes for interval quantifiers
+  * they are now treated as chained quantifiers instead, as Ruby does it
+  * c.f. [#3](https://github.com/ammar/regexp_parser/issues/3)
 - fixed `Expression::Base#nesting_level` for some tree rewrite cases
-  - e.g. the alternatives in `/a|[b]/` had an inconsistent nesting_level
+  * e.g. the alternatives in `/a|[b]/` had an inconsistent nesting_level
 - fixed `Scanner` accepting invalid posix classes, e.g. `[[:foo:]]`
-  - they raise a `SyntaxError` when used in a Regexp, so could only be passed as String
-  - they now raise a `Regexp::Scanner::ValidationError` in the `Scanner`
+  * they raise a `SyntaxError` when used in a Regexp, so could only be passed as String
+  * they now raise a `Regexp::Scanner::ValidationError` in the `Scanner`
 ### Added
 - added `Expression::Base#==` for (deep) comparison of expressions
 - added `Expression::Base#parts`
-  - returns the text elements and subexpressions of an expression
-  - e.g. `parse(/(a)/)[0].parts # => ["(", #<Literal @text="a"...>, ")"]`
+  * returns the text elements and subexpressions of an expression
+  * e.g. `parse(/(a)/)[0].parts # => ["(", #<Literal @text="a"...>, ")"]`
 - added `Expression::Base#te` (a.k.a. token end index)
-  - `Expression::Subexpression` always had `#te`, only terminal nodes lacked it so far
+  * `Expression::Subexpression` always had `#te`, only terminal nodes lacked it so far
 - made some `Expression::Base` methods available on `Quantifier` instances, too
-  - `#type`, `#type?`, `#is?`, `#one_of?`, `#options`, `#terminal?`
-  - `#base_length`, `#full_length`, `#starts_at`, `#te`, `#ts`, `#offset`
-  - `#conditional_level`, `#level`, `#nesting_level` , `#set_level`
-  - this allows a more unified handling with `Expression::Base` instances
+  * `#type`, `#type?`, `#is?`, `#one_of?`, `#options`, `#terminal?`
+  * `#base_length`, `#full_length`, `#starts_at`, `#te`, `#ts`, `#offset`
+  * `#conditional_level`, `#level`, `#nesting_level` , `#set_level`
+  * this allows a more unified handling with `Expression::Base` instances
 - allowed `Quantifier#initialize` to take a token and options Hash like other nodes
 - added a deprecation warning for initializing Quantifiers with 4+ arguments:
@@ -54,18 +94,18 @@
 ### Fixed
 - removed five inexistent unicode properties from `Syntax#features`
-  - these were never supported by Ruby or the `Regexp::Scanner`
-  - thanks to [Markus Schirp](https://github.com/mbj) for the report
+  * these were never supported by Ruby or the `Regexp::Scanner`
+  * thanks to [Markus Schirp](https://github.com/mbj) for the report
 ## [2.3.0] - 2022-04-08 - [Janosch Müller](mailto:janosch84@gmail.com)
 ### Added
 - improved parsing performance through `Syntax` refactoring
-  - instead of fresh `Syntax` instances, pre-loaded constants are now re-used
-  - this approximately doubles the parsing speed for simple regexps
+  * instead of fresh `Syntax` instances, pre-loaded constants are now re-used
+  * this approximately doubles the parsing speed for simple regexps
 - added methods to `Syntax` classes to show relative feature sets
-  - e.g. `Regexp::Syntax::V3_2_0.added_features`
+  * e.g. `Regexp::Syntax::V3_2_0.added_features`
 - support for new unicode properties of Ruby 3.2 / Unicode 14.0
 ## [2.2.1] - 2022-02-11 - [Janosch Müller](mailto:janosch84@gmail.com)
@@ -73,14 +113,14 @@
 ### Fixed
 - fixed Syntax version of absence groups (`(?~...)`)
-  - the lexer accepted them for any Ruby version
-  - now they are only recognized for Ruby >= 2.4.1 in which they were introduced
+  * the lexer accepted them for any Ruby version
+  * now they are only recognized for Ruby >= 2.4.1 in which they were introduced
 - reduced gem size by excluding specs from package
 - removed deprecated `test_files` gemspec setting
 - no longer depend on `yaml`/`psych` (except for Ruby <= 2.4)
 - no longer depend on `set`
-  - `set` was removed from the stdlib and made a standalone gem as of Ruby 3
-  - this made it a hidden/undeclared dependency of `regexp_parser`
+  * `set` was removed from the stdlib and made a standalone gem as of Ruby 3
+  * this made it a hidden/undeclared dependency of `regexp_parser`
 ## [2.2.0] - 2021-12-04 - [Janosch Müller](mailto:janosch84@gmail.com)
@@ -318,8 +358,8 @@
 - Fixed missing quantifier in `Conditional::Expression` methods `#to_s`, `#to_re`
 - `Conditional::Condition` no longer lives outside the recursive `#expressions` tree
-  - it used to be the only expression stored in a custom ivar, complicating traversal
-  - its setter and getter (`#condition=`, `#condition`) still work as before
+  * it used to be the only expression stored in a custom ivar, complicating traversal
+  * its setter and getter (`#condition=`, `#condition`) still work as before
 ## [1.1.0] - 2018-09-17 - [Janosch Müller](mailto:janosch84@gmail.com)
@@ -327,8 +367,8 @@
 - Added `Quantifier` methods `#greedy?`, `#possessive?`, `#reluctant?`/`#lazy?`
 - Added `Group::Options#option_changes`
-  - shows the options enabled or disabled by the given options group
-  - as with all other expressions, `#options` shows the overall active options
+  * shows the options enabled or disabled by the given options group
+  * as with all other expressions, `#options` shows the overall active options
 - Added `Conditional#reference` and `Condition#reference`, indicating the determinative group
 - Added `Subexpression#dig`, acts like [`Array#dig`](http://ruby-doc.org/core-2.5.0/Array.html#method-i-dig)
@@ -512,7 +552,6 @@ This release includes several breaking changes, mostly to character sets, #map a
   * Fixed scanning of zero length comments (PR #12)
   * Fixed missing escape:codepoint_list syntax token (PR #14)
   * Fixed to_s for modified interval quantifiers (PR #17)
-- Added a note about MRI implementation quirks to Scanner section
 ## [0.3.2] - 2016-01-01 - [Ammar Ali](mailto:ammarabuali@gmail.com)
@@ -538,7 +577,6 @@ This release includes several breaking changes, mostly to character sets, #map a
 - Renamed Lexer's method to lex, added an alias to the old name (scan)
 - Use #map instead of #each to run the block in Lexer.lex.
 - Replaced VERSION.yml file with a constant.
-- Updated README
 - Update tokens and scanner with new additions in Unicode 7.0.
 ## [0.1.6] - 2014-10-06 - [Ammar Ali](mailto:ammarabuali@gmail.com)
@@ -548,20 +586,11 @@ This release includes several breaking changes, mostly to character sets, #map a
 - Added syntax files for missing ruby 2.x versions. These do not add
   extra syntax support, they just make the gem work with the newer
   ruby versions.
-- Added .travis.yml to project root.
-- README:
-  - Removed note purporting runtime support for ruby 1.8.6.
-  - Added a section identifying the main unsupported syntax features.
-  - Added sections for Testing and Building
-  - Added badges for gem version, Travis CI, and code climate.
-- Updated README, fixing broken examples, and converting it from a rdoc file to Github's flavor of Markdown.
 - Fixed a parser bug where an alternation sequence that contained nested expressions was incorrectly being appended to the parent expression when the nesting was exited. e.g. in /a|(b)c/, c was appended to the root.
 - Fixed a bug where character types were not being correctly scanned within character sets. e.g. in [\d], two tokens were scanned; one for the backslash '\' and one for the 'd'
 ## [0.1.5] - 2014-01-14 - [Ammar Ali](mailto:ammarabuali@gmail.com)
-- Correct ChangeLog.
 - Added syntax stubs for ruby versions 2.0 and 2.1
 - Added clone methods for deep copying expressions.
 - Added optional format argument for to_s on expressions to return the text of the expression with (:full, the default) or without (:base) its quantifier.
@@ -570,7 +599,6 @@ This release includes several breaking changes, mostly to character sets, #map a
 - Improved EOF handling in general and especially from sequences like hex and control escapes.
 - Fixed a bug where named groups with an empty name would return a blank token [].
 - Fixed a bug where member of a parent set where being added to its last subset.
-- Various code cleanups in scanner.rl
 - Fixed a few mutable string bugs by calling dup on the originals.
 - Made ruby 1.8.6 the base for all 1.8 syntax, and the 1.8 name a pointer to the latest (1.8.7 at this time)
 - Removed look-behind assertions (positive and negative) from 1.8 syntax
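
The two lookbehind edge cases listed under 2.6.1 can be checked against Ruby's built-in engine, which shows what the scanner is expected to agree with. This is a minimal sketch in plain Ruby and does not call regexp_parser itself; the constant names are chosen only for this example.

```ruby
# Plain Ruby (no gems): how Ruby's own Regexp engine reads the two patterns
# named in the 2.6.1 changelog entry. Both begin with a negative lookbehind.

TRAILING_ANGLE = /(?<!x)y>/ # lookbehind for "x", then the literal text "y>"
ANGLE_INSIDE   = /(?<!x>)y/ # lookbehind for "x>", then the literal text "y"

# Neither pattern defines a named group.
p TRAILING_ANGLE.names # => []
p ANGLE_INSIDE.names   # => []

p TRAILING_ANGLE.match?('ay>') # => true
p TRAILING_ANGLE.match?('xy>') # => false ("y" is preceded by "x")
p ANGLE_INSIDE.match?('ay')    # => true
p ANGLE_INSIDE.match?('x>y')   # => false ("y" is preceded by "x>")
```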

data/README.md CHANGED

@@ -9,8 +9,8 @@ A Ruby gem for tokenizing, parsing, and transforming regular expressions.
 * Multilayered
   * A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
-  * A lexer that produces a "stream" of token objects.
-  * A parser that produces a "tree" of Expression objects (OO API)
+  * A lexer that produces a "stream" of [Token objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
+  * A parser that produces a "tree" of [Expression objects (OO API)](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
 * Runs on Ruby 2.x, 3.x and JRuby runtimes
 * Recognizes Ruby 1.8, 1.9, 2.x and 3.x regular expressions [See Supported Syntax](#supported-syntax)
@@ -36,14 +36,15 @@ Or, add it to your project's `Gemfile`:
 ```gem 'regexp_parser', '~> X.Y.Z'```
-See rubygems for the the [latest version number](https://rubygems.org/gems/regexp_parser)
+See the badge at the top of this README or [rubygems](https://rubygems.org/gems/regexp_parser)
+for the the latest version number.
 ---
 ## Usage
 The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
-provides a single method that takes a regular expression (as a RegExp object or
+provides a single method that takes a regular expression (as a Regexp object or
 a string) and returns its results. The **Lexer** and the **Parser** accept an
 optional second argument that specifies the syntax version, like 'ruby/2.0',
 which defaults to the host Ruby version (using RUBY_VERSION).
@@ -79,7 +80,7 @@ All three methods accept either a `Regexp` or `String` (containing the pattern)
 require 'regexp_parser'
 Regexp::Parser.parse(
-  "a+ # Recognises a and A...",
+  "a+ # Recognizes a and A...",
   options: ::Regexp::EXTENDED | ::Regexp::IGNORECASE
 )
 ```
@@ -101,7 +102,7 @@ start/end offsets for each token found.
 ```ruby
 require 'regexp_parser'
-Regexp::Scanner.scan /(ab?(cd)*[e-h]+)/  do |type, token, text, ts, te|
+Regexp::Scanner.scan(/(ab?(cd)*[e-h]+)/) do |type, token, text, ts, te|
   puts "type: #{type}, token: #{token}, text: '#{text}' [#{ts}..#{te}]"
 end
@@ -124,7 +125,7 @@ A one-liner that uses map on the result of the scan to return the textual
 parts of the pattern:
 ```ruby
-Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]}
+Regexp::Scanner.scan(/(cat?([bhm]at)){3,5}/).map { |token| token[2] }
 #=> ["(", "cat", "?", "(", "[", "b", "h", "m", "]", "at", ")", ")", "{3,5}"]
 ```
@@ -220,7 +221,7 @@ syntax, and prints the token objects' text indented to their level.
 ```ruby
 require 'regexp_parser'
-Regexp::Lexer.lex /a?(b(c))*[d]+/, 'ruby/1.9' do |token|
+Regexp::Lexer.lex(/a?(b(c))*[d]+/, 'ruby/1.9') do |token|
   puts "#{'  ' * token.level}#{token.text}"
 end
@@ -246,7 +247,7 @@ how the sequence 'cat' is treated. The 't' is separated because it's followed
 by a quantifier that only applies to it.
 ```ruby
-Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text}
+Regexp::Lexer.scan(/(cat?([b]at)){3,5}/).map { |token| token.text }
 #=> ["(", "ca", "t", "?", "(", "[", "b", "]", "at", ")", ")", "{3,5}"]
 ```
@@ -274,7 +275,7 @@ require 'regexp_parser'
 regex = /a?(b+(c)d)*(?<name>[0-9]+)/
-tree = Regexp::Parser.parse( regex, 'ruby/2.1' )
+tree = Regexp::Parser.parse(regex, 'ruby/2.1')
 tree.traverse do |event, exp|
   puts "#{event}: #{exp.type} `#{exp.to_s}`"
@@ -355,7 +356,7 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&emsp;_Nest Level_              | `\k<n-1>`                                               | &#x2713; |
 | &emsp;&emsp;_Numbered_                | `\k<1>`                                                 | &#x2713; |
 | &emsp;&emsp;_Relative_                | `\k<-2>`                                                | &#x2713; |
-| &emsp;&emsp;_Traditional_             | `\1` thru `\9`                                          | &#x2713; |
+| &emsp;&emsp;_Traditional_             | `\1` through `\9`                                       | &#x2713; |
 | &emsp;&nbsp;_**Capturing**_           | `(abc)`                                                 | &#x2713; |
 | &emsp;&nbsp;_**Comments**_            | `(?# comment text)`                                     | &#x2713; |
 | &emsp;&nbsp;_**Named**_               | `(?<name>abc)`, `(?'name'abc)`                          | &#x2713; |
@@ -375,7 +376,7 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&nbsp;_**Meta** \[2\]_          | `\M-c`, `\M-\C-C`, `\M-\cC`, `\C-\M-C`, `\c\M-C`        | &#x2713; |
 | &emsp;&nbsp;_**Octal**_               | `\0`, `\01`, `\012`                                     | &#x2713; |
 | &emsp;&nbsp;_**Unicode**_             | `\uHHHH`, `\u{H+ H+}`                                   | &#x2713; |
-| **Unicode Properties**                | _<sub>([Unicode 13.0.0](https://www.unicode.org/versions/Unicode13.0.0/))</sub>_ | &#x22f1; |
+| **Unicode Properties**                | _<sub>([Unicode 13.0.0])</sub>_                         | &#x22f1; |
 | &emsp;&nbsp;_**Age**_                 | `\p{Age=5.2}`, `\P{age=7.0}`, `\p{^age=8.0}`            | &#x2713; |
 | &emsp;&nbsp;_**Blocks**_              | `\p{InArmenian}`, `\P{InKhmer}`, `\p{^InThai}`          | &#x2713; |
 | &emsp;&nbsp;_**Classes**_             | `\p{Alpha}`, `\P{Space}`, `\p{^Alnum}`                  | &#x2713; |
@@ -384,13 +385,17 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&nbsp;_**Scripts**_             | `\p{Arabic}`, `\P{Hiragana}`, `\p{^Greek}`              | &#x2713; |
 | &emsp;&nbsp;_**Simple**_              | `\p{Dash}`, `\p{Extender}`, `\p{^Hyphen}`               | &#x2713; |
-**\[1\]**: Ruby does not support lazy or possessive interval quantifiers. Any `+` or `?` that follows an interval
-quantifier will be treated as another, chained quantifier. See also [#3](https://github.com/ammar/regexp_parser/issue/3),
+[Unicode 13.0.0]: https://www.unicode.org/versions/Unicode13.0.0/
+**\[1\]**: Ruby does not support lazy or possessive interval quantifiers.
+Any `+` or `?` that follows an interval quantifier will be treated as another,
+chained quantifier. See also [#3](https://github.com/ammar/regexp_parser/issue/3),
 [#69](https://github.com/ammar/regexp_parser/pull/69).
-**\[2\]**: As of Ruby 3.1, meta and control sequences are [pre-processed to hex escapes when used in Regexp literals](
- https://github.com/ruby/ruby/commit/11ae581a4a7f5d5f5ec6378872eab8f25381b1b9 ), so they will only reach the
-scanner and will only be emitted if a String or a Regexp that has been built with the `::new` constructor is scanned.
+**\[2\]**: As of Ruby 3.1, meta and control sequences are [pre-processed to hex
+escapes when used in Regexp literals](https://github.com/ruby/ruby/commit/11ae581),
+so they will only reach the scanner and will only be emitted if a String or a Regexp
+that has been built with the `::new` constructor is scanned.
 ##### Inapplicable Features
@@ -407,25 +412,27 @@ expressions library (Onigmo). They are not supported by the scanner.
 See something missing? Please submit an [issue](https://github.com/ammar/regexp_parser/issues)
-_**Note**: Attempting to process expressions with unsupported syntax features can raise an error,
-or incorrectly return tokens/objects as literals._
+_**Note**: Attempting to process expressions with unsupported syntax features can raise
+an error, or incorrectly return tokens/objects as literals._
 ## Testing
 To run the tests simply run rake from the root directory.
-The default task generates the scanner's code from the Ragel source files and runs all the specs, thus it requires Ragel to be installed.
+The default task generates the scanner's code from the Ragel source files and runs
+all the specs, thus it requires Ragel to be installed.
-Note that changes to Ragel files will not be reflected when running `rspec` on its own, so to run individual tests you might want to run:
+Note that changes to Ragel files will not be reflected when running `rspec` on its own,
+so to run individual tests you might want to run:
 ```
 rake ragel:rb && rspec spec/scanner/properties_spec.rb
 ```
 ## Building
-Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/) to be
-installed. The build tasks will automatically invoke the 'ragel:rb' task to generate the
-Ruby scanner code.
+Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/)
+to be installed. The build tasks will automatically invoke the 'ragel:rb' task to generate
+the Ruby scanner code.
 The project uses the standard rubygems package tasks, so:
@@ -445,19 +452,26 @@ rake install
 ## Example Projects
 Projects using regexp_parser.
-- [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool that uses regexp_parser to convert Regexps to css/xpath selectors.
+- [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool
+that uses regexp_parser to convert Regexps to css/xpath selectors.
-- [js_regex](https://github.com/jaynetics/js_regex) converts Ruby regular expressions to JavaScript-compatible regular expressions.
+- [js_regex](https://github.com/jaynetics/js_regex) converts Ruby regular expressions
+to JavaScript-compatible regular expressions.
-- [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor with alias support.
+- [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor
+with alias support.
-- [mutant](https://github.com/mbj/mutant) manipulates your regular expressions (amongst others) to see if your tests cover their behavior.
+- [mutant](https://github.com/mbj/mutant) manipulates your regular expressions
+(amongst others) to see if your tests cover their behavior.
-- [repper](https://github.com/jaynetics/repper) is a regular expression pretty-printer for Ruby.
+- [repper](https://github.com/jaynetics/repper) is a regular expression
+pretty-printer and formatter for Ruby.
-- [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that uses regexp_parser to lint Regexps.
+- [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that
+uses regexp_parser to lint Regexps.
-- [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper that uses regexp_parser to generate examples of postal codes.
+- [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper
+that uses regexp_parser to generate examples of postal codes.
 ## References

data/lib/regexp_parser/expression/base.rb CHANGED

@@ -14,6 +14,10 @@ module Regexp::Expression
     end
     def to_re(format = :full)
+      if set_level > 0
+        warn "Calling #to_re on character set members is deprecated - "\
+             "their behavior might not be equivalent outside of the set."
+      end
       ::Regexp.new(to_s(format))
     end
@@ -32,15 +36,19 @@ module Regexp::Expression
     end
     def repetitions
-      return 1..1 unless quantified?
-      min = quantifier.min
-      max = quantifier.max < 0 ? Float::INFINITY : quantifier.max
-      range = min..max
-      # fix Range#minmax on old Rubies - https://bugs.ruby-lang.org/issues/15807
-      if RUBY_VERSION.to_f < 2.7
-        range.define_singleton_method(:minmax) { [min, max] }
-      end
-      range
+      @repetitions ||=
+        if quantified?
+          min = quantifier.min
+          max = quantifier.max < 0 ? Float::INFINITY : quantifier.max
+          range = min..max
+          # fix Range#minmax on old Rubies - https://bugs.ruby-lang.org/issues/15807
+          if RUBY_VERSION.to_f < 2.7
+            range.define_singleton_method(:minmax) { [min, max] }
+          end
+          range
+        else
+          1..1
+        end
     end
     def greedy?
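
The new warning in `#to_re` says that set members "might not be equivalent outside of the set". A short plain-Ruby illustration of that point follows; it only uses Ruby's built-in `Regexp` and does not reproduce what the gem does internally. The constant names are made up for this sketch.

```ruby
# Plain Ruby: the same text can mean different things inside and outside
# a character set, so rebuilding a Regexp from a set member alone can
# change what it matches.

DOT_IN_SET     = /[.]/            # inside a set, "." is a literal dot
DOT_ALONE      = Regexp.new('.')  # outside a set, "." matches any character

p DOT_IN_SET.match?('a') # => false
p DOT_ALONE.match?('a')  # => true

BACKSPACE_IN_SET = /[\b]/            # inside a set, \b is a backspace (0x08)
BOUNDARY_ALONE   = Regexp.new('\b')  # outside a set, \b is a word boundary

p BACKSPACE_IN_SET.match?("\b") # => true
p BOUNDARY_ALONE.match?("\b")   # => false (no word character next to it)
```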

data/lib/regexp_parser/expression/classes/backreference.rb CHANGED

@@ -39,7 +39,7 @@ module Regexp::Expression
     class NameCall           < Backreference::Name; end
     class NumberCallRelative < Backreference::NumberRelative; end
-    class NumberRecursionLevel < Backreference::Number
+    class NumberRecursionLevel < Backreference::NumberRelative
       attr_reader :recursion_level
       def initialize(token, options = {})

data/lib/regexp_parser/expression/classes/escape_sequence.rb CHANGED

@@ -1,5 +1,5 @@
 module Regexp::Expression
-  # TODO: unify naming with Token::Escape, on way or the other, in v3.0.0
+  # TODO: unify naming with Token::Escape, one way or the other, in v3.0.0
   module EscapeSequence
     class Base < Regexp::Expression::Base
       def codepoint

data/lib/regexp_parser/expression/classes/group.rb CHANGED

@@ -33,6 +33,8 @@ module Regexp::Expression
     class Absence < Group::Base; end
     class Atomic  < Group::Base; end
+    # TODO: should split off OptionsSwitch in v3.0.0. Maybe even make it no
+    # longer inherit from Group because it is effectively a terminal expression.
     class Options < Group::Base
       attr_accessor :option_changes
@@ -40,6 +42,14 @@ module Regexp::Expression
         self.option_changes = orig.option_changes.dup
         super
       end
+      def quantify(*args)
+        if token == :options_switch
+          raise Regexp::Parser::Error, 'Can not quantify an option switch'
+        else
+          super
+        end
+      end
     end
     class Capture < Group::Base

data/lib/regexp_parser/expression/classes/unicode_property.rb CHANGED

@@ -1,5 +1,5 @@
 module Regexp::Expression
-  # TODO: unify name with token :property, on way or the other, in v3.0.0
+  # TODO: unify name with token :property, one way or the other, in v3.0.0
   module UnicodeProperty
     class Base < Regexp::Expression::Base
       def negative?

data/lib/regexp_parser/expression/methods/human_name.rb ADDED

@@ -0,0 +1,43 @@
+module Regexp::Expression
+  module Shared
+    # default implementation, e.g. "atomic group", "hex escape", "word type", ..
+    def human_name
+      [token, type].compact.join(' ').tr('_', ' ')
+    end
+  end
+  Alternation.class_eval                       { def human_name; 'alternation'                 end }
+  Alternative.class_eval                       { def human_name; 'alternative'                 end }
+  Anchor::BOL.class_eval                       { def human_name; 'beginning of line'           end }
+  Anchor::BOS.class_eval                       { def human_name; 'beginning of string'         end }
+  Anchor::EOL.class_eval                       { def human_name; 'end of line'                 end }
+  Anchor::EOS.class_eval                       { def human_name; 'end of string'               end }
+  Anchor::EOSobEOL.class_eval                  { def human_name; 'newline-ready end of string' end }
+  Anchor::MatchStart.class_eval                { def human_name; 'match start'                 end }
+  Anchor::NonWordBoundary.class_eval           { def human_name; 'no word boundary'            end }
+  Anchor::WordBoundary.class_eval              { def human_name; 'word boundary'               end }
+  Assertion::Lookahead.class_eval              { def human_name; 'lookahead'                   end }
+  Assertion::Lookbehind.class_eval             { def human_name; 'lookbehind'                  end }
+  Assertion::NegativeLookahead.class_eval      { def human_name; 'negative lookahead'          end }
+  Assertion::NegativeLookbehind.class_eval     { def human_name; 'negative lookbehind'         end }
+  Backreference::Name.class_eval               { def human_name; 'backreference by name'       end }
+  Backreference::NameCall.class_eval           { def human_name; 'subexpression call by name'  end }
+  Backreference::Number.class_eval             { def human_name; 'backreference'               end }
+  Backreference::NumberRelative.class_eval     { def human_name; 'relative backreference'      end }
+  Backreference::NumberCall.class_eval         { def human_name; 'subexpression call'          end }
+  Backreference::NumberCallRelative.class_eval { def human_name; 'relative subexpression call' end }
+  CharacterSet::IntersectedSequence.class_eval { def human_name; 'intersected sequence'        end }
+  CharacterSet::Intersection.class_eval        { def human_name; 'intersection'                end }
+  CharacterSet::Range.class_eval               { def human_name; 'character range'             end }
+  CharacterType::Any.class_eval                { def human_name; 'match-all'                   end }
+  Comment.class_eval                           { def human_name; 'comment'                     end }
+  Conditional::Branch.class_eval               { def human_name; 'conditional branch'          end }
+  Conditional::Condition.class_eval            { def human_name; 'condition'                   end }
+  Conditional::Expression.class_eval           { def human_name; 'conditional'                 end }
+  Group::Capture.class_eval                    { def human_name; "capture group #{number}"     end }
+  Group::Named.class_eval                      { def human_name; 'named capture group'         end }
+  Keep::Mark.class_eval                        { def human_name; 'keep-mark lookbehind'        end }
+  Literal.class_eval                           { def human_name; 'literal'                     end }
+  Root.class_eval                              { def human_name; 'root'                        end }
+  WhiteSpace.class_eval                        { def human_name; 'free space'                  end }
+end
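
The default `human_name` in the file above joins `token` and `type` and replaces underscores with spaces. The sketch below evaluates the same expression on a hypothetical stand-in object (`SketchExpression` is not part of the gem) to show what that formula yields for a few token/type pairs.

```ruby
# Hypothetical stand-in for an expression; only #token and #type matter here.
SketchExpression = Struct.new(:token, :type) do
  # Same expression as the default implementation shown in the diff.
  def human_name
    [token, type].compact.join(' ').tr('_', ' ')
  end
end

puts SketchExpression.new(:atomic, :group).human_name         # atomic group
puts SketchExpression.new(:hex, :escape).human_name           # hex escape
puts SketchExpression.new(:options_switch, :group).human_name # options switch group
puts SketchExpression.new(nil, :literal).human_name           # literal (nil is dropped by compact)
```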

data/lib/regexp_parser/expression/shared.rb CHANGED

@@ -8,9 +8,9 @@ module Regexp::Expression
         attr_accessor :type, :token, :text, :ts, :te,
                       :level, :set_level, :conditional_level,
-                      :options, :quantifier
+                      :options
-        attr_reader   :nesting_level
+        attr_reader   :nesting_level, :quantifier
       end
     end
@@ -64,6 +64,10 @@ module Regexp::Expression
       !quantifier.nil?
     end
+    def optional?
+      quantified? && quantifier.min == 0
+    end
     def offset
       [starts_at, full_length]
     end
@@ -81,5 +85,10 @@ module Regexp::Expression
       quantifier && quantifier.nesting_level = lvl
       terminal? || each { |subexp| subexp.nesting_level = lvl + 1 }
     end
+    def quantifier=(qtf)
+      @quantifier = qtf
+      @repetitions = nil # clear memoized value
+    end
   end
 end
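
Taken together, the `base.rb` and `shared.rb` hunks memoize `#repetitions` and reset the cached value whenever a new quantifier is assigned. The following stand-alone sketch reproduces that pattern on hypothetical classes (`SketchQuantifier` and `SketchNode` are not gem classes) and omits the `Range#minmax` workaround for old Rubies.

```ruby
# Hypothetical quantifier: a negative max stands for "no upper bound",
# as implied by the `quantifier.max < 0` check in the diff.
SketchQuantifier = Struct.new(:min, :max)

class SketchNode
  attr_reader :quantifier

  # Custom writer: assigning a quantifier invalidates the memoized range.
  def quantifier=(qtf)
    @quantifier = qtf
    @repetitions = nil
  end

  def quantified?
    !quantifier.nil?
  end

  def optional?
    quantified? && quantifier.min == 0
  end

  # Computed once per quantifier, then served from @repetitions.
  def repetitions
    @repetitions ||=
      if quantified?
        max = quantifier.max < 0 ? Float::INFINITY : quantifier.max
        quantifier.min..max
      else
        1..1
      end
  end
end

node = SketchNode.new
p node.repetitions # => 1..1
p node.optional?   # => false

node.quantifier = SketchQuantifier.new(0, -1) # comparable to `*`
p node.repetitions # => 0..Infinity
p node.optional?   # => true
```

Without the reset in the writer, the second call to `repetitions` would still return the cached `1..1`.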

data/lib/regexp_parser/expression.rb CHANGED

@@ -25,6 +25,7 @@ require 'regexp_parser/expression/classes/root'
 require 'regexp_parser/expression/classes/unicode_property'
 require 'regexp_parser/expression/methods/construct'
+require 'regexp_parser/expression/methods/human_name'
 require 'regexp_parser/expression/methods/match'
 require 'regexp_parser/expression/methods/match_length'
 require 'regexp_parser/expression/methods/options'

data/lib/regexp_parser/parser.rb CHANGED

@@ -235,7 +235,15 @@ class Regexp::Parser
     when :number, :number_ref
       node << Backreference::Number.new(token, active_opts)
     when :number_recursion_ref
-      node << Backreference::NumberRecursionLevel.new(token, active_opts)
+      node << Backreference::NumberRecursionLevel.new(token, active_opts).tap do |exp|
+        # TODO: should split off new token number_recursion_rel_ref and new
+        # class NumberRelativeRecursionLevel in v3.0.0 to get rid of this
+        if exp.text =~ /[<'][+-]/
+          assign_effective_number(exp)
+        else
+          exp.effective_number = exp.number
+        end
+      end
     when :number_call
       node << Backreference::NumberCall.new(token, active_opts)
     when :number_rel_ref
@@ -254,6 +262,8 @@ class Regexp::Parser
   def assign_effective_number(exp)
     exp.effective_number =
       exp.number + total_captured_group_count + (exp.number < 0 ? 1 : 0)
+    exp.effective_number > 0 ||
+      raise(ParserError, "Invalid reference: #{exp.reference}")
   end
   def conditional(token)
@@ -569,15 +579,17 @@ class Regexp::Parser
   # an instance of Backreference::Number, its #referenced_expression is set to
   # the instance of Group::Capture that it refers to via its number.
   def assign_referenced_expressions
-    targets = {}
     # find all referencable expressions
+    targets = { 0 => root }
     root.each_expression do |exp|
       exp.is_a?(Group::Capture) && targets[exp.identifier] = exp
     end
     # assign them to any refering expressions
     root.each_expression do |exp|
-      exp.respond_to?(:reference) &&
-        exp.referenced_expression = targets[exp.reference]
+      next unless exp.respond_to?(:reference)
+      exp.referenced_expression = targets[exp.reference] ||
+        raise(ParserError, "Invalid reference: #{exp.reference}")
     end
   end
 end # module Regexp::Parser
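
The arithmetic in `assign_effective_number` can be followed with a worked example. The helper below is a hypothetical stand-alone function (its name, parameters, and the use of `ArgumentError` are assumptions made for this sketch) that applies the same formula and the same "must be greater than zero" check as the hunk above.

```ruby
# Hypothetical helper mirroring the formula in assign_effective_number:
#   number + total_captured_group_count + (number < 0 ? 1 : 0)
# captured_so_far is the count of capture groups seen before the reference.
def effective_group_number(relative_number, captured_so_far)
  effective = relative_number + captured_so_far + (relative_number < 0 ? 1 : 0)
  # The diff raises a ParserError here; ArgumentError is used in this sketch.
  raise ArgumentError, "Invalid reference: #{relative_number}" unless effective > 0

  effective
end

# In `(a)(b)\k<-1+1>`, two groups precede the reference and the number is -1.
p effective_group_number(-1, 2) # => 2, i.e. the group `(b)`
p effective_group_number(-2, 2) # => 1, i.e. the group `(a)`

begin
  effective_group_number(-3, 2) # -3 + 2 + 1 = 0, which refers to no group
rescue ArgumentError => e
  puts e.message # Invalid reference: -3
end
```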

data/lib/regexp_parser/scanner/scanner.rl CHANGED

@@ -90,8 +90,8 @@
   group_options         = '?' . ( [^!#'():<=>~]+ . ':'? ) ?;
   group_ref             = [gk];
-  group_name_id_ab      = ([^0-9\->] | utf8_multibyte) . ([^>] | utf8_multibyte)*;
-  group_name_id_sq      = ([^0-9\-'] | utf8_multibyte) . ([^'] | utf8_multibyte)*;
+  group_name_id_ab      = ([^!0-9\->] | utf8_multibyte) . ([^>] | utf8_multibyte)*;
+  group_name_id_sq      = ([^0-9\-']  | utf8_multibyte) . ([^'] | utf8_multibyte)*;
   group_number          = '-'? . [1-9] . [0-9]*;
   group_level           = [+\-] . [0-9]+;
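
The only functional change in this hunk is the added `!` in the first character class of `group_name_id_ab`. As a rough approximation, and assuming the Ragel rule behaves like the Ruby patterns below (the real scanner is generated by Ragel, and `utf8_multibyte` is folded into the negated classes here), the effect on candidate group names can be sketched as follows. The constant names are invented for this example.

```ruby
# Ruby approximation of the Ragel rule before and after the change.
# First character: anything except the listed ones; rest: anything except ">".
OLD_NAME_ID_AB = /\A[^0-9\->][^>]*\z/
NEW_NAME_ID_AB = /\A[^!0-9\->][^>]*\z/

# The text between "(?<" and the next ">" in the two patterns from the changelog.
p OLD_NAME_ID_AB.match?('!x')   # => true  (could be taken for a group name)
p NEW_NAME_ID_AB.match?('!x')   # => false
p OLD_NAME_ID_AB.match?('!x)y') # => true
p NEW_NAME_ID_AB.match?('!x)y') # => false

# Ordinary names are unaffected.
p NEW_NAME_ID_AB.match?('name') # => true
```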