RubyGems - regexp_parser - Versions diffs - 2.5.0 → 2.6.2 - Mend

regexp_parser 2.5.0 → 2.6.2

Files changed (17)

checksums.yaml +4 -4
data/CHANGELOG.md +74 -39
data/README.md +45 -31
data/lib/regexp_parser/expression/base.rb +17 -9
data/lib/regexp_parser/expression/classes/backreference.rb +14 -2
data/lib/regexp_parser/expression/classes/escape_sequence.rb +1 -1
data/lib/regexp_parser/expression/classes/group.rb +10 -0
data/lib/regexp_parser/expression/classes/unicode_property.rb +1 -1
data/lib/regexp_parser/expression/methods/human_name.rb +43 -0
data/lib/regexp_parser/expression/methods/match_length.rb +8 -4
data/lib/regexp_parser/expression/shared.rb +11 -2
data/lib/regexp_parser/expression.rb +1 -0
data/lib/regexp_parser/parser.rb +16 -4
data/lib/regexp_parser/scanner/scanner.rl +2 -2
data/lib/regexp_parser/scanner.rb +582 -578
data/lib/regexp_parser/version.rb +1 -1
metadata +3 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: f871ec3cdea5a594f72f5386f1b344710e6204f7307ba40d966653197f526be8
-  data.tar.gz: dd93c880f29ec77531faa2379fbfc8e34a9b67680664c6a3477d38afeaa1809a
+  metadata.gz: 66568005494b517613155277c6be4731eb8a26bb9b48a692a9430507286ce583
+  data.tar.gz: d1fc6c6f1a0c7f939c51703ac844c2dbb134f96e0e55780646cb7e3e87d7a652
 SHA512:
-  metadata.gz: 45e52ab0ce7bec3e4a275efa3828532778c49e8d36eec1ea82a43755a87abc9eee97e986027aa8f5c64fd604f15164d2ad4f37e5d6e22a5a1e3e9da6788271b9
-  data.tar.gz: 1f5514f3252294d9fe0877cff1d8b0db0400838c97ed78d15bbb794b94595c20d081681e4b1fe9bb6c89be7749514d8b2b8cf385360d002cd89e2a76ce6d2e63
+  metadata.gz: b955b2215b71c94497e52841142fab8c2b9930d0d6cea6ea2b3eeb8ed9fe84575e2f34aae3a6051af2b56429f98cf070b9151805f2cb93ddb511ec1e0e50dd7c
+  data.tar.gz: 3a4f083942b66ddb4b67ab33f14bb1c0b724a60c2b30605059d32ce3648e9cb46e31e797b7a526a2028c1e018d73365f5ef955256de4e63397d6ea105714ff12

data/CHANGELOG.md CHANGED

@@ -1,37 +1,84 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
+## [2.6.2] - 2023-01-19 - [Janosch Müller](mailto:janosch84@gmail.com)
+### Fixed
+- fixed `SystemStackError` when cloning recursive subexpression calls
+  * e.g. `Regexp::Parser.parse(/a|b\g<0>/).dup`
+## [2.6.1] - 2022-11-16 - [Janosch Müller](mailto:janosch84@gmail.com)
+### Fixed
+- fixed scanning of two negative lookbehind edge cases
+  * `(?<!x)y>` used to raise a ScannerError
+  * `(?<!x>)y` used to be misinterpreted as a named group
+  * thanks to [Sergio Medina](https://github.com/serch) for the report
+## [2.6.0] - 2022-09-26 - [Janosch Müller](mailto:janosch84@gmail.com)
+### Fixed
+- fixed `#referenced_expression` for `\g<0>` (was `nil`, is now the `Root` exp)
+- fixed `#reference`, `#referenced_expression` for recursion level backrefs
+  * e.g. `(a)(b)\k<-1+1>`
+  * `#referenced_expression` was `nil`, now it is the correct `Group` exp
+- detect and raise for two more syntax errors when parsing String input
+  * quantification of option switches (e.g. `(?i)+`)
+  * invalid references (e.g. `/\k<1>/`)
+  * these are a `SyntaxError` in Ruby, so could only be passed as a String
+### Added
+- `Regexp::Expression::Base#human_name`
+  * returns a nice, human-readable description of the expression
+- `Regexp::Expression::Base#optional?`
+  * returns `true` if the expression is quantified accordingly (e.g. with `*`, `{,n}`)
+- added a deprecation warning when calling `#to_re` on set members
+## [2.5.0] - 2022-05-27 - [Janosch Müller](mailto:janosch84@gmail.com)
 ### Added
 - `Regexp::Expression::Base.construct` and `.token_class` methods
+  * see the [wiki](https://github.com/ammar/regexp_parser/wiki) for details
 ## [2.4.0] - 2022-05-09 - [Janosch Müller](mailto:janosch84@gmail.com)
 ### Fixed
 - fixed interpretation of `+` and `?` after interval quantifiers (`{n,n}`)
-  - they used to be treated as reluctant or possessive mode indicators
-  - however, Ruby does not support these modes for interval quantifiers
-  - they are now treated as chained quantifiers instead, as Ruby does it
-  - c.f. [#3](https://github.com/ammar/regexp_parser/issues/3)
+  * they used to be treated as reluctant or possessive mode indicators
+  * however, Ruby does not support these modes for interval quantifiers
+  * they are now treated as chained quantifiers instead, as Ruby does it
+  * c.f. [#3](https://github.com/ammar/regexp_parser/issues/3)
 - fixed `Expression::Base#nesting_level` for some tree rewrite cases
-  - e.g. the alternatives in `/a|[b]/` had an inconsistent nesting_level
+  * e.g. the alternatives in `/a|[b]/` had an inconsistent nesting_level
 - fixed `Scanner` accepting invalid posix classes, e.g. `[[:foo:]]`
-  - they raise a `SyntaxError` when used in a Regexp, so could only be passed as String
-  - they now raise a `Regexp::Scanner::ValidationError` in the `Scanner`
+  * they raise a `SyntaxError` when used in a Regexp, so could only be passed as String
+  * they now raise a `Regexp::Scanner::ValidationError` in the `Scanner`
 ### Added
 - added `Expression::Base#==` for (deep) comparison of expressions
 - added `Expression::Base#parts`
-  - returns the text elements and subexpressions of an expression
-  - e.g. `parse(/(a)/)[0].parts # => ["(", #<Literal @text="a"...>, ")"]`
+  * returns the text elements and subexpressions of an expression
+  * e.g. `parse(/(a)/)[0].parts # => ["(", #<Literal @text="a"...>, ")"]`
 - added `Expression::Base#te` (a.k.a. token end index)
-  - `Expression::Subexpression` always had `#te`, only terminal nodes lacked it so far
+  * `Expression::Subexpression` always had `#te`, only terminal nodes lacked it so far
 - made some `Expression::Base` methods available on `Quantifier` instances, too
-  - `#type`, `#type?`, `#is?`, `#one_of?`, `#options`, `#terminal?`
-  - `#base_length`, `#full_length`, `#starts_at`, `#te`, `#ts`, `#offset`
-  - `#conditional_level`, `#level`, `#nesting_level` , `#set_level`
-  - this allows a more unified handling with `Expression::Base` instances
+  * `#type`, `#type?`, `#is?`, `#one_of?`, `#options`, `#terminal?`
+  * `#base_length`, `#full_length`, `#starts_at`, `#te`, `#ts`, `#offset`
+  * `#conditional_level`, `#level`, `#nesting_level` , `#set_level`
+  * this allows a more unified handling with `Expression::Base` instances
 - allowed `Quantifier#initialize` to take a token and options Hash like other nodes
 - added a deprecation warning for initializing Quantifiers with 4+ arguments:
@@ -54,18 +101,18 @@
 ### Fixed
 - removed five inexistent unicode properties from `Syntax#features`
-  - these were never supported by Ruby or the `Regexp::Scanner`
-  - thanks to [Markus Schirp](https://github.com/mbj) for the report
+  * these were never supported by Ruby or the `Regexp::Scanner`
+  * thanks to [Markus Schirp](https://github.com/mbj) for the report
 ## [2.3.0] - 2022-04-08 - [Janosch Müller](mailto:janosch84@gmail.com)
 ### Added
 - improved parsing performance through `Syntax` refactoring
-  - instead of fresh `Syntax` instances, pre-loaded constants are now re-used
-  - this approximately doubles the parsing speed for simple regexps
+  * instead of fresh `Syntax` instances, pre-loaded constants are now re-used
+  * this approximately doubles the parsing speed for simple regexps
 - added methods to `Syntax` classes to show relative feature sets
-  - e.g. `Regexp::Syntax::V3_2_0.added_features`
+  * e.g. `Regexp::Syntax::V3_2_0.added_features`
 - support for new unicode properties of Ruby 3.2 / Unicode 14.0
 ## [2.2.1] - 2022-02-11 - [Janosch Müller](mailto:janosch84@gmail.com)
@@ -73,14 +120,14 @@
 ### Fixed
 - fixed Syntax version of absence groups (`(?~...)`)
-  - the lexer accepted them for any Ruby version
-  - now they are only recognized for Ruby >= 2.4.1 in which they were introduced
+  * the lexer accepted them for any Ruby version
+  * now they are only recognized for Ruby >= 2.4.1 in which they were introduced
 - reduced gem size by excluding specs from package
 - removed deprecated `test_files` gemspec setting
 - no longer depend on `yaml`/`psych` (except for Ruby <= 2.4)
 - no longer depend on `set`
-  - `set` was removed from the stdlib and made a standalone gem as of Ruby 3
-  - this made it a hidden/undeclared dependency of `regexp_parser`
+  * `set` was removed from the stdlib and made a standalone gem as of Ruby 3
+  * this made it a hidden/undeclared dependency of `regexp_parser`
 ## [2.2.0] - 2021-12-04 - [Janosch Müller](mailto:janosch84@gmail.com)
@@ -318,8 +365,8 @@
 - Fixed missing quantifier in `Conditional::Expression` methods `#to_s`, `#to_re`
 - `Conditional::Condition` no longer lives outside the recursive `#expressions` tree
-  - it used to be the only expression stored in a custom ivar, complicating traversal
-  - its setter and getter (`#condition=`, `#condition`) still work as before
+  * it used to be the only expression stored in a custom ivar, complicating traversal
+  * its setter and getter (`#condition=`, `#condition`) still work as before
 ## [1.1.0] - 2018-09-17 - [Janosch Müller](mailto:janosch84@gmail.com)
@@ -327,8 +374,8 @@
 - Added `Quantifier` methods `#greedy?`, `#possessive?`, `#reluctant?`/`#lazy?`
 - Added `Group::Options#option_changes`
-  - shows the options enabled or disabled by the given options group
-  - as with all other expressions, `#options` shows the overall active options
+  * shows the options enabled or disabled by the given options group
+  * as with all other expressions, `#options` shows the overall active options
 - Added `Conditional#reference` and `Condition#reference`, indicating the determinative group
 - Added `Subexpression#dig`, acts like [`Array#dig`](http://ruby-doc.org/core-2.5.0/Array.html#method-i-dig)
@@ -512,7 +559,6 @@ This release includes several breaking changes, mostly to character sets, #map a
   * Fixed scanning of zero length comments (PR #12)
   * Fixed missing escape:codepoint_list syntax token (PR #14)
   * Fixed to_s for modified interval quantifiers (PR #17)
-- Added a note about MRI implementation quirks to Scanner section
 ## [0.3.2] - 2016-01-01 - [Ammar Ali](mailto:ammarabuali@gmail.com)
@@ -538,7 +584,6 @@ This release includes several breaking changes, mostly to character sets, #map a
 - Renamed Lexer's method to lex, added an alias to the old name (scan)
 - Use #map instead of #each to run the block in Lexer.lex.
 - Replaced VERSION.yml file with a constant.
-- Updated README
 - Update tokens and scanner with new additions in Unicode 7.0.
 ## [0.1.6] - 2014-10-06 - [Ammar Ali](mailto:ammarabuali@gmail.com)
@@ -548,20 +593,11 @@ This release includes several breaking changes, mostly to character sets, #map a
 - Added syntax files for missing ruby 2.x versions. These do not add
   extra syntax support, they just make the gem work with the newer
   ruby versions.
-- Added .travis.yml to project root.
-- README:
-  - Removed note purporting runtime support for ruby 1.8.6.
-  - Added a section identifying the main unsupported syntax features.
-  - Added sections for Testing and Building
-  - Added badges for gem version, Travis CI, and code climate.
-- Updated README, fixing broken examples, and converting it from a rdoc file to Github's flavor of Markdown.
 - Fixed a parser bug where an alternation sequence that contained nested expressions was incorrectly being appended to the parent expression when the nesting was exited. e.g. in /a|(b)c/, c was appended to the root.
 - Fixed a bug where character types were not being correctly scanned within character sets. e.g. in [\d], two tokens were scanned; one for the backslash '\' and one for the 'd'
 ## [0.1.5] - 2014-01-14 - [Ammar Ali](mailto:ammarabuali@gmail.com)
-- Correct ChangeLog.
 - Added syntax stubs for ruby versions 2.0 and 2.1
 - Added clone methods for deep copying expressions.
 - Added optional format argument for to_s on expressions to return the text of the expression with (:full, the default) or without (:base) its quantifier.
@@ -570,7 +606,6 @@ This release includes several breaking changes, mostly to character sets, #map a
 - Improved EOF handling in general and especially from sequences like hex and control escapes.
 - Fixed a bug where named groups with an empty name would return a blank token [].
 - Fixed a bug where member of a parent set where being added to its last subset.
-- Various code cleanups in scanner.rl
 - Fixed a few mutable string bugs by calling dup on the originals.
 - Made ruby 1.8.6 the base for all 1.8 syntax, and the 1.8 name a pointer to the latest (1.8.7 at this time)
 - Removed look-behind assertions (positive and negative) from 1.8 syntax

data/README.md CHANGED

@@ -9,8 +9,8 @@ A Ruby gem for tokenizing, parsing, and transforming regular expressions.
 * Multilayered
   * A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
-  * A lexer that produces a "stream" of token objects.
-  * A parser that produces a "tree" of Expression objects (OO API)
+  * A lexer that produces a "stream" of [Token objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
+  * A parser that produces a "tree" of [Expression objects (OO API)](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
 * Runs on Ruby 2.x, 3.x and JRuby runtimes
 * Recognizes Ruby 1.8, 1.9, 2.x and 3.x regular expressions [See Supported Syntax](#supported-syntax)
@@ -36,14 +36,15 @@ Or, add it to your project's `Gemfile`:
 ```gem 'regexp_parser', '~> X.Y.Z'```
-See rubygems for the the [latest version number](https://rubygems.org/gems/regexp_parser)
+See the badge at the top of this README or [rubygems](https://rubygems.org/gems/regexp_parser)
+for the the latest version number.
 ---
 ## Usage
 The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
-provides a single method that takes a regular expression (as a RegExp object or
+provides a single method that takes a regular expression (as a Regexp object or
 a string) and returns its results. The **Lexer** and the **Parser** accept an
 optional second argument that specifies the syntax version, like 'ruby/2.0',
 which defaults to the host Ruby version (using RUBY_VERSION).
@@ -79,7 +80,7 @@ All three methods accept either a `Regexp` or `String` (containing the pattern)
 require 'regexp_parser'
 Regexp::Parser.parse(
-  "a+ # Recognises a and A...",
+  "a+ # Recognizes a and A...",
   options: ::Regexp::EXTENDED | ::Regexp::IGNORECASE
 )
 ```
@@ -101,7 +102,7 @@ start/end offsets for each token found.
 ```ruby
 require 'regexp_parser'
-Regexp::Scanner.scan /(ab?(cd)*[e-h]+)/  do |type, token, text, ts, te|
+Regexp::Scanner.scan(/(ab?(cd)*[e-h]+)/) do |type, token, text, ts, te|
   puts "type: #{type}, token: #{token}, text: '#{text}' [#{ts}..#{te}]"
 end
@@ -124,7 +125,7 @@ A one-liner that uses map on the result of the scan to return the textual
 parts of the pattern:
 ```ruby
-Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]}
+Regexp::Scanner.scan(/(cat?([bhm]at)){3,5}/).map { |token| token[2] }
 #=> ["(", "cat", "?", "(", "[", "b", "h", "m", "]", "at", ")", ")", "{3,5}"]
 ```
@@ -220,7 +221,7 @@ syntax, and prints the token objects' text indented to their level.
 ```ruby
 require 'regexp_parser'
-Regexp::Lexer.lex /a?(b(c))*[d]+/, 'ruby/1.9' do |token|
+Regexp::Lexer.lex(/a?(b(c))*[d]+/, 'ruby/1.9') do |token|
   puts "#{'  ' * token.level}#{token.text}"
 end
@@ -246,7 +247,7 @@ how the sequence 'cat' is treated. The 't' is separated because it's followed
 by a quantifier that only applies to it.
 ```ruby
-Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text}
+Regexp::Lexer.scan(/(cat?([b]at)){3,5}/).map { |token| token.text }
 #=> ["(", "ca", "t", "?", "(", "[", "b", "]", "at", ")", ")", "{3,5}"]
 ```
@@ -274,7 +275,7 @@ require 'regexp_parser'
 regex = /a?(b+(c)d)*(?<name>[0-9]+)/
-tree = Regexp::Parser.parse( regex, 'ruby/2.1' )
+tree = Regexp::Parser.parse(regex, 'ruby/2.1')
 tree.traverse do |event, exp|
   puts "#{event}: #{exp.type} `#{exp.to_s}`"
@@ -355,7 +356,7 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&emsp;_Nest Level_              | `\k<n-1>`                                               | &#x2713; |
 | &emsp;&emsp;_Numbered_                | `\k<1>`                                                 | &#x2713; |
 | &emsp;&emsp;_Relative_                | `\k<-2>`                                                | &#x2713; |
-| &emsp;&emsp;_Traditional_             | `\1` thru `\9`                                          | &#x2713; |
+| &emsp;&emsp;_Traditional_             | `\1` through `\9`                                       | &#x2713; |
 | &emsp;&nbsp;_**Capturing**_           | `(abc)`                                                 | &#x2713; |
 | &emsp;&nbsp;_**Comments**_            | `(?# comment text)`                                     | &#x2713; |
 | &emsp;&nbsp;_**Named**_               | `(?<name>abc)`, `(?'name'abc)`                          | &#x2713; |
@@ -375,7 +376,7 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&nbsp;_**Meta** \[2\]_          | `\M-c`, `\M-\C-C`, `\M-\cC`, `\C-\M-C`, `\c\M-C`        | &#x2713; |
 | &emsp;&nbsp;_**Octal**_               | `\0`, `\01`, `\012`                                     | &#x2713; |
 | &emsp;&nbsp;_**Unicode**_             | `\uHHHH`, `\u{H+ H+}`                                   | &#x2713; |
-| **Unicode Properties**                | _<sub>([Unicode 13.0.0](https://www.unicode.org/versions/Unicode13.0.0/))</sub>_ | &#x22f1; |
+| **Unicode Properties**                | _<sub>([Unicode 13.0.0])</sub>_                         | &#x22f1; |
 | &emsp;&nbsp;_**Age**_                 | `\p{Age=5.2}`, `\P{age=7.0}`, `\p{^age=8.0}`            | &#x2713; |
 | &emsp;&nbsp;_**Blocks**_              | `\p{InArmenian}`, `\P{InKhmer}`, `\p{^InThai}`          | &#x2713; |
 | &emsp;&nbsp;_**Classes**_             | `\p{Alpha}`, `\P{Space}`, `\p{^Alnum}`                  | &#x2713; |
@@ -384,13 +385,17 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&nbsp;_**Scripts**_             | `\p{Arabic}`, `\P{Hiragana}`, `\p{^Greek}`              | &#x2713; |
 | &emsp;&nbsp;_**Simple**_              | `\p{Dash}`, `\p{Extender}`, `\p{^Hyphen}`               | &#x2713; |
-**\[1\]**: Ruby does not support lazy or possessive interval quantifiers. Any `+` or `?` that follows an interval
-quantifier will be treated as another, chained quantifier. See also [#3](https://github.com/ammar/regexp_parser/issue/3),
+[Unicode 13.0.0]: https://www.unicode.org/versions/Unicode13.0.0/
+**\[1\]**: Ruby does not support lazy or possessive interval quantifiers.
+Any `+` or `?` that follows an interval quantifier will be treated as another,
+chained quantifier. See also [#3](https://github.com/ammar/regexp_parser/issue/3),
 [#69](https://github.com/ammar/regexp_parser/pull/69).
-**\[2\]**: As of Ruby 3.1, meta and control sequences are [pre-processed to hex escapes when used in Regexp literals](
- https://github.com/ruby/ruby/commit/11ae581a4a7f5d5f5ec6378872eab8f25381b1b9 ), so they will only reach the
-scanner and will only be emitted if a String or a Regexp that has been built with the `::new` constructor is scanned.
+**\[2\]**: As of Ruby 3.1, meta and control sequences are [pre-processed to hex
+escapes when used in Regexp literals](https://github.com/ruby/ruby/commit/11ae581),
+so they will only reach the scanner and will only be emitted if a String or a Regexp
+that has been built with the `::new` constructor is scanned.
 ##### Inapplicable Features
@@ -407,25 +412,27 @@ expressions library (Onigmo). They are not supported by the scanner.
 See something missing? Please submit an [issue](https://github.com/ammar/regexp_parser/issues)
-_**Note**: Attempting to process expressions with unsupported syntax features can raise an error,
-or incorrectly return tokens/objects as literals._
+_**Note**: Attempting to process expressions with unsupported syntax features can raise
+an error, or incorrectly return tokens/objects as literals._
 ## Testing
 To run the tests simply run rake from the root directory.
-The default task generates the scanner's code from the Ragel source files and runs all the specs, thus it requires Ragel to be installed.
+The default task generates the scanner's code from the Ragel source files and runs
+all the specs, thus it requires Ragel to be installed.
-Note that changes to Ragel files will not be reflected when running `rspec` on its own, so to run individual tests you might want to run:
+Note that changes to Ragel files will not be reflected when running `rspec` on its own,
+so to run individual tests you might want to run:
 ```
 rake ragel:rb && rspec spec/scanner/properties_spec.rb
 ```
 ## Building
-Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/) to be
-installed. The build tasks will automatically invoke the 'ragel:rb' task to generate the
-Ruby scanner code.
+Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/)
+to be installed. The build tasks will automatically invoke the 'ragel:rb' task to generate
+the Ruby scanner code.
 The project uses the standard rubygems package tasks, so:
@@ -445,19 +452,26 @@ rake install
 ## Example Projects
 Projects using regexp_parser.
-- [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool that uses regexp_parser to convert Regexps to css/xpath selectors.
+- [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool
+that uses regexp_parser to convert Regexps to css/xpath selectors.
-- [js_regex](https://github.com/jaynetics/js_regex) converts Ruby regular expressions to JavaScript-compatible regular expressions.
+- [js_regex](https://github.com/jaynetics/js_regex) converts Ruby regular expressions
+to JavaScript-compatible regular expressions.
-- [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor with alias support.
+- [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor
+with alias support.
-- [mutant](https://github.com/mbj/mutant) manipulates your regular expressions (amongst others) to see if your tests cover their behavior.
+- [mutant](https://github.com/mbj/mutant) manipulates your regular expressions
+(amongst others) to see if your tests cover their behavior.
-- [repper](https://github.com/jaynetics/repper) is a regular expression pretty-printer for Ruby.
+- [repper](https://github.com/jaynetics/repper) is a regular expression
+pretty-printer and formatter for Ruby.
-- [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that uses regexp_parser to lint Regexps.
+- [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that
+uses regexp_parser to lint Regexps.
-- [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper that uses regexp_parser to generate examples of postal codes.
+- [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper
+that uses regexp_parser to generate examples of postal codes.
 ## References

data/lib/regexp_parser/expression/base.rb CHANGED

@@ -14,6 +14,10 @@ module Regexp::Expression
     end
     def to_re(format = :full)
+      if set_level > 0
+        warn "Calling #to_re on character set members is deprecated - "\
+             "their behavior might not be equivalent outside of the set."
+      end
       ::Regexp.new(to_s(format))
     end
@@ -32,15 +36,19 @@ module Regexp::Expression
     end
     def repetitions
-      return 1..1 unless quantified?
-      min = quantifier.min
-      max = quantifier.max < 0 ? Float::INFINITY : quantifier.max
-      range = min..max
-      # fix Range#minmax on old Rubies - https://bugs.ruby-lang.org/issues/15807
-      if RUBY_VERSION.to_f < 2.7
-        range.define_singleton_method(:minmax) { [min, max] }
-      end
-      range
+      @repetitions ||=
+        if quantified?
+          min = quantifier.min
+          max = quantifier.max < 0 ? Float::INFINITY : quantifier.max
+          range = min..max
+          # fix Range#minmax on old Rubies - https://bugs.ruby-lang.org/issues/15807
+          if RUBY_VERSION.to_f < 2.7
+            range.define_singleton_method(:minmax) { [min, max] }
+          end
+          range
+        else
+          1..1
+        end
     end
     def greedy?
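
Note on the `#to_re` deprecation warning above: the sketch below uses plain Ruby only (no regexp_parser classes) to illustrate one way a set member's text can behave differently once it is compiled outside of its set. The choice of `.` as the member is an illustrative assumption, not something taken from the diff.

```ruby
# Inside a character set, "." is a literal dot.
inside_set = /[.]/

# The same member text compiled on its own is the match-all metacharacter.
outside_set = Regexp.new('.')

puts inside_set.match?('x')   # the set only accepts a literal dot
puts outside_set.match?('x')  # the standalone pattern accepts any character
```

This is presumably the kind of mismatch the warning text ("might not be equivalent outside of the set") refers to.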

data/lib/regexp_parser/expression/classes/backreference.rb CHANGED

@@ -5,7 +5,19 @@ module Regexp::Expression
       attr_accessor :referenced_expression
       def initialize_copy(orig)
-        self.referenced_expression = orig.referenced_expression.dup
+        exp_id = [self.class, self.starts_at]
+        # prevent infinite recursion for recursive subexp calls
+        copied = @@copied ||= {}
+        self.referenced_expression =
+          if copied[exp_id]
+            orig.referenced_expression
+          else
+            copied[exp_id] = true
+            orig.referenced_expression.dup
+          end
+        copied.clear
         super
       end
     end
@@ -39,7 +51,7 @@ module Regexp::Expression
     class NameCall           < Backreference::Name; end
     class NumberCallRelative < Backreference::NumberRelative; end
-    class NumberRecursionLevel < Backreference::Number
+    class NumberRecursionLevel < Backreference::NumberRelative
       attr_reader :recursion_level
       def initialize(token, options = {})

data/lib/regexp_parser/expression/classes/escape_sequence.rb CHANGED

@@ -1,5 +1,5 @@
 module Regexp::Expression
-  # TODO: unify naming with Token::Escape, on way or the other, in v3.0.0
+  # TODO: unify naming with Token::Escape, one way or the other, in v3.0.0
   module EscapeSequence
     class Base < Regexp::Expression::Base
       def codepoint

data/lib/regexp_parser/expression/classes/group.rb CHANGED

@@ -33,6 +33,8 @@ module Regexp::Expression
     class Absence < Group::Base; end
     class Atomic  < Group::Base; end
+    # TODO: should split off OptionsSwitch in v3.0.0. Maybe even make it no
+    # longer inherit from Group because it is effectively a terminal expression.
     class Options < Group::Base
       attr_accessor :option_changes
@@ -40,6 +42,14 @@ module Regexp::Expression
         self.option_changes = orig.option_changes.dup
         super
       end
+      def quantify(*args)
+        if token == :options_switch
+          raise Regexp::Parser::Error, 'Can not quantify an option switch'
+        else
+          super
+        end
+      end
     end
     class Capture < Group::Base
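
Note on the `Options#quantify` override above: a rough sketch of the "raise for one token, otherwise defer to `super`" pattern. `BaseGroup`, `OptionsGroup` and `ParserError` are hypothetical stand-ins for `Group::Base`, `Group::Options` and `Regexp::Parser::Error`, and the way the quantifier is stored here is simplified for illustration.

```ruby
# Hypothetical stand-in for Regexp::Parser::Error.
class ParserError < StandardError; end

# Hypothetical stand-in for Group::Base with a simplified #quantify.
class BaseGroup
  attr_reader :token, :quantifier

  def initialize(token)
    @token = token
  end

  def quantify(*args)
    @quantifier = args
    self
  end
end

# Mirrors the override in the diff: option switches refuse quantification.
class OptionsGroup < BaseGroup
  def quantify(*args)
    if token == :options_switch
      raise ParserError, 'Can not quantify an option switch'
    else
      super
    end
  end
end

begin
  OptionsGroup.new(:options_switch).quantify(:one_or_more)
rescue ParserError => e
  puts e.message
end
```

This corresponds to the changelog entry about rejecting patterns such as `(?i)+` when they are passed as a String.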

data/lib/regexp_parser/expression/classes/unicode_property.rb CHANGED

@@ -1,5 +1,5 @@
 module Regexp::Expression
-  # TODO: unify name with token :property, on way or the other, in v3.0.0
+  # TODO: unify name with token :property, one way or the other, in v3.0.0
   module UnicodeProperty
     class Base < Regexp::Expression::Base
       def negative?

data/lib/regexp_parser/expression/methods/human_name.rb ADDED

@@ -0,0 +1,43 @@
+module Regexp::Expression
+  module Shared
+    # default implementation, e.g. "atomic group", "hex escape", "word type", ..
+    def human_name
+      [token, type].compact.join(' ').tr('_', ' ')
+    end
+  end
+  Alternation.class_eval                       { def human_name; 'alternation'                 end }
+  Alternative.class_eval                       { def human_name; 'alternative'                 end }
+  Anchor::BOL.class_eval                       { def human_name; 'beginning of line'           end }
+  Anchor::BOS.class_eval                       { def human_name; 'beginning of string'         end }
+  Anchor::EOL.class_eval                       { def human_name; 'end of line'                 end }
+  Anchor::EOS.class_eval                       { def human_name; 'end of string'               end }
+  Anchor::EOSobEOL.class_eval                  { def human_name; 'newline-ready end of string' end }
+  Anchor::MatchStart.class_eval                { def human_name; 'match start'                 end }
+  Anchor::NonWordBoundary.class_eval           { def human_name; 'no word boundary'            end }
+  Anchor::WordBoundary.class_eval              { def human_name; 'word boundary'               end }
+  Assertion::Lookahead.class_eval              { def human_name; 'lookahead'                   end }
+  Assertion::Lookbehind.class_eval             { def human_name; 'lookbehind'                  end }
+  Assertion::NegativeLookahead.class_eval      { def human_name; 'negative lookahead'          end }
+  Assertion::NegativeLookbehind.class_eval     { def human_name; 'negative lookbehind'         end }
+  Backreference::Name.class_eval               { def human_name; 'backreference by name'       end }
+  Backreference::NameCall.class_eval           { def human_name; 'subexpression call by name'  end }
+  Backreference::Number.class_eval             { def human_name; 'backreference'               end }
+  Backreference::NumberRelative.class_eval     { def human_name; 'relative backreference'      end }
+  Backreference::NumberCall.class_eval         { def human_name; 'subexpression call'          end }
+  Backreference::NumberCallRelative.class_eval { def human_name; 'relative subexpression call' end }
+  CharacterSet::IntersectedSequence.class_eval { def human_name; 'intersected sequence'        end }
+  CharacterSet::Intersection.class_eval        { def human_name; 'intersection'                end }
+  CharacterSet::Range.class_eval               { def human_name; 'character range'             end }
+  CharacterType::Any.class_eval                { def human_name; 'match-all'                   end }
+  Comment.class_eval                           { def human_name; 'comment'                     end }
+  Conditional::Branch.class_eval               { def human_name; 'conditional branch'          end }
+  Conditional::Condition.class_eval            { def human_name; 'condition'                   end }
+  Conditional::Expression.class_eval           { def human_name; 'conditional'                 end }
+  Group::Capture.class_eval                    { def human_name; "capture group #{number}"     end }
+  Group::Named.class_eval                      { def human_name; 'named capture group'         end }
+  Keep::Mark.class_eval                        { def human_name; 'keep-mark lookbehind'        end }
+  Literal.class_eval                           { def human_name; 'literal'                     end }
+  Root.class_eval                              { def human_name; 'root'                        end }
+  WhiteSpace.class_eval                        { def human_name; 'free space'                  end }
+end
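
Note on the default `human_name` in the new file above: the sketch below runs the same expression on a hypothetical `Node` struct, so it can be tried without the gem. Only the `[token, type].compact.join(' ').tr('_', ' ')` line is taken from the diff; the struct and the sample token/type pairs are assumptions for illustration.

```ruby
# Hypothetical stand-in for an expression; only token and type matter here.
Node = Struct.new(:token, :type) do
  # Same expression as the default implementation in the diff.
  def human_name
    [token, type].compact.join(' ').tr('_', ' ')
  end
end

puts Node.new(:atomic, :group).human_name          # atomic group
puts Node.new(:hex, :escape).human_name            # hex escape
puts Node.new(:options_switch, :group).human_name  # options switch group
puts Node.new(nil, :literal).human_name            # literal
```

The per-class overrides listed in the diff exist where this token/type concatenation would read poorly, e.g. `Anchor::BOL` becoming 'beginning of line'.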

data/lib/regexp_parser/expression/methods/match_length.rb CHANGED

@@ -63,16 +63,20 @@ class Regexp::MatchLength
   end
   def to_re
-    "(?:#{reify.call}){#{min_rep},#{max_rep unless max_rep == Float::INFINITY}}"
+    /(?:#{reify.call}){#{min_rep},#{max_rep unless max_rep == Float::INFINITY}}/
   end
   private
   attr_accessor :base_min, :base_max, :min_rep, :max_rep, :exp_class, :reify
-  def test_regexp
-    @test_regexp ||= Regexp.new("^#{to_re}$").tap do |regexp|
-      regexp.respond_to?(:match?) || def regexp.match?(str); !!match(str) end
+  if Regexp.method_defined?(:match?) # ruby >= 2.4
+    def test_regexp
+      @test_regexp ||= /^#{to_re}$/
+    end
+  else
+    def test_regexp
+      @test_regexp ||= /^#{to_re}$/.tap { |r| def r.match?(s); !!match(s) end }
     end
   end
 end
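
Note on the `test_regexp` change above: a small sketch of choosing a method definition once, at class-load time, depending on whether `Regexp#match?` exists, instead of checking on every call. `LengthCheck` and `fits?` are hypothetical names; the two `test_regexp` bodies and the `to_re` literal follow the shape shown in the diff.

```ruby
# Hypothetical class, not part of regexp_parser.
class LengthCheck
  def initialize(source, min, max)
    @source = source
    @min = min
    @max = max
  end

  # Returns a Regexp (not a String), like the new #to_re in the diff.
  def to_re
    /(?:#{@source}){#{@min},#{@max unless @max == Float::INFINITY}}/
  end

  if Regexp.method_defined?(:match?) # Ruby >= 2.4
    def test_regexp
      @test_regexp ||= /^#{to_re}$/
    end
  else
    # Older Rubies: add a singleton #match? to the one regexp we build.
    def test_regexp
      @test_regexp ||= /^#{to_re}$/.tap { |r| def r.match?(s); !!match(s) end }
    end
  end

  def fits?(str)
    test_regexp.match?(str)
  end
end

check = LengthCheck.new('ab', 1, 3)
puts check.fits?('abab')      # two repetitions, within 1..3
puts check.fits?('abababab')  # four repetitions, outside 1..3
```

Interpolating a Regexp into a regexp literal embeds it as a group with its own option flags, so `/^#{to_re}$/` keeps working after `to_re` switched from returning a String to returning a Regexp.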

data/lib/regexp_parser/expression/shared.rb CHANGED

@@ -8,9 +8,9 @@ module Regexp::Expression
         attr_accessor :type, :token, :text, :ts, :te,
                       :level, :set_level, :conditional_level,
-                      :options, :quantifier
+                      :options
-        attr_reader   :nesting_level
+        attr_reader   :nesting_level, :quantifier
       end
     end
@@ -64,6 +64,10 @@ module Regexp::Expression
       !quantifier.nil?
     end
+    def optional?
+      quantified? && quantifier.min == 0
+    end
     def offset
       [starts_at, full_length]
     end
@@ -81,5 +85,10 @@ module Regexp::Expression
       quantifier && quantifier.nesting_level = lvl
       terminal? || each { |subexp| subexp.nesting_level = lvl + 1 }
     end
+    def quantifier=(qtf)
+      @quantifier = qtf
+      @repetitions = nil # clear memoized value
+    end
   end
 end
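
Note on the `quantifier=` and `optional?` additions above, together with the memoized `#repetitions` in base.rb: a simplified sketch of how the custom setter keeps the cached range consistent. `Quantifier` and `Node` here are hypothetical minimal classes; the negative-`max`-means-unbounded convention is copied from the diff, and the old-Ruby `minmax` workaround is left out.

```ruby
# Hypothetical minimal quantifier: just a min and a max.
Quantifier = Struct.new(:min, :max)

# Hypothetical minimal expression node.
class Node
  attr_reader :quantifier

  def quantifier=(qtf)
    @quantifier = qtf
    @repetitions = nil # clear memoized value
  end

  def quantified?
    !quantifier.nil?
  end

  def optional?
    quantified? && quantifier.min == 0
  end

  def repetitions
    @repetitions ||=
      if quantified?
        max = quantifier.max < 0 ? Float::INFINITY : quantifier.max
        quantifier.min..max
      else
        1..1
      end
  end
end

node = Node.new
puts node.repetitions.inspect      # 1..1, cached from now on
node.quantifier = Quantifier.new(0, -1)
puts node.repetitions.inspect      # recomputed because the setter cleared the cache
puts node.optional?                # true, since min is 0
```

A plain `attr_accessor :quantifier` would leave a stale `@repetitions` behind after re-quantifying, which is presumably why the diff moves `:quantifier` to `attr_reader` and adds an explicit writer.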

data/lib/regexp_parser/expression.rb CHANGED

@@ -25,6 +25,7 @@ require 'regexp_parser/expression/classes/root'
 require 'regexp_parser/expression/classes/unicode_property'
 require 'regexp_parser/expression/methods/construct'
+require 'regexp_parser/expression/methods/human_name'
 require 'regexp_parser/expression/methods/match'
 require 'regexp_parser/expression/methods/match_length'
 require 'regexp_parser/expression/methods/options'