RubyGems - regexp_parser - Versions diffs - 2.4.0 → 2.6.0 - Mend

regexp_parser 2.4.0 → 2.6.0

Files changed (22)

checksums.yaml +4 -4
data/CHANGELOG.md +66 -41
data/README.md +46 -30
data/lib/regexp_parser/expression/base.rb +17 -9
data/lib/regexp_parser/expression/classes/backreference.rb +2 -1
data/lib/regexp_parser/expression/classes/{type.rb → character_type.rb} +0 -0
data/lib/regexp_parser/expression/classes/escape_sequence.rb +1 -1
data/lib/regexp_parser/expression/classes/group.rb +10 -0
data/lib/regexp_parser/expression/classes/keep.rb +2 -0
data/lib/regexp_parser/expression/classes/root.rb +3 -5
data/lib/regexp_parser/expression/classes/{property.rb → unicode_property.rb} +1 -0
data/lib/regexp_parser/expression/methods/construct.rb +43 -0
data/lib/regexp_parser/expression/methods/human_name.rb +43 -0
data/lib/regexp_parser/expression/methods/match_length.rb +1 -1
data/lib/regexp_parser/expression/quantifier.rb +6 -5
data/lib/regexp_parser/expression/sequence.rb +7 -21
data/lib/regexp_parser/expression/shared.rb +15 -2
data/lib/regexp_parser/expression.rb +4 -2
data/lib/regexp_parser/parser.rb +26 -17
data/lib/regexp_parser/syntax/token/escape.rb +1 -1
data/lib/regexp_parser/version.rb +1 -1
metadata +6 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 8b84a4bb274f31b8608c7dc9d55ff6f1b8d92d0d147976f38079ae7701a6debe
-  data.tar.gz: 41db5f094d0beafade30a1fac2707cbc827831e818c485ad35d7173f18c6a91a
+  metadata.gz: cadf1761e17469c6bf76db652a4f6fc97a3d33b7eaa46e6ea16f95ee6661743d
+  data.tar.gz: 3d6252f67f201b3cb6a3b94721c65b39abfe7b13bf0097fc9144498f6fdf8837
 SHA512:
-  metadata.gz: 5dcde6135ac42db609402e47e04ee3be1da8854de286d2baad15dafee04d451814fd7a3bae7adc5440a1fced811e242b69f5fd14bcfc4f3bd5091f86769d56be
-  data.tar.gz: 2660d0fb28a972a1de53b71b16f8591e573d4214724b5eea8a452549598ff5d0fc5b731149e8332f65bce01c812f4d0d72135bba7e3016064d9f05202a8b5580
+  metadata.gz: 3fb24f56b5d8da354aa5825dc2e9432c7e8bd836c9c2a7009c8883e367fb8ca61020a04854c714cacff913281b1156b4663334696edcb1d7e9239d8c8184d439
+  data.tar.gz: e793b72a9394e26bf0b9e6cb58c7536b72c30562382713f8b60735969f3b3b9b3aea78bf45efa661397d7141c2684a6df2b32cc8b449c413ea9d11c90c5396db
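
The new digests can be checked against a downloaded package. The following is a minimal sketch, assuming `data.tar.gz` has already been extracted from the `.gem` archive; a temporary stand-in file is used here so the snippet runs on its own, and `sha256_matches?` is a hypothetical helper, not part of RubyGems.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: true if the file's SHA256 hex digest equals `expected`.
def sha256_matches?(path, expected)
  Digest::SHA256.file(path).hexdigest == expected.downcase
end

# Stand-in for data.tar.gz so that the sketch is self-contained.
Tempfile.create('data.tar.gz') do |file|
  file.write('example payload')
  file.flush
  expected = Digest::SHA256.hexdigest('example payload')
  puts sha256_matches?(file.path, expected)  # digests agree
  puts sha256_matches?(file.path, '0' * 64)  # a wrong digest is rejected
end
```

For a real check, the path would point at the extracted `data.tar.gz` and `expected` would be the `SHA256` value listed under `data.tar.gz` in `checksums.yaml`.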

data/CHANGELOG.md CHANGED

@@ -1,33 +1,68 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]
+## [2.6.0] - 2022-09-26 - [Janosch Müller](mailto:janosch84@gmail.com)
+### Fixed
+- fixed `#referenced_expression` for `\g<0>` (was `nil`, is now the `Root` exp)
+- fixed `#reference`, `#referenced_expression` for recursion level backrefs
+  * e.g. `(a)(b)\k<-1+1>`
+  * `#referenced_expression` was `nil`, now it is the correct `Group` exp
+- detect and raise for two more syntax errors when parsing String input
+  * quantification of option switches (e.g. `(?i)+`)
+  * invalid references (e.g. `/\k<1>/`)
+  * these are a `SyntaxError` in Ruby, so could only be passed as a String
+### Added
+- `Regexp::Expression::Base#human_name`
+  * returns a nice, human-readable description of the expression
+- `Regexp::Expression::Base#optional?`
+  * returns `true` if the expression is quantified accordingly (e.g. with `*`, `{,n}`)
+- added a deprecation warning when calling `#to_re` on set members
+## [2.5.0] - 2022-05-27 - [Janosch Müller](mailto:janosch84@gmail.com)
+### Added
+- `Regexp::Expression::Base.construct` and `.token_class` methods
+  * see the [wiki](https://github.com/ammar/regexp_parser/wiki) for details
 ## [2.4.0] - 2022-05-09 - [Janosch Müller](mailto:janosch84@gmail.com)
 ### Fixed
 - fixed interpretation of `+` and `?` after interval quantifiers (`{n,n}`)
-  - they used to be treated as reluctant or possessive mode indicators
-  - however, Ruby does not support these modes for interval quantifiers
-  - they are now treated as chained quantifiers instead, as Ruby does it
-  - c.f. [#3](https://github.com/ammar/regexp_parser/issues/3)
+  * they used to be treated as reluctant or possessive mode indicators
+  * however, Ruby does not support these modes for interval quantifiers
+  * they are now treated as chained quantifiers instead, as Ruby does it
+  * c.f. [#3](https://github.com/ammar/regexp_parser/issues/3)
 - fixed `Expression::Base#nesting_level` for some tree rewrite cases
-  - e.g. the alternatives in `/a|[b]/` had an inconsistent nesting_level
+  * e.g. the alternatives in `/a|[b]/` had an inconsistent nesting_level
 - fixed `Scanner` accepting invalid posix classes, e.g. `[[:foo:]]`
-  - they raise a `SyntaxError` when used in a Regexp, so could only be passed as String
-  - they now raise a `Regexp::Scanner::ValidationError` in the `Scanner`
+  * they raise a `SyntaxError` when used in a Regexp, so could only be passed as String
+  * they now raise a `Regexp::Scanner::ValidationError` in the `Scanner`
 ### Added
 - added `Expression::Base#==` for (deep) comparison of expressions
 - added `Expression::Base#parts`
-  - returns the text elements and subexpressions of an expression
-  - e.g. `parse(/(a)/)[0].parts # => ["(", #<Literal @text="a"...>, ")"]`
+  * returns the text elements and subexpressions of an expression
+  * e.g. `parse(/(a)/)[0].parts # => ["(", #<Literal @text="a"...>, ")"]`
 - added `Expression::Base#te` (a.k.a. token end index)
-  - `Expression::Subexpression` always had `#te`, only terminal nodes lacked it so far
+  * `Expression::Subexpression` always had `#te`, only terminal nodes lacked it so far
 - made some `Expression::Base` methods available on `Quantifier` instances, too
-  - `#type`, `#type?`, `#is?`, `#one_of?`, `#options`, `#terminal?`
-  - `#base_length`, `#full_length`, `#starts_at`, `#te`, `#ts`, `#offset`
-  - `#conditional_level`, `#level`, `#nesting_level` , `#set_level`
-  - this allows a more unified handling with `Expression::Base` instances
+  * `#type`, `#type?`, `#is?`, `#one_of?`, `#options`, `#terminal?`
+  * `#base_length`, `#full_length`, `#starts_at`, `#te`, `#ts`, `#offset`
+  * `#conditional_level`, `#level`, `#nesting_level` , `#set_level`
+  * this allows a more unified handling with `Expression::Base` instances
 - allowed `Quantifier#initialize` to take a token and options Hash like other nodes
 - added a deprecation warning for initializing Quantifiers with 4+ arguments:
@@ -36,10 +71,12 @@
     It will no longer be supported in regexp_parser v3.0.0.
-    Please pass a Regexp::Token instead, e.g. replace `type, text, min, max, mode`
-    with `::Regexp::Token.new(:quantifier, type, text)`. min, max, and mode
+    Please pass a Regexp::Token instead, e.g. replace `token, text, min, max, mode`
+    with `::Regexp::Token.new(:quantifier, token, text)`. min, max, and mode
     will be derived automatically.
+    Or do `exp.quantifier = Quantifier.construct(token: token, text: str)`.
     This is consistent with how Expression::Base instances are created.
@@ -48,18 +85,18 @@
 ### Fixed
 - removed five inexistent unicode properties from `Syntax#features`
-  - these were never supported by Ruby or the `Regexp::Scanner`
-  - thanks to [Markus Schirp](https://github.com/mbj) for the report
+  * these were never supported by Ruby or the `Regexp::Scanner`
+  * thanks to [Markus Schirp](https://github.com/mbj) for the report
 ## [2.3.0] - 2022-04-08 - [Janosch Müller](mailto:janosch84@gmail.com)
 ### Added
 - improved parsing performance through `Syntax` refactoring
-  - instead of fresh `Syntax` instances, pre-loaded constants are now re-used
-  - this approximately doubles the parsing speed for simple regexps
+  * instead of fresh `Syntax` instances, pre-loaded constants are now re-used
+  * this approximately doubles the parsing speed for simple regexps
 - added methods to `Syntax` classes to show relative feature sets
-  - e.g. `Regexp::Syntax::V3_2_0.added_features`
+  * e.g. `Regexp::Syntax::V3_2_0.added_features`
 - support for new unicode properties of Ruby 3.2 / Unicode 14.0
 ## [2.2.1] - 2022-02-11 - [Janosch Müller](mailto:janosch84@gmail.com)
@@ -67,14 +104,14 @@
 ### Fixed
 - fixed Syntax version of absence groups (`(?~...)`)
-  - the lexer accepted them for any Ruby version
-  - now they are only recognized for Ruby >= 2.4.1 in which they were introduced
+  * the lexer accepted them for any Ruby version
+  * now they are only recognized for Ruby >= 2.4.1 in which they were introduced
 - reduced gem size by excluding specs from package
 - removed deprecated `test_files` gemspec setting
 - no longer depend on `yaml`/`psych` (except for Ruby <= 2.4)
 - no longer depend on `set`
-  - `set` was removed from the stdlib and made a standalone gem as of Ruby 3
-  - this made it a hidden/undeclared dependency of `regexp_parser`
+  * `set` was removed from the stdlib and made a standalone gem as of Ruby 3
+  * this made it a hidden/undeclared dependency of `regexp_parser`
 ## [2.2.0] - 2021-12-04 - [Janosch Müller](mailto:janosch84@gmail.com)
@@ -312,8 +349,8 @@
 - Fixed missing quantifier in `Conditional::Expression` methods `#to_s`, `#to_re`
 - `Conditional::Condition` no longer lives outside the recursive `#expressions` tree
-  - it used to be the only expression stored in a custom ivar, complicating traversal
-  - its setter and getter (`#condition=`, `#condition`) still work as before
+  * it used to be the only expression stored in a custom ivar, complicating traversal
+  * its setter and getter (`#condition=`, `#condition`) still work as before
 ## [1.1.0] - 2018-09-17 - [Janosch Müller](mailto:janosch84@gmail.com)
@@ -321,8 +358,8 @@
 - Added `Quantifier` methods `#greedy?`, `#possessive?`, `#reluctant?`/`#lazy?`
 - Added `Group::Options#option_changes`
-  - shows the options enabled or disabled by the given options group
-  - as with all other expressions, `#options` shows the overall active options
+  * shows the options enabled or disabled by the given options group
+  * as with all other expressions, `#options` shows the overall active options
 - Added `Conditional#reference` and `Condition#reference`, indicating the determinative group
 - Added `Subexpression#dig`, acts like [`Array#dig`](http://ruby-doc.org/core-2.5.0/Array.html#method-i-dig)
@@ -506,7 +543,6 @@ This release includes several breaking changes, mostly to character sets, #map a
   * Fixed scanning of zero length comments (PR #12)
   * Fixed missing escape:codepoint_list syntax token (PR #14)
   * Fixed to_s for modified interval quantifiers (PR #17)
-- Added a note about MRI implementation quirks to Scanner section
 ## [0.3.2] - 2016-01-01 - [Ammar Ali](mailto:ammarabuali@gmail.com)
@@ -532,7 +568,6 @@ This release includes several breaking changes, mostly to character sets, #map a
 - Renamed Lexer's method to lex, added an alias to the old name (scan)
 - Use #map instead of #each to run the block in Lexer.lex.
 - Replaced VERSION.yml file with a constant.
-- Updated README
 - Update tokens and scanner with new additions in Unicode 7.0.
 ## [0.1.6] - 2014-10-06 - [Ammar Ali](mailto:ammarabuali@gmail.com)
@@ -542,20 +577,11 @@ This release includes several breaking changes, mostly to character sets, #map a
 - Added syntax files for missing ruby 2.x versions. These do not add
   extra syntax support, they just make the gem work with the newer
   ruby versions.
-- Added .travis.yml to project root.
-- README:
-  - Removed note purporting runtime support for ruby 1.8.6.
-  - Added a section identifying the main unsupported syntax features.
-  - Added sections for Testing and Building
-  - Added badges for gem version, Travis CI, and code climate.
-- Updated README, fixing broken examples, and converting it from a rdoc file to Github's flavor of Markdown.
 - Fixed a parser bug where an alternation sequence that contained nested expressions was incorrectly being appended to the parent expression when the nesting was exited. e.g. in /a|(b)c/, c was appended to the root.
 - Fixed a bug where character types were not being correctly scanned within character sets. e.g. in [\d], two tokens were scanned; one for the backslash '\' and one for the 'd'
 ## [0.1.5] - 2014-01-14 - [Ammar Ali](mailto:ammarabuali@gmail.com)
-- Correct ChangeLog.
 - Added syntax stubs for ruby versions 2.0 and 2.1
 - Added clone methods for deep copying expressions.
 - Added optional format argument for to_s on expressions to return the text of the expression with (:full, the default) or without (:base) its quantifier.
@@ -564,7 +590,6 @@ This release includes several breaking changes, mostly to character sets, #map a
 - Improved EOF handling in general and especially from sequences like hex and control escapes.
 - Fixed a bug where named groups with an empty name would return a blank token [].
 - Fixed a bug where member of a parent set where being added to its last subset.
-- Various code cleanups in scanner.rl
 - Fixed a few mutable string bugs by calling dup on the originals.
 - Made ruby 1.8.6 the base for all 1.8 syntax, and the 1.8 name a pointer to the latest (1.8.7 at this time)
 - Removed look-behind assertions (positive and negative) from 1.8 syntax

data/README.md CHANGED

@@ -9,8 +9,8 @@ A Ruby gem for tokenizing, parsing, and transforming regular expressions.
 * Multilayered
   * A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
-  * A lexer that produces a "stream" of token objects.
-  * A parser that produces a "tree" of Expression objects (OO API)
+  * A lexer that produces a "stream" of [Token objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
+  * A parser that produces a "tree" of [Expression objects (OO API)](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
 * Runs on Ruby 2.x, 3.x and JRuby runtimes
 * Recognizes Ruby 1.8, 1.9, 2.x and 3.x regular expressions [See Supported Syntax](#supported-syntax)
@@ -36,14 +36,15 @@ Or, add it to your project's `Gemfile`:
 ```gem 'regexp_parser', '~> X.Y.Z'```
-See rubygems for the the [latest version number](https://rubygems.org/gems/regexp_parser)
+See the badge at the top of this README or [rubygems](https://rubygems.org/gems/regexp_parser)
+for the latest version number.
 ---
 ## Usage
 The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
-provides a single method that takes a regular expression (as a RegExp object or
+provides a single method that takes a regular expression (as a Regexp object or
 a string) and returns its results. The **Lexer** and the **Parser** accept an
 optional second argument that specifies the syntax version, like 'ruby/2.0',
 which defaults to the host Ruby version (using RUBY_VERSION).
@@ -79,7 +80,7 @@ All three methods accept either a `Regexp` or `String` (containing the pattern)
 require 'regexp_parser'
 Regexp::Parser.parse(
-  "a+ # Recognises a and A...",
+  "a+ # Recognizes a and A...",
   options: ::Regexp::EXTENDED | ::Regexp::IGNORECASE
 )
 ```
@@ -101,7 +102,7 @@ start/end offsets for each token found.
 ```ruby
 require 'regexp_parser'
-Regexp::Scanner.scan /(ab?(cd)*[e-h]+)/  do |type, token, text, ts, te|
+Regexp::Scanner.scan(/(ab?(cd)*[e-h]+)/) do |type, token, text, ts, te|
   puts "type: #{type}, token: #{token}, text: '#{text}' [#{ts}..#{te}]"
 end
@@ -124,7 +125,7 @@ A one-liner that uses map on the result of the scan to return the textual
 parts of the pattern:
 ```ruby
-Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]}
+Regexp::Scanner.scan(/(cat?([bhm]at)){3,5}/).map { |token| token[2] }
 #=> ["(", "cat", "?", "(", "[", "b", "h", "m", "]", "at", ")", ")", "{3,5}"]
 ```
@@ -220,7 +221,7 @@ syntax, and prints the token objects' text indented to their level.
 ```ruby
 require 'regexp_parser'
-Regexp::Lexer.lex /a?(b(c))*[d]+/, 'ruby/1.9' do |token|
+Regexp::Lexer.lex(/a?(b(c))*[d]+/, 'ruby/1.9') do |token|
   puts "#{'  ' * token.level}#{token.text}"
 end
@@ -246,7 +247,7 @@ how the sequence 'cat' is treated. The 't' is separated because it's followed
 by a quantifier that only applies to it.
 ```ruby
-Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text}
+Regexp::Lexer.scan(/(cat?([b]at)){3,5}/).map { |token| token.text }
 #=> ["(", "ca", "t", "?", "(", "[", "b", "]", "at", ")", ")", "{3,5}"]
 ```
@@ -274,7 +275,7 @@ require 'regexp_parser'
 regex = /a?(b+(c)d)*(?<name>[0-9]+)/
-tree = Regexp::Parser.parse( regex, 'ruby/2.1' )
+tree = Regexp::Parser.parse(regex, 'ruby/2.1')
 tree.traverse do |event, exp|
   puts "#{event}: #{exp.type} `#{exp.to_s}`"
@@ -355,7 +356,7 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&emsp;_Nest Level_              | `\k<n-1>`                                               | &#x2713; |
 | &emsp;&emsp;_Numbered_                | `\k<1>`                                                 | &#x2713; |
 | &emsp;&emsp;_Relative_                | `\k<-2>`                                                | &#x2713; |
-| &emsp;&emsp;_Traditional_             | `\1` thru `\9`                                          | &#x2713; |
+| &emsp;&emsp;_Traditional_             | `\1` through `\9`                                       | &#x2713; |
 | &emsp;&nbsp;_**Capturing**_           | `(abc)`                                                 | &#x2713; |
 | &emsp;&nbsp;_**Comments**_            | `(?# comment text)`                                     | &#x2713; |
 | &emsp;&nbsp;_**Named**_               | `(?<name>abc)`, `(?'name'abc)`                          | &#x2713; |
@@ -375,7 +376,7 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&nbsp;_**Meta** \[2\]_          | `\M-c`, `\M-\C-C`, `\M-\cC`, `\C-\M-C`, `\c\M-C`        | &#x2713; |
 | &emsp;&nbsp;_**Octal**_               | `\0`, `\01`, `\012`                                     | &#x2713; |
 | &emsp;&nbsp;_**Unicode**_             | `\uHHHH`, `\u{H+ H+}`                                   | &#x2713; |
-| **Unicode Properties**                | _<sub>([Unicode 13.0.0](https://www.unicode.org/versions/Unicode13.0.0/))</sub>_ | &#x22f1; |
+| **Unicode Properties**                | _<sub>([Unicode 13.0.0])</sub>_                         | &#x22f1; |
 | &emsp;&nbsp;_**Age**_                 | `\p{Age=5.2}`, `\P{age=7.0}`, `\p{^age=8.0}`            | &#x2713; |
 | &emsp;&nbsp;_**Blocks**_              | `\p{InArmenian}`, `\P{InKhmer}`, `\p{^InThai}`          | &#x2713; |
 | &emsp;&nbsp;_**Classes**_             | `\p{Alpha}`, `\P{Space}`, `\p{^Alnum}`                  | &#x2713; |
@@ -384,13 +385,17 @@ _Note that not all of these are available in all versions of Ruby_
 | &emsp;&nbsp;_**Scripts**_             | `\p{Arabic}`, `\P{Hiragana}`, `\p{^Greek}`              | &#x2713; |
 | &emsp;&nbsp;_**Simple**_              | `\p{Dash}`, `\p{Extender}`, `\p{^Hyphen}`               | &#x2713; |
-**\[1\]**: Ruby does not support lazy or possessive interval quantifiers. Any `+` or `?` that follows an interval
-quantifier will be treated as another, chained quantifier. See also [#3](https://github.com/ammar/regexp_parser/issue/3),
+[Unicode 13.0.0]: https://www.unicode.org/versions/Unicode13.0.0/
+**\[1\]**: Ruby does not support lazy or possessive interval quantifiers.
+Any `+` or `?` that follows an interval quantifier will be treated as another,
+chained quantifier. See also [#3](https://github.com/ammar/regexp_parser/issue/3),
 [#69](https://github.com/ammar/regexp_parser/pull/69).
-**\[2\]**: As of Ruby 3.1, meta and control sequences are [pre-processed to hex escapes when used in Regexp literals](
- https://github.com/ruby/ruby/commit/11ae581a4a7f5d5f5ec6378872eab8f25381b1b9 ), so they will only reach the
-scanner and will only be emitted if a String or a Regexp that has been built with the `::new` constructor is scanned.
+**\[2\]**: As of Ruby 3.1, meta and control sequences are [pre-processed to hex
+escapes when used in Regexp literals](https://github.com/ruby/ruby/commit/11ae581),
+so they will only reach the scanner and will only be emitted if a String or a Regexp
+that has been built with the `::new` constructor is scanned.
 ##### Inapplicable Features
@@ -407,25 +412,27 @@ expressions library (Onigmo). They are not supported by the scanner.
 See something missing? Please submit an [issue](https://github.com/ammar/regexp_parser/issues)
-_**Note**: Attempting to process expressions with unsupported syntax features can raise an error,
-or incorrectly return tokens/objects as literals._
+_**Note**: Attempting to process expressions with unsupported syntax features can raise
+an error, or incorrectly return tokens/objects as literals._
 ## Testing
 To run the tests simply run rake from the root directory.
-The default task generates the scanner's code from the Ragel source files and runs all the specs, thus it requires Ragel to be installed.
+The default task generates the scanner's code from the Ragel source files and runs
+all the specs, thus it requires Ragel to be installed.
-Note that changes to Ragel files will not be reflected when running `rspec` on its own, so to run individual tests you might want to run:
+Note that changes to Ragel files will not be reflected when running `rspec` on its own,
+so to run individual tests you might want to run:
 ```
 rake ragel:rb && rspec spec/scanner/properties_spec.rb
 ```
 ## Building
-Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/) to be
-installed. The build tasks will automatically invoke the 'ragel:rb' task to generate the
-Ruby scanner code.
+Building the scanner and the gem requires [Ragel](http://www.colm.net/open-source/ragel/)
+to be installed. The build tasks will automatically invoke the 'ragel:rb' task to generate
+the Ruby scanner code.
 The project uses the standard rubygems package tasks, so:
@@ -445,17 +452,26 @@ rake install
 ## Example Projects
 Projects using regexp_parser.
-- [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool that uses regexp_parser to convert Regexps to css/xpath selectors.
+- [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool
+that uses regexp_parser to convert Regexps to css/xpath selectors.
+- [js_regex](https://github.com/jaynetics/js_regex) converts Ruby regular expressions
+to JavaScript-compatible regular expressions.
-- [js_regex](https://github.com/janosch-x/js_regex) converts Ruby regular expressions to JavaScript-compatible regular expressions.
+- [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor
+with alias support.
-- [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor with alias support.
+- [mutant](https://github.com/mbj/mutant) manipulates your regular expressions
+(amongst others) to see if your tests cover their behavior.
-- [mutant](https://github.com/mbj/mutant) manipulates your regular expressions (amongst others) to see if your tests cover their behavior.
+- [repper](https://github.com/jaynetics/repper) is a regular expression
+pretty-printer and formatter for Ruby.
-- [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that uses regexp_parser to lint Regexps.
+- [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that
+uses regexp_parser to lint Regexps.
-- [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper that uses regexp_parser to generate examples of postal codes.
+- [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper
+that uses regexp_parser to generate examples of postal codes.
 ## References

data/lib/regexp_parser/expression/base.rb CHANGED

@@ -14,6 +14,10 @@ module Regexp::Expression
     end
     def to_re(format = :full)
+      if set_level > 0
+        warn "Calling #to_re on character set members is deprecated - "\
+             "their behavior might not be equivalent outside of the set."
+      end
       ::Regexp.new(to_s(format))
     end
@@ -32,15 +36,19 @@ module Regexp::Expression
     end
     def repetitions
-      return 1..1 unless quantified?
-      min = quantifier.min
-      max = quantifier.max < 0 ? Float::INFINITY : quantifier.max
-      range = min..max
-      # fix Range#minmax on old Rubies - https://bugs.ruby-lang.org/issues/15807
-      if RUBY_VERSION.to_f < 2.7
-        range.define_singleton_method(:minmax) { [min, max] }
-      end
-      range
+      @repetitions ||=
+        if quantified?
+          min = quantifier.min
+          max = quantifier.max < 0 ? Float::INFINITY : quantifier.max
+          range = min..max
+          # fix Range#minmax on old Rubies - https://bugs.ruby-lang.org/issues/15807
+          if RUBY_VERSION.to_f < 2.7
+            range.define_singleton_method(:minmax) { [min, max] }
+          end
+          range
+        else
+          1..1
+        end
     end
     def greedy?
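
The `repetitions` change above swaps a fresh computation per call for `||=` memoization. The sketch below uses hypothetical stand-in classes (`SketchQuantifier`, `SketchNode`), not the gem's own, to show the general consequence of that pattern: once cached, the Range is returned again even if the quantifier is assigned later. Whether the gem resets the cache when a quantifier changes is not visible in this hunk.

```ruby
# Hypothetical stand-ins for the gem's quantifier and expression classes.
SketchQuantifier = Struct.new(:min, :max)

class SketchNode
  attr_accessor :quantifier

  def quantified?
    !quantifier.nil?
  end

  # Same shape as the new code: the Range is built once and then cached.
  def repetitions
    @repetitions ||=
      if quantified?
        max = quantifier.max < 0 ? Float::INFINITY : quantifier.max
        quantifier.min..max
      else
        1..1
      end
  end
end

node = SketchNode.new
first = node.repetitions
node.quantifier = SketchQuantifier.new(0, -1)
puts node.repetitions.equal?(first) # the cached Range object is returned again
```

In this sketch a setter that also cleared `@repetitions` would avoid the stale value; that is a design option, not a claim about what the gem does.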

data/lib/regexp_parser/expression/classes/backreference.rb CHANGED

@@ -1,4 +1,5 @@
 module Regexp::Expression
+  # TODO: unify name with token :backref, one way or the other, in v3.0.0
   module Backreference
     class Base < Regexp::Expression::Base
       attr_accessor :referenced_expression
@@ -38,7 +39,7 @@ module Regexp::Expression
     class NameCall           < Backreference::Name; end
     class NumberCallRelative < Backreference::NumberRelative; end
-    class NumberRecursionLevel < Backreference::Number
+    class NumberRecursionLevel < Backreference::NumberRelative
       attr_reader :recursion_level
       def initialize(token, options = {})

data/lib/regexp_parser/expression/classes/{type.rb → character_type.rb} RENAMED

File without changes

data/lib/regexp_parser/expression/classes/escape_sequence.rb CHANGED

@@ -1,5 +1,5 @@
 module Regexp::Expression
-  # TODO: unify naming with Token::Escape, on way or the other, in v3.0.0
+  # TODO: unify naming with Token::Escape, one way or the other, in v3.0.0
   module EscapeSequence
     class Base < Regexp::Expression::Base
       def codepoint

data/lib/regexp_parser/expression/classes/group.rb CHANGED

@@ -33,6 +33,8 @@ module Regexp::Expression
     class Absence < Group::Base; end
     class Atomic  < Group::Base; end
+    # TODO: should split off OptionsSwitch in v3.0.0. Maybe even make it no
+    # longer inherit from Group because it is effectively a terminal expression.
     class Options < Group::Base
       attr_accessor :option_changes
@@ -40,6 +42,14 @@ module Regexp::Expression
         self.option_changes = orig.option_changes.dup
         super
       end
+      def quantify(*args)
+        if token == :options_switch
+          raise Regexp::Parser::Error, 'Can not quantify an option switch'
+        else
+          super
+        end
+      end
     end
     class Capture < Group::Base
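
The `quantify` override above rejects quantifiers on bare option switches such as `(?i)` while leaving other option groups untouched. A minimal sketch of that guard-then-`super` pattern follows; `SketchError`, `SketchGroup`, and `SketchOptions` are hypothetical stand-ins, and the `:options` token for groups like `(?i:a)` is an assumption not shown in this hunk.

```ruby
# Hypothetical stand-ins; the real classes live in Regexp::Expression.
class SketchError < StandardError; end

class SketchGroup
  attr_reader :token, :quantifier

  def initialize(token)
    @token = token
  end

  def quantify(text)
    @quantifier = text
    self
  end
end

class SketchOptions < SketchGroup
  # Refuse quantification for bare switches,
  # but defer to the parent class for every other token.
  def quantify(*args)
    if token == :options_switch
      raise SketchError, 'Can not quantify an option switch'
    else
      super
    end
  end
end

puts SketchOptions.new(:options).quantify('+').quantifier

begin
  SketchOptions.new(:options_switch).quantify('+')
rescue SketchError => e
  puts "rejected: #{e.message}"
end
```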

data/lib/regexp_parser/expression/classes/keep.rb CHANGED

@@ -1,5 +1,7 @@
 module Regexp::Expression
   module Keep
+    # TODO: in regexp_parser v3.0.0 this should possibly be a Subexpression
+    #       that contains all expressions to its left.
     class Mark < Regexp::Expression::Base; end
   end
 end

data/lib/regexp_parser/expression/classes/root.rb CHANGED

@@ -1,11 +1,9 @@
 module Regexp::Expression
   class Root < Regexp::Expression::Subexpression
     def self.build(options = {})
-      new(build_token, options)
-    end
-    def self.build_token
-      Regexp::Token.new(:expression, :root, '', 0)
+      warn "`#{self.class}.build(options)` is deprecated and will raise in "\
+           "regexp_parser v3.0.0. Please use `.construct(options: options)`."
+      construct(options: options)
     end
   end
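
One detail of the added deprecation warning may deserve a second look: it interpolates `self.class` from inside a class method. In a class method `self` is already the class, so `self.class` evaluates to `Class`. The sketch below uses a hypothetical `SketchRoot` class purely to show what each interpolation yields.

```ruby
# Hypothetical class, used only to show what each interpolation evaluates to.
class SketchRoot
  def self.subject_via_class
    "#{self.class}" # self is SketchRoot here, so self.class is Class
  end

  def self.subject_via_self
    "#{self}"
  end
end

puts SketchRoot.subject_via_class
puts SketchRoot.subject_via_self
```

If the intent was to name the receiver in the message, interpolating `self` (or `name`) would do so; this reading of the intent is an assumption.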
 end

data/lib/regexp_parser/expression/classes/{property.rb → unicode_property.rb} RENAMED

@@ -1,4 +1,5 @@
 module Regexp::Expression
+  # TODO: unify name with token :property, one way or the other, in v3.0.0
   module UnicodeProperty
     class Base < Regexp::Expression::Base
       def negative?

data/lib/regexp_parser/expression/methods/construct.rb ADDED

@@ -0,0 +1,43 @@
+module Regexp::Expression
+  module Shared
+    module ClassMethods
+      # Convenience method to init a valid Expression without a Regexp::Token
+      def construct(params = {})
+        attrs = construct_defaults.merge(params)
+        options = attrs.delete(:options)
+        token_args = Regexp::TOKEN_KEYS.map { |k| attrs.delete(k) }
+        token = Regexp::Token.new(*token_args)
+        raise ArgumentError, "unsupported attribute(s): #{attrs}" if attrs.any?
+        new(token, options)
+      end
+      def construct_defaults
+        if self == Root
+          { type: :expression, token: :root, ts: 0 }
+        elsif self < Sequence
+          { type: :expression, token: :sequence }
+        else
+          { type: token_class::Type }
+        end.merge(level: 0, set_level: 0, conditional_level: 0, text: '')
+      end
+      def token_class
+        if self == Root || self < Sequence
+          nil # no token class because these objects are Parser-generated
+        # TODO: synch exp & token class names for alt., dot, escapes in v3.0.0
+        elsif self == Alternation || self == CharacterType::Any
+          Regexp::Syntax::Token::Meta
+        elsif self <= EscapeSequence::Base
+          Regexp::Syntax::Token::Escape
+        else
+          Regexp::Syntax::Token.const_get(name.split('::')[2])
+        end
+      end
+    end
+    def token_class
+      self.class.token_class
+    end
+  end
+end
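
The `construct` method above merges caller parameters over per-class defaults, peels off the keys it knows, and raises on anything left over. A self-contained sketch of that flow is shown below. `SKETCH_TOKEN_KEYS`, `SketchToken`, and `SketchExpression` are hypothetical stand-ins for `Regexp::TOKEN_KEYS`, `Regexp::Token`, and the expression classes, and the key list here is illustrative rather than the gem's actual list.

```ruby
# Hypothetical stand-ins for Regexp::TOKEN_KEYS and Regexp::Token.
SKETCH_TOKEN_KEYS = %i[type token text ts te level].freeze
SketchToken = Struct.new(*SKETCH_TOKEN_KEYS)

class SketchExpression
  attr_reader :token, :options

  def initialize(token, options)
    @token = token
    @options = options
  end

  def self.construct_defaults
    { type: :literal, token: :literal, text: '', level: 0 }
  end

  # Merge caller params over the defaults, remove the known keys,
  # and reject whatever remains.
  def self.construct(params = {})
    attrs = construct_defaults.merge(params)
    options = attrs.delete(:options)
    token_args = SKETCH_TOKEN_KEYS.map { |key| attrs.delete(key) }
    raise ArgumentError, "unsupported attribute(s): #{attrs}" if attrs.any?
    new(SketchToken.new(*token_args), options)
  end
end

exp = SketchExpression.construct(text: 'a', options: { i: true })
puts exp.token.text

begin
  SketchExpression.construct(colour: :red)
rescue ArgumentError => e
  puts e.message
end
```

Deleting keys as they are consumed makes the final `attrs.any?` check a cheap way to catch misspelled or unsupported parameters.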

data/lib/regexp_parser/expression/methods/human_name.rb ADDED

@@ -0,0 +1,43 @@
+module Regexp::Expression
+  module Shared
+    # default implementation, e.g. "atomic group", "hex escape", "word type", ..
+    def human_name
+      [token, type].compact.join(' ').tr('_', ' ')
+    end
+  end
+  Alternation.class_eval                       { def human_name; 'alternation'                 end }
+  Alternative.class_eval                       { def human_name; 'alternative'                 end }
+  Anchor::BOL.class_eval                       { def human_name; 'beginning of line'           end }
+  Anchor::BOS.class_eval                       { def human_name; 'beginning of string'         end }
+  Anchor::EOL.class_eval                       { def human_name; 'end of line'                 end }
+  Anchor::EOS.class_eval                       { def human_name; 'end of string'               end }
+  Anchor::EOSobEOL.class_eval                  { def human_name; 'newline-ready end of string' end }
+  Anchor::MatchStart.class_eval                { def human_name; 'match start'                 end }
+  Anchor::NonWordBoundary.class_eval           { def human_name; 'no word boundary'            end }
+  Anchor::WordBoundary.class_eval              { def human_name; 'word boundary'               end }
+  Assertion::Lookahead.class_eval              { def human_name; 'lookahead'                   end }
+  Assertion::Lookbehind.class_eval             { def human_name; 'lookbehind'                  end }
+  Assertion::NegativeLookahead.class_eval      { def human_name; 'negative lookahead'          end }
+  Assertion::NegativeLookbehind.class_eval     { def human_name; 'negative lookbehind'         end }
+  Backreference::Name.class_eval               { def human_name; 'backreference by name'       end }
+  Backreference::NameCall.class_eval           { def human_name; 'subexpression call by name'  end }
+  Backreference::Number.class_eval             { def human_name; 'backreference'               end }
+  Backreference::NumberRelative.class_eval     { def human_name; 'relative backreference'      end }
+  Backreference::NumberCall.class_eval         { def human_name; 'subexpression call'          end }
+  Backreference::NumberCallRelative.class_eval { def human_name; 'relative subexpression call' end }
+  CharacterSet::IntersectedSequence.class_eval { def human_name; 'intersected sequence'        end }
+  CharacterSet::Intersection.class_eval        { def human_name; 'intersection'                end }
+  CharacterSet::Range.class_eval               { def human_name; 'character range'             end }
+  CharacterType::Any.class_eval                { def human_name; 'match-all'                   end }
+  Comment.class_eval                           { def human_name; 'comment'                     end }
+  Conditional::Branch.class_eval               { def human_name; 'conditional branch'          end }
+  Conditional::Condition.class_eval            { def human_name; 'condition'                   end }
+  Conditional::Expression.class_eval           { def human_name; 'conditional'                 end }
+  Group::Capture.class_eval                    { def human_name; "capture group #{number}"     end }
+  Group::Named.class_eval                      { def human_name; 'named capture group'         end }
+  Keep::Mark.class_eval                        { def human_name; 'keep-mark lookbehind'        end }
+  Literal.class_eval                           { def human_name; 'literal'                     end }
+  Root.class_eval                              { def human_name; 'root'                        end }
+  WhiteSpace.class_eval                        { def human_name; 'free space'                  end }
+end
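
Note on the pattern above: each class is reopened with `class_eval` and given a one-line `human_name` method, so the labels sit in one table-like file instead of being spread across the class definitions. A minimal sketch of that pattern, using hypothetical stand-in classes (`SketchExpression::Literal`, `SketchExpression::Capture`) rather than the gem's real hierarchy:

```ruby
# Hypothetical stand-ins for expression classes; not the gem's real hierarchy.
module SketchExpression
  class Base; end
  class Literal < Base; end

  class Capture < Base
    attr_reader :number

    def initialize(number)
      @number = number
    end
  end

  # Reopen each class and define #human_name, one line per class.
  Literal.class_eval { def human_name; 'literal' end }
  # Double quotes allow interpolating instance state, as in Group::Capture above.
  Capture.class_eval { def human_name; "capture group #{number}" end }
end

names = [
  SketchExpression::Literal.new,
  SketchExpression::Capture.new(2)
].map(&:human_name)
puts names.inspect
```

Because `def` inside a `class_eval` block defines an instance method on the receiver, the interpolated `number` is evaluated per instance at call time.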

data/lib/regexp_parser/expression/methods/match_length.rb CHANGED

@@ -112,7 +112,7 @@ module Regexp::Expression
     end
     def inner_match_length
-      dummy = Regexp::Expression::Root.build
+      dummy = Regexp::Expression::Root.construct
       dummy.expressions = expressions.map(&:clone)
       dummy.quantifier = quantifier && quantifier.clone
       dummy.match_length

data/lib/regexp_parser/expression/quantifier.rb CHANGED

@@ -14,7 +14,7 @@ module Regexp::Expression
       deprecated_old_init(*args) and return if args.count == 4 || args.count == 5
       init_from_token_and_options(*args)
-      @mode = (token[/greedy|reluctant|possessive/] || :greedy).to_sym
+      @mode = (token.to_s[/greedy|reluctant|possessive/] || :greedy).to_sym
       @min, @max = minmax
       # TODO: remove in v3.0.0, stop removing parts of #token (?)
       self.token = token.to_s.sub(/_(greedy|possessive|reluctant)/, '').to_sym
@@ -44,10 +44,11 @@ module Regexp::Expression
     def deprecated_old_init(token, text, min, max, mode = :greedy)
       warn "Calling `Expression::Base#quantify` or `#{self.class}.new` with 4+ arguments "\
            "is deprecated.\nIt will no longer be supported in regexp_parser v3.0.0.\n"\
-           "Please pass a Regexp::Token instead, e.g. replace `type, text, min, max, mode` "\
-           "with `::Regexp::Token.new(:quantifier, type, text)`. min, max, and mode "\
-           "will be derived automatically. \nThis is consistent with how Expression::Base "\
-           "instances are created."
+           "Please pass a Regexp::Token instead, e.g. replace `token, text, min, max, mode` "\
+           "with `::Regexp::Token.new(:quantifier, token, text)`. min, max, and mode "\
+           "will be derived automatically.\n"\
+           "Or do `exp.quantifier = #{self.class}.construct(token: token, text: str)`.\n"\
+           "This is consistent with how Expression::Base instances are created. "
       @token = token
       @text  = text
       @min   = min
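
Note on the first hunk above: the quantifier mode is read out of the token name, and the mode suffix is then stripped from the token. A small sketch of those two string operations in isolation; `split_quantifier_token` is a hypothetical helper, and the token names used are assumed examples:

```ruby
# Hypothetical helper mirroring the two string operations in the hunk above.
# Returns [token_without_mode_suffix, mode].
def split_quantifier_token(token)
  # Calling #to_s first means the lookup works for Symbols and Strings alike.
  mode = (token.to_s[/greedy|reluctant|possessive/] || :greedy).to_sym
  base = token.to_s.sub(/_(greedy|possessive|reluctant)/, '').to_sym
  [base, mode]
end

p split_quantifier_token(:zero_or_more_reluctant)
p split_quantifier_token('interval')
```

When no mode word is present in the token name, the `|| :greedy` fallback applies.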

data/lib/regexp_parser/expression/sequence.rb CHANGED

@@ -7,31 +7,17 @@ module Regexp::Expression
   # branches, and CharacterSet::Intersection intersected sequences.
   class Sequence < Regexp::Expression::Subexpression
     class << self
-      def add_to(subexpression, params = {}, active_opts = {})
-        sequence = at_levels(
-          subexpression.level,
-          subexpression.set_level,
-          params[:conditional_level] || subexpression.conditional_level
+      def add_to(exp, params = {}, active_opts = {})
+        sequence = construct(
+          level:             exp.level,
+          set_level:         exp.set_level,
+          conditional_level: params[:conditional_level] || exp.conditional_level,
         )
-        sequence.nesting_level = subexpression.nesting_level + 1
+        sequence.nesting_level = exp.nesting_level + 1
         sequence.options = active_opts
-        subexpression.expressions << sequence
+        exp.expressions << sequence
         sequence
       end
-      def at_levels(level, set_level, conditional_level)
-        token = Regexp::Token.new(
-          :expression,
-          :sequence,
-          '',
-          nil, # ts
-          nil, # te
-          level,
-          set_level,
-          conditional_level
-        )
-        new(token)
-      end
     end
     def starts_at
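
Note on the hunk above: the positional `Regexp::Token.new(...)` call with `nil` placeholders is replaced by a keyword-style `construct` call. The body of `construct` is not shown in this part of the diff, so the following is only a sketch of how such a class method could work. `SketchToken`, `SketchSequence`, and `DEFAULTS` are hypothetical stand-ins, not the gem's implementation:

```ruby
# Hypothetical stand-ins; the gem's own construct.rb is not reproduced here.
SketchToken = Struct.new(:type, :token, :text, :ts, :te,
                         :level, :set_level, :conditional_level)

class SketchSequence
  attr_reader :token, :options

  # Assumed defaults so callers only pass what differs.
  DEFAULTS = { type: :expression, token: :sequence, text: '',
               level: 0, set_level: 0, conditional_level: 0 }.freeze

  def self.construct(params = {})
    attrs   = DEFAULTS.merge(params)
    options = attrs.delete(:options) || {}
    # Pull token fields out by name, in the Struct's positional order.
    token = SketchToken.new(*SketchToken.members.map { |key| attrs.delete(key) })
    raise ArgumentError, "unsupported attribute(s): #{attrs.keys}" if attrs.any?

    new(token, options)
  end

  def initialize(token, options)
    @token   = token
    @options = options
  end
end

seq = SketchSequence.construct(level: 2, conditional_level: 1)
p seq.token.level
p seq.token.ts
```

Unspecified fields such as `ts` and `te` simply stay `nil`, which is what the removed `at_levels` method spelled out by hand.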

data/lib/regexp_parser/expression/shared.rb CHANGED

@@ -1,12 +1,16 @@
 module Regexp::Expression
   module Shared
+    module ClassMethods; end # filled in ./methods/*.rb
     def self.included(mod)
       mod.class_eval do
+        extend Shared::ClassMethods
         attr_accessor :type, :token, :text, :ts, :te,
                       :level, :set_level, :conditional_level,
-                      :options, :quantifier
+                      :options
-        attr_reader   :nesting_level
+        attr_reader   :nesting_level, :quantifier
       end
     end
@@ -60,6 +64,10 @@ module Regexp::Expression
       !quantifier.nil?
     end
+    def optional?
+      quantified? && quantifier.min == 0
+    end
     def offset
       [starts_at, full_length]
     end
@@ -77,5 +85,10 @@ module Regexp::Expression
       quantifier && quantifier.nesting_level = lvl
       terminal? || each { |subexp| subexp.nesting_level = lvl + 1 }
     end
+    def quantifier=(qtf)
+      @quantifier = qtf
+      @repetitions = nil # clear memoized value
+    end
   end
 end
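
Note on the hunks above: `quantifier` changes from an `attr_accessor` to a reader plus a custom writer that resets `@repetitions`, and `optional?` is added on top of `quantified?`. A sketch of why the writer matters, using hypothetical stand-ins (`SketchNode`, `SketchQuantifier`) and a simplified `repetitions` that is only an assumption about how the memoized value is computed:

```ruby
# Hypothetical stand-ins; #repetitions here is a simplified assumption.
SketchQuantifier = Struct.new(:min, :max)

class SketchNode
  attr_reader :quantifier

  def quantified?
    !quantifier.nil?
  end

  def optional?
    quantified? && quantifier.min == 0
  end

  # Memoized, so it would go stale if the quantifier were swapped silently.
  def repetitions
    @repetitions ||= quantified? ? (quantifier.min..quantifier.max) : (1..1)
  end

  def quantifier=(qtf)
    @quantifier  = qtf
    @repetitions = nil # clear memoized value
  end
end

node = SketchNode.new
node.repetitions                             # memoizes the unquantified range
node.quantifier = SketchQuantifier.new(0, 1) # e.g. a `?` quantifier
p node.optional?
p node.repetitions
```

With a plain `attr_accessor`, the second `repetitions` call would still return the range memoized before the quantifier was assigned.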

data/lib/regexp_parser/expression.rb CHANGED

@@ -13,6 +13,7 @@ require 'regexp_parser/expression/classes/backreference'
 require 'regexp_parser/expression/classes/character_set'
 require 'regexp_parser/expression/classes/character_set/intersection'
 require 'regexp_parser/expression/classes/character_set/range'
+require 'regexp_parser/expression/classes/character_type'
 require 'regexp_parser/expression/classes/conditional'
 require 'regexp_parser/expression/classes/escape_sequence'
 require 'regexp_parser/expression/classes/free_space'
@@ -20,10 +21,11 @@ require 'regexp_parser/expression/classes/group'
 require 'regexp_parser/expression/classes/keep'
 require 'regexp_parser/expression/classes/literal'
 require 'regexp_parser/expression/classes/posix_class'
-require 'regexp_parser/expression/classes/property'
 require 'regexp_parser/expression/classes/root'
-require 'regexp_parser/expression/classes/type'
+require 'regexp_parser/expression/classes/unicode_property'
+require 'regexp_parser/expression/methods/construct'
+require 'regexp_parser/expression/methods/human_name'
 require 'regexp_parser/expression/methods/match'
 require 'regexp_parser/expression/methods/match_length'
 require 'regexp_parser/expression/methods/options'

data/lib/regexp_parser/parser.rb CHANGED

@@ -23,7 +23,7 @@ class Regexp::Parser
   end
   def parse(input, syntax = "ruby/#{RUBY_VERSION}", options: nil, &block)
-    root = Root.build(extract_options(input, options))
+    root = Root.construct(options: extract_options(input, options))
     self.root = root
     self.node = root
@@ -200,11 +200,11 @@ class Regexp::Parser
   end
   def captured_group_count_at_level
-    captured_group_counts[node.level]
+    captured_group_counts[node]
   end
   def count_captured_group
-    captured_group_counts[node.level] += 1
+    captured_group_counts[node] += 1
   end
   def close_group
@@ -235,7 +235,15 @@ class Regexp::Parser
     when :number, :number_ref
       node << Backreference::Number.new(token, active_opts)
     when :number_recursion_ref
-      node << Backreference::NumberRecursionLevel.new(token, active_opts)
+      node << Backreference::NumberRecursionLevel.new(token, active_opts).tap do |exp|
+        # TODO: should split off new token number_recursion_rel_ref and new
+        # class NumberRelativeRecursionLevel in v3.0.0 to get rid of this
+        if exp.text =~ /[<'][+-]/
+          assign_effective_number(exp)
+        else
+          exp.effective_number = exp.number
+        end
+      end
     when :number_call
       node << Backreference::NumberCall.new(token, active_opts)
     when :number_rel_ref
@@ -254,6 +262,8 @@ class Regexp::Parser
   def assign_effective_number(exp)
     exp.effective_number =
       exp.number + total_captured_group_count + (exp.number < 0 ? 1 : 0)
+    exp.effective_number > 0 ||
+      raise(ParserError, "Invalid reference: #{exp.reference}")
   end
   def conditional(token)
@@ -475,17 +485,14 @@ class Regexp::Parser
     # description of the problem: https://github.com/ammar/regexp_parser/issues/3
     # rationale for this solution: https://github.com/ammar/regexp_parser/pull/69
     if target_node.quantified?
-      new_token = Regexp::Token.new(
-        :group,
-        :passive,
-        '', # text (none because this group is implicit)
-        target_node.ts,
-        nil, # te (unused)
-        target_node.level,
-        target_node.set_level,
-        target_node.conditional_level
+      new_group = Group::Passive.construct(
+        token:             :passive,
+        ts:                target_node.ts,
+        level:             target_node.level,
+        set_level:         target_node.set_level,
+        conditional_level: target_node.conditional_level,
+        options:           active_opts,
       )
-      new_group = Group::Passive.new(new_token, active_opts)
       new_group.implicit = true
       new_group << target_node
       increase_group_level(target_node)
@@ -572,15 +579,17 @@ class Regexp::Parser
   # an instance of Backreference::Number, its #referenced_expression is set to
   # the instance of Group::Capture that it refers to via its number.
   def assign_referenced_expressions
-    targets = {}
     # find all referencable expressions
+    targets = { 0 => root }
     root.each_expression do |exp|
       exp.is_a?(Group::Capture) && targets[exp.identifier] = exp
     end
     # assign them to any refering expressions
     root.each_expression do |exp|
-      exp.respond_to?(:reference) &&
-        exp.referenced_expression = targets[exp.reference]
+      next unless exp.respond_to?(:reference)
+      exp.referenced_expression = targets[exp.reference] ||
+        raise(ParserError, "Invalid reference: #{exp.reference}")
     end
   end
 end # module Regexp::Parser
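
Note on the `assign_effective_number` hunk above: a relative reference is turned into an absolute group number from the count of groups captured so far, and a non-positive result now raises. The arithmetic can be checked in isolation; `effective_number` and `SketchParserError` below are hypothetical names, not part of the gem:

```ruby
# Hypothetical stand-in for the parser's error class.
class SketchParserError < StandardError; end

# Same arithmetic as the hunk above, as a standalone function.
# number: the relative reference as written (e.g. -1 for \k<-1>)
# total_captured_group_count: capture groups opened before the reference
def effective_number(number, total_captured_group_count)
  result = number + total_captured_group_count + (number < 0 ? 1 : 0)
  result > 0 || raise(SketchParserError, "Invalid reference: #{number}")
  result
end

p effective_number(-1, 2) # the most recently opened group
p effective_number(1, 2)  # the next group to be opened

begin
  effective_number(-3, 2) # points before the first group
rescue SketchParserError => e
  puts e.message
end
```

With two groups captured, `-1` resolves to group 2 and `+1` to group 3, while `-3` resolves to 0 and is rejected.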

data/lib/regexp_parser/syntax/token/escape.rb CHANGED

@@ -1,6 +1,6 @@
 module Regexp::Syntax
   module Token
-    # TODO: unify naming with RE::EscapeSequence, on way or the other, in v3.0.0
+    # TODO: unify naming with RE::EscapeSequence, one way or the other, in v3.0.0
     module Escape
       Basic = %i[backslash literal]

data/lib/regexp_parser/version.rb CHANGED

@@ -1,5 +1,5 @@
 class Regexp
   class Parser
-    VERSION = '2.4.0'
+    VERSION = '2.6.0'
   end
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: regexp_parser
 version: !ruby/object:Gem::Version
-  version: 2.4.0
+  version: 2.6.0
 platform: ruby
 authors:
 - Ammar Ali
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2022-05-09 00:00:00.000000000 Z
+date: 2022-09-26 00:00:00.000000000 Z
 dependencies: []
 description: A library for tokenizing, lexing, and parsing Ruby regular expressions.
 email:
@@ -32,6 +32,7 @@ files:
 - lib/regexp_parser/expression/classes/character_set.rb
 - lib/regexp_parser/expression/classes/character_set/intersection.rb
 - lib/regexp_parser/expression/classes/character_set/range.rb
+- lib/regexp_parser/expression/classes/character_type.rb
 - lib/regexp_parser/expression/classes/conditional.rb
 - lib/regexp_parser/expression/classes/escape_sequence.rb
 - lib/regexp_parser/expression/classes/free_space.rb
@@ -39,9 +40,10 @@ files:
 - lib/regexp_parser/expression/classes/keep.rb
 - lib/regexp_parser/expression/classes/literal.rb
 - lib/regexp_parser/expression/classes/posix_class.rb
-- lib/regexp_parser/expression/classes/property.rb
 - lib/regexp_parser/expression/classes/root.rb
-- lib/regexp_parser/expression/classes/type.rb
+- lib/regexp_parser/expression/classes/unicode_property.rb
+- lib/regexp_parser/expression/methods/construct.rb
+- lib/regexp_parser/expression/methods/human_name.rb
 - lib/regexp_parser/expression/methods/match.rb
 - lib/regexp_parser/expression/methods/match_length.rb
 - lib/regexp_parser/expression/methods/options.rb