regexp_parser 2.0.3 → 2.1.0 (RubyGems)

This diff compares the published contents of two package versions as released to a supported public registry. It is provided for informational purposes only and reflects the changes between those versions as they appear in the registry.

Files changed (34)

checksums.yaml +4 -4
data/CHANGELOG.md +34 -3
data/Gemfile +5 -1
data/README.md +1 -1
data/Rakefile +6 -6
data/lib/regexp_parser.rb +1 -0
data/lib/regexp_parser/error.rb +4 -0
data/lib/regexp_parser/expression.rb +1 -1
data/lib/regexp_parser/expression/classes/backref.rb +5 -0
data/lib/regexp_parser/expression/classes/conditional.rb +11 -1
data/lib/regexp_parser/expression/classes/free_space.rb +1 -1
data/lib/regexp_parser/expression/classes/group.rb +6 -1
data/lib/regexp_parser/expression/classes/property.rb +1 -1
data/lib/regexp_parser/expression/classes/set/range.rb +2 -1
data/lib/regexp_parser/expression/quantifier.rb +1 -1
data/lib/regexp_parser/expression/sequence.rb +3 -9
data/lib/regexp_parser/expression/subexpression.rb +1 -1
data/lib/regexp_parser/parser.rb +281 -332
data/lib/regexp_parser/scanner.rb +1015 -1003
data/lib/regexp_parser/scanner/scanner.rl +53 -77
data/lib/regexp_parser/syntax.rb +6 -6
data/lib/regexp_parser/syntax/any.rb +1 -1
data/lib/regexp_parser/syntax/versions.rb +1 -1
data/lib/regexp_parser/version.rb +1 -1
data/spec/expression/clone_spec.rb +36 -4
data/spec/expression/free_space_spec.rb +2 -2
data/spec/expression/methods/match_length_spec.rb +2 -2
data/spec/lexer/refcalls_spec.rb +5 -0
data/spec/parser/all_spec.rb +2 -2
data/spec/parser/refcalls_spec.rb +5 -0
data/spec/scanner/escapes_spec.rb +1 -1
data/spec/scanner/refcalls_spec.rb +19 -0
data/spec/scanner/sets_spec.rb +42 -11
metadata +4 -3

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 42283562f90dc131bff21d7988b76867d1bd3bfc828373be9dce75c336300e1e
-  data.tar.gz: c7d0122495e338d2535ac7569f20257b70bbdce15a50a5ede6677897cfacc736
+  metadata.gz: 79c8b7838ef53335c9d0fbd21ffdf6815473ee560380a3687e8fab514d031d53
+  data.tar.gz: 2a91f7c7640fc5f2d304c2cbf240886d8e8642994861a9c092f1d4db2ae6b77a
 SHA512:
-  metadata.gz: cd29fd59a5bdad5344d19a86c39680d9d22e961c7478d702643e9b2340a0c0c8d62b61ff7fb44b404096a079267e0126c9fd92797306062a1c66711e29af1a24
-  data.tar.gz: 752b4824e5104a29de6b8582b51f39fe72dc40ddd222a58b61612b0b4cc9e5fc0311b31d2523f7e516b36991f49843b682309178934b6306d6ba094856e9d50c
+  metadata.gz: 3559a8c7af9c0087ab7a54862c9913e40a3703ffa23f62e6919eec50042523424c2aa4c99b3de9d28d03fc0edd14af37e0dcd0eab7bf822b9af73113be468b59
+  data.tar.gz: 31ed468565bd41fe2d0bd7b82d53d64e213a15e1ade2108ddf813637c228c18f6f7b456725c7e359a08754188ee19c90d06e013be90775ee6a64723b04fa25f0

data/CHANGELOG.md CHANGED

@@ -1,14 +1,45 @@
 ## [Unreleased]
+## [2.1.0] - 2021-02-22 - [Janosch Müller](mailto:janosch84@gmail.com)
+### Added
+- common ancestor for all scanning/parsing/lexing errors
+  * `Regexp::Parser::Error` can now be rescued as a catch-all
+  * the following errors (and their many descendants) now inherit from it:
+    - `Regexp::Expression::Conditional::TooManyBranches`
+    - `Regexp::Parser::ParserError`
+    - `Regexp::Scanner::ScannerError`
+    - `Regexp::Scanner::ValidationError`
+    - `Regexp::Syntax::SyntaxError`
+  * it replaces `ArgumentError` in some rare cases (`Regexp::Parser.parse('?')`)
+  * thanks to [sandstrom](https://github.com/sandstrom) for the cue
+### Fixed
+- fixed scanning of whole-pattern recursion calls `\g<0>` and `\g'0'`
+  * a regression in v2.0.1 had caused them to be scanned as literals
+- fixed scanning of some backreference and subexpression call edge cases
+  * e.g. `\k<+1>`, `\g<x-1>`
+- fixed tokenization of some escapes in character sets
+  * `.`, `|`, `{`, `}`, `(`, `)`, `^`, `$`, `?`, `+`, `*`
+  * all of these correctly emitted `#type` `:literal` and `#token` `:literal` if *not* escaped
+  * if escaped, they emitted e.g. `#type` `:escape` and `#token` `:group_open` for `[\(]`
+  * the escaped versions now correctly emit `#type` `:escape` and `#token` `:literal`
+- fixed handling of control/metacontrol escapes in character sets
+  * e.g. `[\cX]`, `[\M-\C-X]`
+  * they were misread as bunch of individual literals, escapes, and ranges
+- fixed some cases where calling `#dup`/`#clone` on expressions led to shared state
 ## [2.0.3] - 2020-12-28 - [Janosch Müller](mailto:janosch84@gmail.com)
 ### Fixed
 - fixed error when scanning some unlikely and redundant but valid charset patterns
-  - e.g. `/[[.a-b.]]/`, `/[[=e=]]/`,
+  * e.g. `/[[.a-b.]]/`, `/[[=e=]]/`,
 - fixed ancestry of some error classes related to syntax version lookup
-  - `NotImplementedError`, `InvalidVersionNameError`, `UnknownSyntaxNameError`
-  - they now correctly inherit from `Regexp::Syntax::SyntaxError` instead of Rubys `::SyntaxError`
+  * `NotImplementedError`, `InvalidVersionNameError`, `UnknownSyntaxNameError`
+  * they now correctly inherit from `Regexp::Syntax::SyntaxError` instead of Rubys `::SyntaxError`
 ## [2.0.2] - 2020-12-25 - [Janosch Müller](mailto:janosch84@gmail.com)
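
The common-ancestor arrangement described in the 2.1.0 entry can be illustrated without the gem installed. The sketch below uses a hypothetical `Sketch` namespace and a hypothetical `fail_with` helper; the stand-in classes only mirror the names listed in the changelog and are not the gem's code.

```ruby
# Stand-in hierarchy (hypothetical `Sketch` namespace, not the gem itself).
module Sketch
  class Error < StandardError; end   # plays the role of Regexp::Parser::Error
  class ParserError  < Error; end    # plays the role of Regexp::Parser::ParserError
  class ScannerError < Error; end    # plays the role of Regexp::Scanner::ScannerError
  class SyntaxError  < Error; end    # plays the role of Regexp::Syntax::SyntaxError

  # Hypothetical helper that fails in one of three ways.
  def self.fail_with(kind)
    case kind
    when :parser  then raise ParserError,  'parser failure'
    when :scanner then raise ScannerError, 'scanner failure'
    else               raise SyntaxError,  'syntax failure'
    end
  end
end

# A single rescue clause covers every descendant of the common ancestor.
def rescued_class(kind)
  Sketch.fail_with(kind)
rescue Sketch::Error => e
  e.class
end
```

With regexp_parser 2.1.0 or later, the changelog indicates the equivalent would be a `rescue Regexp::Parser::Error` around calls such as `Regexp::Parser.parse`.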

data/Gemfile CHANGED

@@ -6,5 +6,9 @@ group :development, :test do
   gem 'ice_nine', '~> 0.11.2'
   gem 'rake', '~> 13.0'
   gem 'regexp_property_values', '~> 1.0'
-  gem 'rspec', '~> 3.8'
+  gem 'rspec', '~> 3.10'
+  if RUBY_VERSION.to_f >= 2.7
+    gem 'gouteur'
+    gem 'rubocop', '~> 1.7'
+  end
 end

data/README.md CHANGED

@@ -1,6 +1,6 @@
 # Regexp::Parser
-[![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://github.com/ammar/regexp_parser/workflows/tests/badge.svg)](https://github.com/ammar/regexp_parser/actions) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
+[![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://github.com/ammar/regexp_parser/workflows/tests/badge.svg)](https://github.com/ammar/regexp_parser/actions) [![Build Status](https://github.com/ammar/regexp_parser/workflows/gouteur/badge.svg)](https://github.com/ammar/regexp_parser/actions) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
 A Ruby gem for tokenizing, parsing, and transforming regular expressions.

data/Rakefile CHANGED

@@ -7,8 +7,8 @@ require 'bundler'
 require 'rubygems/package_task'
-RAGEL_SOURCE_DIR = File.expand_path '../lib/regexp_parser/scanner', __FILE__
-RAGEL_OUTPUT_DIR = File.expand_path '../lib/regexp_parser', __FILE__
+RAGEL_SOURCE_DIR = File.join(__dir__, 'lib/regexp_parser/scanner')
+RAGEL_OUTPUT_DIR = File.join(__dir__, 'lib/regexp_parser')
 RAGEL_SOURCE_FILES = %w{scanner} # scanner.rl includes property.rl
@@ -26,10 +26,10 @@ end
 namespace :ragel do
   desc "Process the ragel source files and output ruby code"
   task :rb do
-    RAGEL_SOURCE_FILES.each do |file|
-      output_file = "#{RAGEL_OUTPUT_DIR}/#{file}.rb"
+    RAGEL_SOURCE_FILES.each do |source_file|
+      output_file = "#{RAGEL_OUTPUT_DIR}/#{source_file}.rb"
       # using faster flat table driven FSM, about 25% larger code, but about 30% faster
-      sh "ragel -F1 -R #{RAGEL_SOURCE_DIR}/#{file}.rl -o #{output_file}"
+      sh "ragel -F1 -R #{RAGEL_SOURCE_DIR}/#{source_file}.rl -o #{output_file}"
       contents = File.read(output_file)
@@ -61,7 +61,7 @@ namespace :props do
   task :update do
     require 'regexp_property_values'
     RegexpPropertyValues.update
-    dir = File.expand_path('../lib/regexp_parser/scanner/properties', __FILE__)
+    dir = File.join(__dir__, 'lib/regexp_parser/scanner/properties')
     require 'psych'
     write_hash_to_file = ->(hash, path) do
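
The Rakefile hunks swap `File.expand_path('../…', __FILE__)` for `File.join(__dir__, '…')`. A minimal sketch of why the two forms normally agree, assuming a hypothetical file path in place of `__FILE__` and no symlinks in that path (`__dir__` resolves symlinks, `File.expand_path` does not):

```ruby
# `file` is a hypothetical absolute path standing in for __FILE__.
file = File.join(Dir.pwd, 'Rakefile')
# For a file at `file` with no symlinks involved, __dir__ would return this.
dir = File.dirname(file)

# Old style: step up from the file name, then descend into the target.
old_style = File.expand_path('../lib/regexp_parser', file)
# New style: join the containing directory with the target.
new_style = File.join(dir, 'lib/regexp_parser')
```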

data/lib/regexp_parser.rb CHANGED

@@ -1,6 +1,7 @@
 # encoding: utf-8
 require 'regexp_parser/version'
+require 'regexp_parser/error'
 require 'regexp_parser/token'
 require 'regexp_parser/scanner'
 require 'regexp_parser/syntax'

data/lib/regexp_parser/error.rb ADDED

@@ -0,0 +1,4 @@
+class Regexp::Parser
+  # base class for all gem-specific errors (inherited but never raised itself)
+  class Error < StandardError; end
+end

data/lib/regexp_parser/expression.rb CHANGED

@@ -21,7 +21,7 @@ module Regexp::Expression
       self.options           = options
     end
-    def initialize_clone(orig)
+    def initialize_copy(orig)
       self.text       = (orig.text       ? orig.text.dup         : nil)
       self.options    = (orig.options    ? orig.options.dup      : nil)
       self.quantifier = (orig.quantifier ? orig.quantifier.clone : nil)
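
The rename from `initialize_clone` to `initialize_copy` matters because Ruby calls `initialize_clone` only for `#clone`, while `initialize_copy` is reached from both `#dup` and `#clone`. The sketch below shows the difference with two hypothetical classes (`CloneOnly` and `CopyHook`), which are not part of regexp_parser.

```ruby
# Hypothetical classes for illustration only.
class CloneOnly
  attr_accessor :text

  def initialize(text)
    @text = text
  end

  # Called by #clone but not by #dup.
  def initialize_clone(orig)
    self.text = orig.text.dup
  end
end

class CopyHook
  attr_accessor :text

  def initialize(text)
    @text = text
  end

  # Called by both #dup and #clone (via the default initialize_dup
  # and initialize_clone implementations).
  def initialize_copy(orig)
    super
    self.text = orig.text.dup
  end
end

a = CloneOnly.new(String.new('abc'))
b = CopyHook.new(String.new('abc'))

a_dup_shares_text = a.dup.text.equal?(a.text) # => true, #dup skipped the hook
b_dup_shares_text = b.dup.text.equal?(b.text) # => false, the hook copied the string
```

This matches the changelog's note about `#dup`/`#clone` leading to shared state: with the old hook name, a `#dup`ped object would still point at the original's mutable attributes.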

data/lib/regexp_parser/expression/classes/backref.rb CHANGED

@@ -2,6 +2,11 @@ module Regexp::Expression
   module Backreference
     class Base < Regexp::Expression::Base
       attr_accessor :referenced_expression
+      def initialize_copy(orig)
+        self.referenced_expression = orig.referenced_expression.dup
+        super
+      end
     end
     class Number < Backreference::Base

data/lib/regexp_parser/expression/classes/conditional.rb CHANGED

@@ -1,6 +1,6 @@
 module Regexp::Expression
   module Conditional
-    class TooManyBranches < StandardError
+    class TooManyBranches < Regexp::Parser::Error
       def initialize
         super('The conditional expression has more than 2 branches')
       end
@@ -15,6 +15,11 @@ module Regexp::Expression
         ref = text.tr("'<>()", "")
         ref =~ /\D/ ? ref : Integer(ref)
       end
+      def initialize_copy(orig)
+        self.referenced_expression = orig.referenced_expression.dup
+        super
+      end
     end
     class Branch < Regexp::Expression::Sequence; end
@@ -53,6 +58,11 @@ module Regexp::Expression
       def to_s(format = :full)
         "#{text}#{condition}#{branches.join('|')})#{quantifier_affix(format)}"
       end
+      def initialize_copy(orig)
+        self.referenced_expression = orig.referenced_expression.dup
+        super
+      end
     end
   end
 end

data/lib/regexp_parser/expression/classes/free_space.rb CHANGED

@@ -2,7 +2,7 @@ module Regexp::Expression
   class FreeSpace < Regexp::Expression::Base
     def quantify(_token, _text, _min = nil, _max = nil, _mode = :greedy)
-      raise "Can not quantify a free space object"
+      raise Regexp::Parser::Error, 'Can not quantify a free space object'
     end
   end

data/lib/regexp_parser/expression/classes/group.rb CHANGED

@@ -35,6 +35,11 @@ module Regexp::Expression
     class Atomic  < Group::Base; end
     class Options < Group::Base
       attr_accessor :option_changes
+      def initialize_copy(orig)
+        self.option_changes = orig.option_changes.dup
+        super
+      end
     end
     class Capture < Group::Base
@@ -53,7 +58,7 @@ module Regexp::Expression
         super
       end
-      def initialize_clone(orig)
+      def initialize_copy(orig)
         @name = orig.name.dup
         super
       end

data/lib/regexp_parser/expression/classes/property.rb CHANGED

@@ -7,7 +7,7 @@ module Regexp::Expression
       end
       def name
-        text =~ /\A\\[pP]\{([^}]+)\}\z/; $1
+        text[/\A\\[pP]\{([^}]+)\}\z/, 1]
       end
       def shortcut

data/lib/regexp_parser/expression/classes/set/range.rb CHANGED

@@ -7,7 +7,8 @@ module Regexp::Expression
       alias :ts :starts_at
       def <<(exp)
-        complete? && raise("Can't add more than 2 expressions to a Range")
+        complete? and raise Regexp::Parser::Error,
+          "Can't add more than 2 expressions to a Range"
         super
       end
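
The guard in `Range#<<` moves from `&&` with a parenthesized `raise(...)` to `and` with an unparenthesized `raise Class, message`. The low precedence of `and` is what allows the command-style call on its right-hand side. A minimal sketch with a hypothetical `DemoError` class and `guard` method:

```ruby
# Hypothetical error class standing in for Regexp::Parser::Error.
class DemoError < StandardError; end

# `and` binds more loosely than `&&`, so the right-hand side can be a
# command call without parentheses around its arguments.
def guard(already_complete)
  already_complete and raise DemoError,
    "Can't add more than 2 expressions"
  :added
end
```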

data/lib/regexp_parser/expression/quantifier.rb CHANGED

@@ -12,7 +12,7 @@ module Regexp::Expression
       @max   = max
     end
-    def initialize_clone(orig)
+    def initialize_copy(orig)
       @text = orig.text.dup
       super
     end

data/lib/regexp_parser/expression/sequence.rb CHANGED

@@ -41,17 +41,11 @@ module Regexp::Expression
     alias :ts :starts_at
     def quantify(token, text, min = nil, max = nil, mode = :greedy)
-      offset = -1
-      target = expressions[offset]
-      while target.is_a?(FreeSpace)
-        target = expressions[offset -= 1]
-      end
-      target || raise(ArgumentError, "No valid target found for '#{text}' "\
-                                     'quantifier')
+      target = expressions.reverse.find { |exp| !exp.is_a?(FreeSpace) }
+      target or raise Regexp::Parser::Error,
+        "No valid target found for '#{text}' quantifier"
       target.quantify(token, text, min, max, mode)
     end
   end
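
The rewritten `quantify` finds its target by scanning the expressions from the end and skipping free-space nodes. The sketch below reproduces just that lookup with hypothetical stand-in classes (`FakeFreeSpace`, `FakeLiteral`); the gem's real expression classes are not used.

```ruby
# Hypothetical stand-ins for the gem's expression classes.
FakeFreeSpace = Class.new
FakeLiteral   = Struct.new(:text)

# Returns the last expression that is not free space, or nil if none exists.
def last_quantifiable(expressions)
  expressions.reverse.find { |exp| !exp.is_a?(FakeFreeSpace) }
end

literal = FakeLiteral.new('a')
spaces  = [FakeFreeSpace.new, FakeFreeSpace.new]

target  = last_quantifiable([literal, *spaces]) # the literal, despite trailing free space
missing = last_quantifiable(spaces)             # nil, which the gem turns into an error
```

In the diff, the `nil` case raises `Regexp::Parser::Error` where the previous version raised `ArgumentError`.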
 end

data/lib/regexp_parser/expression/subexpression.rb CHANGED

@@ -12,7 +12,7 @@ module Regexp::Expression
     end
     # Override base method to clone the expressions as well.
-    def initialize_clone(orig)
+    def initialize_copy(orig)
       self.expressions = orig.expressions.map(&:clone)
       super
     end

data/lib/regexp_parser/parser.rb CHANGED

@@ -2,9 +2,8 @@ require 'regexp_parser/expression'
 class Regexp::Parser
   include Regexp::Expression
-  include Regexp::Syntax
-  class ParserError < StandardError; end
+  class ParserError < Regexp::Parser::Error; end
   class UnknownTokenTypeError < ParserError
     def initialize(type, token)
@@ -70,93 +69,155 @@ class Regexp::Parser
     enabled_options
   end
-  def nest(exp)
-    nesting.push(exp)
-    node << exp
-    update_transplanted_subtree(exp, node)
-    self.node = exp
-  end
+  def parse_token(token)
+    case token.type
+    when :anchor;                     anchor(token)
+    when :assertion, :group;          group(token)
+    when :backref;                    backref(token)
+    when :conditional;                conditional(token)
+    when :escape;                     escape(token)
+    when :free_space;                 free_space(token)
+    when :keep;                       keep(token)
+    when :literal;                    literal(token)
+    when :meta;                       meta(token)
+    when :posixclass, :nonposixclass; posixclass(token)
+    when :property, :nonproperty;     property(token)
+    when :quantifier;                 quantifier(token)
+    when :set;                        set(token)
+    when :type;                       type(token)
+    else
+      raise UnknownTokenTypeError.new(token.type, token)
+    end
-  # subtrees are transplanted to build Alternations, Intersections, Ranges
-  def update_transplanted_subtree(exp, new_parent)
-    exp.nesting_level = new_parent.nesting_level + 1
-    exp.respond_to?(:each) &&
-      exp.each { |subexp| update_transplanted_subtree(subexp, exp) }
+    close_completed_character_set_range
   end
-  def decrease_nesting
-    while nesting.last.is_a?(SequenceOperation)
-      nesting.pop
-      self.node = nesting.last
+  def anchor(token)
+    case token.token
+    when :bol;              node << Anchor::BeginningOfLine.new(token, active_opts)
+    when :bos;              node << Anchor::BOS.new(token, active_opts)
+    when :eol;              node << Anchor::EndOfLine.new(token, active_opts)
+    when :eos;              node << Anchor::EOS.new(token, active_opts)
+    when :eos_ob_eol;       node << Anchor::EOSobEOL.new(token, active_opts)
+    when :match_start;      node << Anchor::MatchStart.new(token, active_opts)
+    when :nonword_boundary; node << Anchor::NonWordBoundary.new(token, active_opts)
+    when :word_boundary;    node << Anchor::WordBoundary.new(token, active_opts)
+    else
+      raise UnknownTokenError.new('Anchor', token)
     end
-    nesting.pop
-    yield(node) if block_given?
-    self.node = nesting.last
-    self.node = node.last if node.last.is_a?(SequenceOperation)
   end
-  def nest_conditional(exp)
-    conditional_nesting.push(exp)
-    nest(exp)
+  def group(token)
+    case token.token
+    when :options, :options_switch
+      options_group(token)
+    when :close
+      close_group
+    when :comment
+      node << Group::Comment.new(token, active_opts)
+    else
+      open_group(token)
+    end
   end
-  def parse_token(token)
-    close_completed_character_set_range
+  MOD_FLAGS = %w[i m x].map(&:to_sym)
+  ENC_FLAGS = %w[a d u].map(&:to_sym)
-    case token.type
-    when :meta;         meta(token)
-    when :quantifier;   quantifier(token)
-    when :anchor;       anchor(token)
-    when :escape;       escape(token)
-    when :group;        group(token)
-    when :assertion;    group(token)
-    when :set;          set(token)
-    when :type;         type(token)
-    when :backref;      backref(token)
-    when :conditional;  conditional(token)
-    when :keep;         keep(token)
-    when :posixclass, :nonposixclass
-      posixclass(token)
-    when :property, :nonproperty
-      property(token)
-    when :literal
-      node << Literal.new(token, active_opts)
-    when :free_space
-      free_space(token)
+  def options_group(token)
+    positive, negative = token.text.split('-', 2)
+    negative ||= ''
+    self.switching_options = token.token.equal?(:options_switch)
-    else
-      raise UnknownTokenTypeError.new(token.type, token)
+    opt_changes = {}
+    new_active_opts = active_opts.dup
+    MOD_FLAGS.each do |flag|
+      if positive.include?(flag.to_s)
+        opt_changes[flag] = new_active_opts[flag] = true
+      end
+      if negative.include?(flag.to_s)
+        opt_changes[flag] = false
+        new_active_opts.delete(flag)
+      end
+    end
+    if (enc_flag = positive.reverse[/[adu]/])
+      enc_flag = enc_flag.to_sym
+      (ENC_FLAGS - [enc_flag]).each do |other|
+        opt_changes[other] = false if new_active_opts[other]
+        new_active_opts.delete(other)
+      end
+      opt_changes[enc_flag] = new_active_opts[enc_flag] = true
     end
+    options_stack << new_active_opts
+    options_group = Group::Options.new(token, active_opts)
+    options_group.option_changes = opt_changes
+    nest(options_group)
   end
-  def set(token)
-    case token.token
-    when :open
-      open_set(token)
-    when :close
-      close_set
-    when :negate
-      negate_set
-    when :range
-      range(token)
-    when :intersection
-      intersection(token)
-    else
-      raise UnknownTokenError.new('CharacterSet', token)
+  def open_group(token)
+    group_class =
+      case token.token
+      when :absence;     Group::Absence
+      when :atomic;      Group::Atomic
+      when :capture;     Group::Capture
+      when :named;       Group::Named
+      when :passive;     Group::Passive
+      when :lookahead;   Assertion::Lookahead
+      when :lookbehind;  Assertion::Lookbehind
+      when :nlookahead;  Assertion::NegativeLookahead
+      when :nlookbehind; Assertion::NegativeLookbehind
+      else
+        raise UnknownTokenError.new('Group type open', token)
+      end
+    group = group_class.new(token, active_opts)
+    if group.capturing?
+      group.number          = total_captured_group_count + 1
+      group.number_at_level = captured_group_count_at_level + 1
+      count_captured_group
     end
+    # Push the active options to the stack again. This way we can simply pop the
+    # stack for any group we close, no matter if it had its own options or not.
+    options_stack << active_opts
+    nest(group)
   end
-  def meta(token)
-    case token.token
-    when :dot
-      node << CharacterType::Any.new(token, active_opts)
-    when :alternation
-      sequence_operation(Alternation, token)
-    else
-      raise UnknownTokenError.new('Meta', token)
+  def total_captured_group_count
+    captured_group_counts.values.reduce(0, :+)
+  end
+  def captured_group_count_at_level
+    captured_group_counts[node.level]
+  end
+  def count_captured_group
+    captured_group_counts[node.level] += 1
+  end
+  def close_group
+    options_stack.pop unless switching_options
+    self.switching_options = false
+    decrease_nesting
+  end
+  def decrease_nesting
+    while nesting.last.is_a?(SequenceOperation)
+      nesting.pop
+      self.node = nesting.last
     end
+    nesting.pop
+    yield(node) if block_given?
+    self.node = nesting.last
+    self.node = node.last if node.last.is_a?(SequenceOperation)
   end
   def backref(token)
@@ -186,31 +247,9 @@ class Regexp::Parser
     end
   end
-  def type(token)
-    case token.token
-    when :digit
-      node << CharacterType::Digit.new(token, active_opts)
-    when :nondigit
-      node << CharacterType::NonDigit.new(token, active_opts)
-    when :hex
-      node << CharacterType::Hex.new(token, active_opts)
-    when :nonhex
-      node << CharacterType::NonHex.new(token, active_opts)
-    when :space
-      node << CharacterType::Space.new(token, active_opts)
-    when :nonspace
-      node << CharacterType::NonSpace.new(token, active_opts)
-    when :word
-      node << CharacterType::Word.new(token, active_opts)
-    when :nonword
-      node << CharacterType::NonWord.new(token, active_opts)
-    when :linebreak
-      node << CharacterType::Linebreak.new(token, active_opts)
-    when :xgrapheme
-      node << CharacterType::ExtendedGrapheme.new(token, active_opts)
-    else
-      raise UnknownTokenError.new('CharacterType', token)
-    end
+  def assign_effective_number(exp)
+    exp.effective_number =
+      exp.number + total_captured_group_count + (exp.number < 0 ? 1 : 0)
   end
   def conditional(token)
@@ -238,11 +277,118 @@ class Regexp::Parser
     end
   end
+  def nest_conditional(exp)
+    conditional_nesting.push(exp)
+    nest(exp)
+  end
+  def nest(exp)
+    nesting.push(exp)
+    node << exp
+    update_transplanted_subtree(exp, node)
+    self.node = exp
+  end
+  # subtrees are transplanted to build Alternations, Intersections, Ranges
+  def update_transplanted_subtree(exp, new_parent)
+    exp.nesting_level = new_parent.nesting_level + 1
+    exp.respond_to?(:each) &&
+      exp.each { |subexp| update_transplanted_subtree(subexp, exp) }
+  end
+  def escape(token)
+    case token.token
+    when :backspace;      node << EscapeSequence::Backspace.new(token, active_opts)
+    when :escape;         node << EscapeSequence::AsciiEscape.new(token, active_opts)
+    when :bell;           node << EscapeSequence::Bell.new(token, active_opts)
+    when :form_feed;      node << EscapeSequence::FormFeed.new(token, active_opts)
+    when :newline;        node << EscapeSequence::Newline.new(token, active_opts)
+    when :carriage;       node << EscapeSequence::Return.new(token, active_opts)
+    when :tab;            node << EscapeSequence::Tab.new(token, active_opts)
+    when :vertical_tab;   node << EscapeSequence::VerticalTab.new(token, active_opts)
+    when :codepoint;      node << EscapeSequence::Codepoint.new(token, active_opts)
+    when :codepoint_list; node << EscapeSequence::CodepointList.new(token, active_opts)
+    when :hex;            node << EscapeSequence::Hex.new(token, active_opts)
+    when :octal;          node << EscapeSequence::Octal.new(token, active_opts)
+    when :control
+      if token.text =~ /\A(?:\\C-\\M|\\c\\M)/
+        node << EscapeSequence::MetaControl.new(token, active_opts)
+      else
+        node << EscapeSequence::Control.new(token, active_opts)
+      end
+    when :meta_sequence
+      if token.text =~ /\A\\M-\\[Cc]/
+        node << EscapeSequence::MetaControl.new(token, active_opts)
+      else
+        node << EscapeSequence::Meta.new(token, active_opts)
+      end
+    else
+      # treating everything else as a literal
+      # TODO: maybe split this up a bit more in v3.0.0?
+      # E.g. escaped quantifiers or set meta chars are not the same
+      # as stuff that would be a literal even without the backslash.
+      # Right now, they all end up here.
+      node << EscapeSequence::Literal.new(token, active_opts)
+    end
+  end
+  def free_space(token)
+    case token.token
+    when :comment
+      node << Comment.new(token, active_opts)
+    when :whitespace
+      if node.last.is_a?(WhiteSpace)
+        node.last.merge(WhiteSpace.new(token, active_opts))
+      else
+        node << WhiteSpace.new(token, active_opts)
+      end
+    else
+      raise UnknownTokenError.new('FreeSpace', token)
+    end
+  end
+  def keep(token)
+    node << Keep::Mark.new(token, active_opts)
+  end
+  def literal(token)
+    node << Literal.new(token, active_opts)
+  end
+  def meta(token)
+    case token.token
+    when :dot
+      node << CharacterType::Any.new(token, active_opts)
+    when :alternation
+      sequence_operation(Alternation, token)
+    else
+      raise UnknownTokenError.new('Meta', token)
+    end
+  end
+  def sequence_operation(klass, token)
+    unless node.is_a?(klass)
+      operator = klass.new(token, active_opts)
+      sequence = operator.add_sequence(active_opts)
+      sequence.expressions = node.expressions
+      node.expressions = []
+      nest(operator)
+    end
+    node.add_sequence(active_opts)
+  end
   def posixclass(token)
     node << PosixClass.new(token, active_opts)
   end
   include Regexp::Expression::UnicodeProperty
+  UPTokens = Regexp::Syntax::Token::UnicodeProperty
   def property(token)
     case token.token
@@ -314,127 +460,20 @@ class Regexp::Parser
     when :private_use;            node << Codepoint::PrivateUse.new(token, active_opts)
     when :unassigned;             node << Codepoint::Unassigned.new(token, active_opts)
-    when *Token::UnicodeProperty::Age
-      node << Age.new(token, active_opts)
-    when *Token::UnicodeProperty::Derived
-      node << Derived.new(token, active_opts)
-    when *Token::UnicodeProperty::Emoji
-      node << Emoji.new(token, active_opts)
-    when *Token::UnicodeProperty::Script
-      node << Script.new(token, active_opts)
-    when *Token::UnicodeProperty::UnicodeBlock
-      node << Block.new(token, active_opts)
+    when *UPTokens::Age;          node << Age.new(token, active_opts)
+    when *UPTokens::Derived;      node << Derived.new(token, active_opts)
+    when *UPTokens::Emoji;        node << Emoji.new(token, active_opts)
+    when *UPTokens::Script;       node << Script.new(token, active_opts)
+    when *UPTokens::UnicodeBlock; node << Block.new(token, active_opts)
     else
       raise UnknownTokenError.new('UnicodeProperty', token)
     end
   end
-  def anchor(token)
-    case token.token
-    when :bol
-      node << Anchor::BeginningOfLine.new(token, active_opts)
-    when :eol
-      node << Anchor::EndOfLine.new(token, active_opts)
-    when :bos
-      node << Anchor::BOS.new(token, active_opts)
-    when :eos
-      node << Anchor::EOS.new(token, active_opts)
-    when :eos_ob_eol
-      node << Anchor::EOSobEOL.new(token, active_opts)
-    when :word_boundary
-      node << Anchor::WordBoundary.new(token, active_opts)
-    when :nonword_boundary
-      node << Anchor::NonWordBoundary.new(token, active_opts)
-    when :match_start
-      node << Anchor::MatchStart.new(token, active_opts)
-    else
-      raise UnknownTokenError.new('Anchor', token)
-    end
-  end
-  def escape(token)
-    case token.token
-    when :backspace
-      node << EscapeSequence::Backspace.new(token, active_opts)
-    when :escape
-      node << EscapeSequence::AsciiEscape.new(token, active_opts)
-    when :bell
-      node << EscapeSequence::Bell.new(token, active_opts)
-    when :form_feed
-      node << EscapeSequence::FormFeed.new(token, active_opts)
-    when :newline
-      node << EscapeSequence::Newline.new(token, active_opts)
-    when :carriage
-      node << EscapeSequence::Return.new(token, active_opts)
-    when :tab
-      node << EscapeSequence::Tab.new(token, active_opts)
-    when :vertical_tab
-      node << EscapeSequence::VerticalTab.new(token, active_opts)
-    when :hex
-      node << EscapeSequence::Hex.new(token, active_opts)
-    when :octal
-      node << EscapeSequence::Octal.new(token, active_opts)
-    when :codepoint
-      node << EscapeSequence::Codepoint.new(token, active_opts)
-    when :codepoint_list
-      node << EscapeSequence::CodepointList.new(token, active_opts)
-    when :control
-      if token.text =~ /\A(?:\\C-\\M|\\c\\M)/
-        node << EscapeSequence::MetaControl.new(token, active_opts)
-      else
-        node << EscapeSequence::Control.new(token, active_opts)
-      end
-    when :meta_sequence
-      if token.text =~ /\A\\M-\\[Cc]/
-        node << EscapeSequence::MetaControl.new(token, active_opts)
-      else
-        node << EscapeSequence::Meta.new(token, active_opts)
-      end
-    else
-      # treating everything else as a literal
-      node << EscapeSequence::Literal.new(token, active_opts)
-    end
-  end
-  def keep(token)
-    node << Keep::Mark.new(token, active_opts)
-  end
-  def free_space(token)
-    case token.token
-    when :comment
-      node << Comment.new(token, active_opts)
-    when :whitespace
-      if node.last.is_a?(WhiteSpace)
-        node.last.merge(WhiteSpace.new(token, active_opts))
-      else
-        node << WhiteSpace.new(token, active_opts)
-      end
-    else
-      raise UnknownTokenError.new('FreeSpace', token)
-    end
-  end
   def quantifier(token)
-    offset = -1
-    target_node = node.expressions[offset]
-    while target_node.is_a?(FreeSpace)
-      target_node = node.expressions[offset -= 1]
-    end
-    target_node || raise(ArgumentError, 'No valid target found for '\
-                                        "'#{token.text}' ")
+    target_node = node.expressions.reverse.find { |exp| !exp.is_a?(FreeSpace) }
+    target_node or raise ParserError, "No valid target found for '#{token.text}'"
     # in case of chained quantifiers, wrap target in an implicit passive group
     # description of the problem: https://github.com/ammar/regexp_parser/issues/3
@@ -454,7 +493,7 @@ class Regexp::Parser
       new_group.implicit = true
       new_group << target_node
       increase_level(target_node)
-      node.expressions[offset] = new_group
+      node.expressions[node.expressions.index(target_node)] = new_group
       target_node = new_group
     end
@@ -515,100 +554,16 @@ class Regexp::Parser
     target_node.quantify(:interval, text, min.to_i, max.to_i, mode)
   end
-  def group(token)
-    case token.token
-    when :options, :options_switch
-      options_group(token)
-    when :close
-      close_group
-    when :comment
-      node << Group::Comment.new(token, active_opts)
-    else
-      open_group(token)
-    end
-  end
-  MOD_FLAGS = %w[i m x].map(&:to_sym)
-  ENC_FLAGS = %w[a d u].map(&:to_sym)
-  def options_group(token)
-    positive, negative = token.text.split('-', 2)
-    negative ||= ''
-    self.switching_options = token.token.equal?(:options_switch)
-    opt_changes = {}
-    new_active_opts = active_opts.dup
-    MOD_FLAGS.each do |flag|
-      if positive.include?(flag.to_s)
-        opt_changes[flag] = new_active_opts[flag] = true
-      end
-      if negative.include?(flag.to_s)
-        opt_changes[flag] = false
-        new_active_opts.delete(flag)
-      end
-    end
-    if (enc_flag = positive.reverse[/[adu]/])
-      enc_flag = enc_flag.to_sym
-      (ENC_FLAGS - [enc_flag]).each do |other|
-        opt_changes[other] = false if new_active_opts[other]
-        new_active_opts.delete(other)
-      end
-      opt_changes[enc_flag] = new_active_opts[enc_flag] = true
-    end
-    options_stack << new_active_opts
-    options_group = Group::Options.new(token, active_opts)
-    options_group.option_changes = opt_changes
-    nest(options_group)
-  end
-  def open_group(token)
+  def set(token)
     case token.token
-    when :passive
-      exp = Group::Passive.new(token, active_opts)
-    when :atomic
-      exp = Group::Atomic.new(token, active_opts)
-    when :named
-      exp = Group::Named.new(token, active_opts)
-    when :capture
-      exp = Group::Capture.new(token, active_opts)
-    when :absence
-      exp = Group::Absence.new(token, active_opts)
-    when :lookahead
-      exp = Assertion::Lookahead.new(token, active_opts)
-    when :nlookahead
-      exp = Assertion::NegativeLookahead.new(token, active_opts)
-    when :lookbehind
-      exp = Assertion::Lookbehind.new(token, active_opts)
-    when :nlookbehind
-      exp = Assertion::NegativeLookbehind.new(token, active_opts)
+    when :open;         open_set(token)
+    when :close;        close_set
+    when :negate;       negate_set
+    when :range;        range(token)
+    when :intersection; intersection(token)
     else
-      raise UnknownTokenError.new('Group type open', token)
-    end
-    if exp.capturing?
-      exp.number          = total_captured_group_count + 1
-      exp.number_at_level = captured_group_count_at_level + 1
-      count_captured_group
+      raise UnknownTokenError.new('CharacterSet', token)
     end
-    # Push the active options to the stack again. This way we can simply pop the
-    # stack for any group we close, no matter if it had its own options or not.
-    options_stack << active_opts
-    nest(exp)
-  end
-  def close_group
-    options_stack.pop unless switching_options
-    self.switching_options = false
-    decrease_nesting
   end
   def open_set(token)
@@ -631,51 +586,45 @@ class Regexp::Parser
     nest(exp)
   end
-  def close_completed_character_set_range
-    decrease_nesting if node.is_a?(CharacterSet::Range) && node.complete?
-  end
   def intersection(token)
     sequence_operation(CharacterSet::Intersection, token)
   end
-  def sequence_operation(klass, token)
-    unless node.is_a?(klass)
-      operator = klass.new(token, active_opts)
-      sequence = operator.add_sequence(active_opts)
-      sequence.expressions = node.expressions
-      node.expressions = []
-      nest(operator)
+  def type(token)
+    case token.token
+    when :digit;     node << CharacterType::Digit.new(token, active_opts)
+    when :hex;       node << CharacterType::Hex.new(token, active_opts)
+    when :linebreak; node << CharacterType::Linebreak.new(token, active_opts)
+    when :nondigit;  node << CharacterType::NonDigit.new(token, active_opts)
+    when :nonhex;    node << CharacterType::NonHex.new(token, active_opts)
+    when :nonspace;  node << CharacterType::NonSpace.new(token, active_opts)
+    when :nonword;   node << CharacterType::NonWord.new(token, active_opts)
+    when :space;     node << CharacterType::Space.new(token, active_opts)
+    when :word;      node << CharacterType::Word.new(token, active_opts)
+    when :xgrapheme; node << CharacterType::ExtendedGrapheme.new(token, active_opts)
+    else
+      raise UnknownTokenError.new('CharacterType', token)
     end
-    node.add_sequence(active_opts)
-  end
-  def active_opts
-    options_stack.last
-  end
-  def total_captured_group_count
-    captured_group_counts.values.reduce(0, :+)
-  end
-  def captured_group_count_at_level
-    captured_group_counts[node.level]
   end
-  def count_captured_group
-    captured_group_counts[node.level] += 1
+  def close_completed_character_set_range
+    decrease_nesting if node.is_a?(CharacterSet::Range) && node.complete?
   end
-  def assign_effective_number(exp)
-    exp.effective_number =
-      exp.number + total_captured_group_count + (exp.number < 0 ? 1 : 0)
+  def active_opts
+    options_stack.last
   end
+  # Assigns referenced expressions to refering expressions, e.g. if there is
+  # an instance of Backreference::Number, its #referenced_expression is set to
+  # the instance of Group::Capture that it refers to via its number.
   def assign_referenced_expressions
     targets = {}
+    # find all referencable expressions
     root.each_expression do |exp|
       exp.is_a?(Group::Capture) && targets[exp.identifier] = exp
     end
+    # assign them to any refering expressions
     root.each_expression do |exp|
       exp.respond_to?(:reference) &&
         exp.referenced_expression = targets[exp.reference]