RubyGems - regexp_parser - Versions diffs - 2.0.3 → 2.1.0 - Mend

regexp_parser 2.0.3 → 2.1.0

Files changed (34)

checksums.yaml +4 -4
data/CHANGELOG.md +34 -3
data/Gemfile +5 -1
data/README.md +1 -1
data/Rakefile +6 -6
data/lib/regexp_parser.rb +1 -0
data/lib/regexp_parser/error.rb +4 -0
data/lib/regexp_parser/expression.rb +1 -1
data/lib/regexp_parser/expression/classes/backref.rb +5 -0
data/lib/regexp_parser/expression/classes/conditional.rb +11 -1
data/lib/regexp_parser/expression/classes/free_space.rb +1 -1
data/lib/regexp_parser/expression/classes/group.rb +6 -1
data/lib/regexp_parser/expression/classes/property.rb +1 -1
data/lib/regexp_parser/expression/classes/set/range.rb +2 -1
data/lib/regexp_parser/expression/quantifier.rb +1 -1
data/lib/regexp_parser/expression/sequence.rb +3 -9
data/lib/regexp_parser/expression/subexpression.rb +1 -1
data/lib/regexp_parser/parser.rb +281 -332
data/lib/regexp_parser/scanner.rb +1015 -1003
data/lib/regexp_parser/scanner/scanner.rl +53 -77
data/lib/regexp_parser/syntax.rb +6 -6
data/lib/regexp_parser/syntax/any.rb +1 -1
data/lib/regexp_parser/syntax/versions.rb +1 -1
data/lib/regexp_parser/version.rb +1 -1
data/spec/expression/clone_spec.rb +36 -4
data/spec/expression/free_space_spec.rb +2 -2
data/spec/expression/methods/match_length_spec.rb +2 -2
data/spec/lexer/refcalls_spec.rb +5 -0
data/spec/parser/all_spec.rb +2 -2
data/spec/parser/refcalls_spec.rb +5 -0
data/spec/scanner/escapes_spec.rb +1 -1
data/spec/scanner/refcalls_spec.rb +19 -0
data/spec/scanner/sets_spec.rb +42 -11
metadata +4 -3

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 42283562f90dc131bff21d7988b76867d1bd3bfc828373be9dce75c336300e1e
-  data.tar.gz: c7d0122495e338d2535ac7569f20257b70bbdce15a50a5ede6677897cfacc736
+  metadata.gz: 79c8b7838ef53335c9d0fbd21ffdf6815473ee560380a3687e8fab514d031d53
+  data.tar.gz: 2a91f7c7640fc5f2d304c2cbf240886d8e8642994861a9c092f1d4db2ae6b77a
 SHA512:
-  metadata.gz: cd29fd59a5bdad5344d19a86c39680d9d22e961c7478d702643e9b2340a0c0c8d62b61ff7fb44b404096a079267e0126c9fd92797306062a1c66711e29af1a24
-  data.tar.gz: 752b4824e5104a29de6b8582b51f39fe72dc40ddd222a58b61612b0b4cc9e5fc0311b31d2523f7e516b36991f49843b682309178934b6306d6ba094856e9d50c
+  metadata.gz: 3559a8c7af9c0087ab7a54862c9913e40a3703ffa23f62e6919eec50042523424c2aa4c99b3de9d28d03fc0edd14af37e0dcd0eab7bf822b9af73113be468b59
+  data.tar.gz: 31ed468565bd41fe2d0bd7b82d53d64e213a15e1ade2108ddf813637c228c18f6f7b456725c7e359a08754188ee19c90d06e013be90775ee6a64723b04fa25f0

data/CHANGELOG.md CHANGED

@@ -1,14 +1,45 @@
 ## [Unreleased]
+## [2.1.0] - 2021-02-22 - [Janosch Müller](mailto:janosch84@gmail.com)
+### Added
+- common ancestor for all scanning/parsing/lexing errors
+  * `Regexp::Parser::Error` can now be rescued as a catch-all
+  * the following errors (and their many descendants) now inherit from it:
+    - `Regexp::Expression::Conditional::TooManyBranches`
+    - `Regexp::Parser::ParserError`
+    - `Regexp::Scanner::ScannerError`
+    - `Regexp::Scanner::ValidationError`
+    - `Regexp::Syntax::SyntaxError`
+  * it replaces `ArgumentError` in some rare cases (`Regexp::Parser.parse('?')`)
+  * thanks to [sandstrom](https://github.com/sandstrom) for the cue
+### Fixed
+- fixed scanning of whole-pattern recursion calls `\g<0>` and `\g'0'`
+  * a regression in v2.0.1 had caused them to be scanned as literals
+- fixed scanning of some backreference and subexpression call edge cases
+  * e.g. `\k<+1>`, `\g<x-1>`
+- fixed tokenization of some escapes in character sets
+  * `.`, `|`, `{`, `}`, `(`, `)`, `^`, `$`, `?`, `+`, `*`
+  * all of these correctly emitted `#type` `:literal` and `#token` `:literal` if *not* escaped
+  * if escaped, they emitted e.g. `#type` `:escape` and `#token` `:group_open` for `[\(]`
+  * the escaped versions now correctly emit `#type` `:escape` and `#token` `:literal`
+- fixed handling of control/metacontrol escapes in character sets
+  * e.g. `[\cX]`, `[\M-\C-X]`
+  * they were misread as bunch of individual literals, escapes, and ranges
+- fixed some cases where calling `#dup`/`#clone` on expressions led to shared state
 ## [2.0.3] - 2020-12-28 - [Janosch Müller](mailto:janosch84@gmail.com)
 ### Fixed
 - fixed error when scanning some unlikely and redundant but valid charset patterns
-  - e.g. `/[[.a-b.]]/`, `/[[=e=]]/`,
+  * e.g. `/[[.a-b.]]/`, `/[[=e=]]/`,
 - fixed ancestry of some error classes related to syntax version lookup
-  - `NotImplementedError`, `InvalidVersionNameError`, `UnknownSyntaxNameError`
-  - they now correctly inherit from `Regexp::Syntax::SyntaxError` instead of Rubys `::SyntaxError`
+  * `NotImplementedError`, `InvalidVersionNameError`, `UnknownSyntaxNameError`
+  * they now correctly inherit from `Regexp::Syntax::SyntaxError` instead of Rubys `::SyntaxError`
 ## [2.0.2] - 2020-12-25 - [Janosch Müller](mailto:janosch84@gmail.com)

data/Gemfile CHANGED

@@ -6,5 +6,9 @@ group :development, :test do
   gem 'ice_nine', '~> 0.11.2'
   gem 'rake', '~> 13.0'
   gem 'regexp_property_values', '~> 1.0'
-  gem 'rspec', '~> 3.8'
+  gem 'rspec', '~> 3.10'
+  if RUBY_VERSION.to_f >= 2.7
+    gem 'gouteur'
+    gem 'rubocop', '~> 1.7'
+  end
 end
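
The new guard compares `RUBY_VERSION.to_f` against `2.7`. `String#to_f` reads only the leading "major.minor" part of the version string, which is enough for the Ruby releases that exist. The sketch below shows how that comparison evaluates and, as an assumed alternative that is not part of this diff, the same check with `Gem::Version`. Both helper names are hypothetical.

```ruby
# Hypothetical helpers for illustration; the Gemfile inlines the first check.

# Mirrors the Gemfile guard: only the leading "major.minor" float is compared.
def float_at_least?(version_string, minimum)
  version_string.to_f >= minimum
end

# Assumed alternative: compares version segments numerically.
def gem_version_at_least?(version_string, minimum)
  Gem::Version.new(version_string) >= Gem::Version.new(minimum)
end
```

The two checks agree for versions such as `2.6.6`, `2.7.2` and `3.0.0`. They would differ only for a two-digit minor version such as a hypothetical `2.10.0`, which `to_f` reads as `2.1`.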

data/README.md CHANGED

@@ -1,6 +1,6 @@
 # Regexp::Parser
-[![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://github.com/ammar/regexp_parser/workflows/tests/badge.svg)](https://github.com/ammar/regexp_parser/actions) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
+[![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://github.com/ammar/regexp_parser/workflows/tests/badge.svg)](https://github.com/ammar/regexp_parser/actions) [![Build Status](https://github.com/ammar/regexp_parser/workflows/gouteur/badge.svg)](https://github.com/ammar/regexp_parser/actions) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
 A Ruby gem for tokenizing, parsing, and transforming regular expressions.

data/Rakefile CHANGED

@@ -7,8 +7,8 @@ require 'bundler'
 require 'rubygems/package_task'
-RAGEL_SOURCE_DIR = File.expand_path '../lib/regexp_parser/scanner', __FILE__
-RAGEL_OUTPUT_DIR = File.expand_path '../lib/regexp_parser', __FILE__
+RAGEL_SOURCE_DIR = File.join(__dir__, 'lib/regexp_parser/scanner')
+RAGEL_OUTPUT_DIR = File.join(__dir__, 'lib/regexp_parser')
 RAGEL_SOURCE_FILES = %w{scanner} # scanner.rl includes property.rl
@@ -26,10 +26,10 @@ end
 namespace :ragel do
   desc "Process the ragel source files and output ruby code"
   task :rb do
-    RAGEL_SOURCE_FILES.each do |file|
-      output_file = "#{RAGEL_OUTPUT_DIR}/#{file}.rb"
+    RAGEL_SOURCE_FILES.each do |source_file|
+      output_file = "#{RAGEL_OUTPUT_DIR}/#{source_file}.rb"
       # using faster flat table driven FSM, about 25% larger code, but about 30% faster
-      sh "ragel -F1 -R #{RAGEL_SOURCE_DIR}/#{file}.rl -o #{output_file}"
+      sh "ragel -F1 -R #{RAGEL_SOURCE_DIR}/#{source_file}.rl -o #{output_file}"
       contents = File.read(output_file)
@@ -61,7 +61,7 @@ namespace :props do
   task :update do
     require 'regexp_property_values'
     RegexpPropertyValues.update
-    dir = File.expand_path('../lib/regexp_parser/scanner/properties', __FILE__)
+    dir = File.join(__dir__, 'lib/regexp_parser/scanner/properties')
     require 'psych'
     write_hash_to_file = ->(hash, path) do

data/lib/regexp_parser.rb CHANGED

@@ -1,6 +1,7 @@
 # encoding: utf-8
 require 'regexp_parser/version'
+require 'regexp_parser/error'
 require 'regexp_parser/token'
 require 'regexp_parser/scanner'
 require 'regexp_parser/syntax'

data/lib/regexp_parser/error.rb ADDED

@@ -0,0 +1,4 @@
+class Regexp::Parser
+  # base class for all gem-specific errors (inherited but never raised itself)
+  class Error < StandardError; end
+end
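
With a shared base class, callers can rescue every gem-specific error in one clause. The standalone sketch below reproduces only the inheritance shape from this diff. The `Demo` namespace, its error classes, and both methods are hypothetical stand-ins, not the gem's real API.

```ruby
# Hypothetical stand-ins that mirror the shape of the hierarchy in this diff;
# none of these are the gem's actual classes.
module Demo
  class Error < StandardError; end # common ancestor, never raised itself
  class ParserError < Error; end
  class ScannerError < Error; end
  class UnknownTokenError < ParserError; end

  # Raises one of the specific errors depending on the input.
  def self.process(input)
    raise ScannerError, 'premature end of pattern' if input.empty?
    raise UnknownTokenError, "unknown token #{input.inspect}" if input == '?'

    input
  end

  # Rescues the shared ancestor, so any descendant is caught by one clause.
  def self.safe_process(input)
    process(input)
  rescue Error => e
    "#{e.class.name}: #{e.message}"
  end
end
```

Because the base class descends from `StandardError`, existing `rescue StandardError` or bare `rescue` clauses in calling code keep working.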

data/lib/regexp_parser/expression.rb CHANGED

@@ -21,7 +21,7 @@ module Regexp::Expression
       self.options           = options
     end
-    def initialize_clone(orig)
+    def initialize_copy(orig)
       self.text       = (orig.text       ? orig.text.dup         : nil)
       self.options    = (orig.options    ? orig.options.dup      : nil)
       self.quantifier = (orig.quantifier ? orig.quantifier.clone : nil)
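
The rename matters because Ruby calls `initialize_clone` only for `#clone`, while `initialize_copy` is reached from both `#dup` and `#clone`. The sketch below uses two hypothetical classes, neither taken from the gem, to show how a hook defined only as `initialize_clone` leaves `#dup` with shared state.

```ruby
# Hypothetical classes for illustration; not part of regexp_parser.
class CloneOnlyHook
  attr_accessor :text

  def initialize(text)
    @text = text
  end

  # Runs for #clone only; #dup never reaches this hook.
  def initialize_clone(orig)
    @text = orig.text.dup
    super
  end
end

class CopyHook
  attr_accessor :text

  def initialize(text)
    @text = text
  end

  # Runs for both #dup and #clone.
  def initialize_copy(orig)
    @text = orig.text.dup
    super
  end
end

# True if the duplicate still points at the very same String object.
def shares_text_after_dup?(obj)
  obj.dup.text.equal?(obj.text)
end

# True if the clone still points at the very same String object.
def shares_text_after_clone?(obj)
  obj.clone.text.equal?(obj.text)
end
```

With `CloneOnlyHook`, a `#dup` keeps the original's `text` object, so mutating one copy affects the other. With `CopyHook`, both `#dup` and `#clone` get their own string.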

data/lib/regexp_parser/expression/classes/backref.rb CHANGED

@@ -2,6 +2,11 @@ module Regexp::Expression
   module Backreference
     class Base < Regexp::Expression::Base
       attr_accessor :referenced_expression
+      def initialize_copy(orig)
+        self.referenced_expression = orig.referenced_expression.dup
+        super
+      end
     end
     class Number < Backreference::Base

data/lib/regexp_parser/expression/classes/conditional.rb CHANGED

@@ -1,6 +1,6 @@
 module Regexp::Expression
   module Conditional
-    class TooManyBranches < StandardError
+    class TooManyBranches < Regexp::Parser::Error
       def initialize
         super('The conditional expression has more than 2 branches')
       end
@@ -15,6 +15,11 @@ module Regexp::Expression
         ref = text.tr("'<>()", "")
         ref =~ /\D/ ? ref : Integer(ref)
       end
+      def initialize_copy(orig)
+        self.referenced_expression = orig.referenced_expression.dup
+        super
+      end
     end
     class Branch < Regexp::Expression::Sequence; end
@@ -53,6 +58,11 @@ module Regexp::Expression
       def to_s(format = :full)
         "#{text}#{condition}#{branches.join('|')})#{quantifier_affix(format)}"
       end
+      def initialize_copy(orig)
+        self.referenced_expression = orig.referenced_expression.dup
+        super
+      end
     end
   end
 end

data/lib/regexp_parser/expression/classes/free_space.rb CHANGED

@@ -2,7 +2,7 @@ module Regexp::Expression
   class FreeSpace < Regexp::Expression::Base
     def quantify(_token, _text, _min = nil, _max = nil, _mode = :greedy)
-      raise "Can not quantify a free space object"
+      raise Regexp::Parser::Error, 'Can not quantify a free space object'
     end
   end

data/lib/regexp_parser/expression/classes/group.rb CHANGED

@@ -35,6 +35,11 @@ module Regexp::Expression
     class Atomic  < Group::Base; end
     class Options < Group::Base
       attr_accessor :option_changes
+      def initialize_copy(orig)
+        self.option_changes = orig.option_changes.dup
+        super
+      end
     end
     class Capture < Group::Base
@@ -53,7 +58,7 @@ module Regexp::Expression
         super
       end
-      def initialize_clone(orig)
+      def initialize_copy(orig)
         @name = orig.name.dup
         super
       end

data/lib/regexp_parser/expression/classes/property.rb CHANGED

@@ -7,7 +7,7 @@ module Regexp::Expression
       end
       def name
-        text =~ /\A\\[pP]\{([^}]+)\}\z/; $1
+        text[/\A\\[pP]\{([^}]+)\}\z/, 1]
       end
       def shortcut
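
The `name` change swaps a match followed by the `$1` global for `String#[]` with a regexp and a capture index, which returns the capture directly or `nil` when there is no match. The sketch below puts the two forms side by side. The constant and method names are hypothetical; only the pattern is taken from the diff.

```ruby
# Pattern copied from the diff; the constant and method names are hypothetical.
PROPERTY_NAME_PATTERN = /\A\\[pP]\{([^}]+)\}\z/

# Old form: run the match, then read the first capture from the $1 global.
def name_via_global(text)
  text =~ PROPERTY_NAME_PATTERN
  $1
end

# New form: String#[] with a capture index returns the group or nil.
def name_via_index(text)
  text[PROPERTY_NAME_PATTERN, 1]
end
```

Both return the same value for the same input. The second form does so in a single expression, without reading a special variable.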

data/lib/regexp_parser/expression/classes/set/range.rb CHANGED

@@ -7,7 +7,8 @@ module Regexp::Expression
       alias :ts :starts_at
       def <<(exp)
-        complete? && raise("Can't add more than 2 expressions to a Range")
+        complete? and raise Regexp::Parser::Error,
+          "Can't add more than 2 expressions to a Range"
         super
       end
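
The guard moves from `&&` with a parenthesised `raise(...)` to `and` with a paren-free `raise` that now names an error class. The low precedence of `and` is what lets `raise` take its arguments without parentheses. The sketch below shows the idiom on a hypothetical two-slot container, with `Pair` and `FullError` invented for the example.

```ruby
# Hypothetical two-slot container, loosely modelled on the Range check above.
class Pair
  class FullError < StandardError; end

  def initialize
    @items = []
  end

  def complete?
    @items.size == 2
  end

  def <<(item)
    # `and` binds loosely enough to allow a paren-free `raise` with arguments.
    complete? and raise FullError,
      "Can't add more than 2 items to a Pair"
    @items << item
    self
  end

  def to_a
    @items.dup
  end
end
```

Adding a third item raises `Pair::FullError` and leaves the stored items unchanged.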

data/lib/regexp_parser/expression/quantifier.rb CHANGED

@@ -12,7 +12,7 @@ module Regexp::Expression
       @max   = max
     end
-    def initialize_clone(orig)
+    def initialize_copy(orig)
       @text = orig.text.dup
       super
     end

data/lib/regexp_parser/expression/sequence.rb CHANGED

@@ -41,17 +41,11 @@ module Regexp::Expression
     alias :ts :starts_at
     def quantify(token, text, min = nil, max = nil, mode = :greedy)
-      offset = -1
-      target = expressions[offset]
-      while target.is_a?(FreeSpace)
-        target = expressions[offset -= 1]
-      end
-      target || raise(ArgumentError, "No valid target found for '#{text}' "\
-                                     'quantifier')
+      target = expressions.reverse.find { |exp| !exp.is_a?(FreeSpace) }
+      target or raise Regexp::Parser::Error,
+        "No valid target found for '#{text}' quantifier"
       target.quantify(token, text, min, max, mode)
     end
   end
 end
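
The rewritten `quantify` replaces the manual negative-offset loop with `reverse.find`, which returns the last expression that is not free space, or `nil` if there is none. The sketch below shows that lookup in isolation. The `QuantifyDemo` module, its node types, and its error class are hypothetical and far simpler than the gem's expression classes.

```ruby
# Hypothetical minimal node types; the gem's expression classes are richer.
module QuantifyDemo
  Node = Struct.new(:text)
  class FreeSpace < Node; end
  class NoTargetError < StandardError; end

  # Returns the last expression that is not free space (whitespace, comments),
  # or raises if no such expression exists.
  def self.quantifier_target(expressions, quantifier_text)
    target = expressions.reverse.find { |exp| !exp.is_a?(FreeSpace) }
    target or raise NoTargetError,
      "No valid target found for '#{quantifier_text}' quantifier"
    target
  end
end
```

For a list like `[a, b, <whitespace>]` the lookup skips the trailing free space and returns `b`. An empty list, or one that contains only free space, raises.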

data/lib/regexp_parser/expression/subexpression.rb CHANGED

@@ -12,7 +12,7 @@ module Regexp::Expression
     end
     # Override base method to clone the expressions as well.
-    def initialize_clone(orig)
+    def initialize_copy(orig)
       self.expressions = orig.expressions.map(&:clone)
       super
     end

data/lib/regexp_parser/parser.rb CHANGED

@@ -2,9 +2,8 @@ require 'regexp_parser/expression'
 class Regexp::Parser
   include Regexp::Expression
-  include Regexp::Syntax
-  class ParserError < StandardError; end
+  class ParserError < Regexp::Parser::Error; end
   class UnknownTokenTypeError < ParserError
     def initialize(type, token)
@@ -70,93 +69,155 @@ class Regexp::Parser
     enabled_options
   end
-  def nest(exp)
-    nesting.push(exp)
-    node << exp
-    update_transplanted_subtree(exp, node)
-    self.node = exp
-  end
+  def parse_token(token)
+    case token.type
+    when :anchor;                     anchor(token)
+    when :assertion, :group;          group(token)
+    when :backref;                    backref(token)
+    when :conditional;                conditional(token)
+    when :escape;                     escape(token)
+    when :free_space;                 free_space(token)
+    when :keep;                       keep(token)
+    when :literal;                    literal(token)
+    when :meta;                       meta(token)
+    when :posixclass, :nonposixclass; posixclass(token)
+    when :property, :nonproperty;     property(token)
+    when :quantifier;                 quantifier(token)
+    when :set;                        set(token)
+    when :type;                       type(token)
+    else
+      raise UnknownTokenTypeError.new(token.type, token)
+    end
-  # subtrees are transplanted to build Alternations, Intersections, Ranges
-  def update_transplanted_subtree(exp, new_parent)
-    exp.nesting_level = new_parent.nesting_level + 1
-    exp.respond_to?(:each) &&
-      exp.each { |subexp| update_transplanted_subtree(subexp, exp) }
+    close_completed_character_set_range
   end
-  def decrease_nesting
-    while nesting.last.is_a?(SequenceOperation)
-      nesting.pop
-      self.node = nesting.last
+  def anchor(token)
+    case token.token
+    when :bol;              node << Anchor::BeginningOfLine.new(token, active_opts)
+    when :bos;              node << Anchor::BOS.new(token, active_opts)
+    when :eol;              node << Anchor::EndOfLine.new(token, active_opts)
+    when :eos;              node << Anchor::EOS.new(token, active_opts)
+    when :eos_ob_eol;       node << Anchor::EOSobEOL.new(token, active_opts)
+    when :match_start;      node << Anchor::MatchStart.new(token, active_opts)
+    when :nonword_boundary; node << Anchor::NonWordBoundary.new(token, active_opts)
+    when :word_boundary;    node << Anchor::WordBoundary.new(token, active_opts)
+    else
+      raise UnknownTokenError.new('Anchor', token)
     end
-    nesting.pop
-    yield(node) if block_given?
-    self.node = nesting.last
-    self.node = node.last if node.last.is_a?(SequenceOperation)
   end
-  def nest_conditional(exp)
-    conditional_nesting.push(exp)
-    nest(exp)
+  def group(token)
+    case token.token
+    when :options, :options_switch
+      options_group(token)
+    when :close
+      close_group
+    when :comment
+      node << Group::Comment.new(token, active_opts)
+    else
+      open_group(token)
+    end
   end
-  def parse_token(token)
-    close_completed_character_set_range
+  MOD_FLAGS = %w[i m x].map(&:to_sym)
+  ENC_FLAGS = %w[a d u].map(&:to_sym)
-    case token.type
-    when :meta;         meta(token)
-    when :quantifier;   quantifier(token)
-    when :anchor;       anchor(token)
-    when :escape;       escape(token)
-    when :group;        group(token)
-    when :assertion;    group(token)
-    when :set;          set(token)
-    when :type;         type(token)
-    when :backref;      backref(token)
-    when :conditional;  conditional(token)
-    when :keep;         keep(token)
-    when :posixclass, :nonposixclass
-      posixclass(token)
-    when :property, :nonproperty
-      property(token)
-    when :literal
-      node << Literal.new(token, active_opts)
-    when :free_space
-      free_space(token)
+  def options_group(token)
+    positive, negative = token.text.split('-', 2)
+    negative ||= ''
+    self.switching_options = token.token.equal?(:options_switch)
-    else
-      raise UnknownTokenTypeError.new(token.type, token)
+    opt_changes = {}
+    new_active_opts = active_opts.dup
+    MOD_FLAGS.each do |flag|
+      if positive.include?(flag.to_s)
+        opt_changes[flag] = new_active_opts[flag] = true
+      end
+      if negative.include?(flag.to_s)
+        opt_changes[flag] = false
+        new_active_opts.delete(flag)
+      end
+    end
+    if (enc_flag = positive.reverse[/[adu]/])
+      enc_flag = enc_flag.to_sym
+      (ENC_FLAGS - [enc_flag]).each do |other|
+        opt_changes[other] = false if new_active_opts[other]
+        new_active_opts.delete(other)
+      end
+      opt_changes[enc_flag] = new_active_opts[enc_flag] = true
     end
+    options_stack << new_active_opts
+    options_group = Group::Options.new(token, active_opts)
+    options_group.option_changes = opt_changes
+    nest(options_group)
   end
-  def set(token)
-    case token.token
-    when :open
-      open_set(token)
-    when :close
-      close_set
-    when :negate
-      negate_set
-    when :range
-      range(token)
-    when :intersection
-      intersection(token)
-    else
-      raise UnknownTokenError.new('CharacterSet', token)
+  def open_group(token)
+    group_class =
+      case token.token
+      when :absence;     Group::Absence
+      when :atomic;      Group::Atomic
+      when :capture;     Group::Capture
+      when :named;       Group::Named
+      when :passive;     Group::Passive
+      when :lookahead;   Assertion::Lookahead
+      when :lookbehind;  Assertion::Lookbehind
+      when :nlookahead;  Assertion::NegativeLookahead
+      when :nlookbehind; Assertion::NegativeLookbehind
+      else
+        raise UnknownTokenError.new('Group type open', token)
+      end
+    group = group_class.new(token, active_opts)
+    if group.capturing?
+      group.number          = total_captured_group_count + 1
+      group.number_at_level = captured_group_count_at_level + 1
+      count_captured_group
     end
+    # Push the active options to the stack again. This way we can simply pop the
+    # stack for any group we close, no matter if it had its own options or not.
+    options_stack << active_opts
+    nest(group)
   end
-  def meta(token)
-    case token.token
-    when :dot
-      node << CharacterType::Any.new(token, active_opts)
-    when :alternation
-      sequence_operation(Alternation, token)
-    else
-      raise UnknownTokenError.new('Meta', token)
+  def total_captured_group_count
+    captured_group_counts.values.reduce(0, :+)
+  end
+  def captured_group_count_at_level
+    captured_group_counts[node.level]
+  end
+  def count_captured_group
+    captured_group_counts[node.level] += 1
+  end
+  def close_group
+    options_stack.pop unless switching_options
+    self.switching_options = false
+    decrease_nesting
+  end
+  def decrease_nesting
+    while nesting.last.is_a?(SequenceOperation)
+      nesting.pop
+      self.node = nesting.last
     end
+    nesting.pop
+    yield(node) if block_given?
+    self.node = nesting.last
+    self.node = node.last if node.last.is_a?(SequenceOperation)
   end
   def backref(token)
@@ -186,31 +247,9 @@ class Regexp::Parser
     end
   end
-  def type(token)
-    case token.token
-    when :digit
-      node << CharacterType::Digit.new(token, active_opts)
-    when :nondigit
-      node << CharacterType::NonDigit.new(token, active_opts)
-    when :hex
-      node << CharacterType::Hex.new(token, active_opts)
-    when :nonhex
-      node << CharacterType::NonHex.new(token, active_opts)
-    when :space
-      node << CharacterType::Space.new(token, active_opts)
-    when :nonspace
-      node << CharacterType::NonSpace.new(token, active_opts)
-    when :word
-      node << CharacterType::Word.new(token, active_opts)
-    when :nonword
-      node << CharacterType::NonWord.new(token, active_opts)
-    when :linebreak
-      node << CharacterType::Linebreak.new(token, active_opts)
-    when :xgrapheme
-      node << CharacterType::ExtendedGrapheme.new(token, active_opts)
-    else
-      raise UnknownTokenError.new('CharacterType', token)
-    end
+  def assign_effective_number(exp)
+    exp.effective_number =
+      exp.number + total_captured_group_count + (exp.number < 0 ? 1 : 0)
   end
   def conditional(token)
@@ -238,11 +277,118 @@ class Regexp::Parser
     end
   end
+  def nest_conditional(exp)
+    conditional_nesting.push(exp)
+    nest(exp)
+  end
+  def nest(exp)
+    nesting.push(exp)
+    node << exp
+    update_transplanted_subtree(exp, node)
+    self.node = exp
+  end
+  # subtrees are transplanted to build Alternations, Intersections, Ranges
+  def update_transplanted_subtree(exp, new_parent)
+    exp.nesting_level = new_parent.nesting_level + 1
+    exp.respond_to?(:each) &&
+      exp.each { |subexp| update_transplanted_subtree(subexp, exp) }
+  end
+  def escape(token)
+    case token.token
+    when :backspace;      node << EscapeSequence::Backspace.new(token, active_opts)
+    when :escape;         node << EscapeSequence::AsciiEscape.new(token, active_opts)
+    when :bell;           node << EscapeSequence::Bell.new(token, active_opts)
+    when :form_feed;      node << EscapeSequence::FormFeed.new(token, active_opts)
+    when :newline;        node << EscapeSequence::Newline.new(token, active_opts)
+    when :carriage;       node << EscapeSequence::Return.new(token, active_opts)
+    when :tab;            node << EscapeSequence::Tab.new(token, active_opts)
+    when :vertical_tab;   node << EscapeSequence::VerticalTab.new(token, active_opts)
+    when :codepoint;      node << EscapeSequence::Codepoint.new(token, active_opts)
+    when :codepoint_list; node << EscapeSequence::CodepointList.new(token, active_opts)
+    when :hex;            node << EscapeSequence::Hex.new(token, active_opts)
+    when :octal;          node << EscapeSequence::Octal.new(token, active_opts)
+    when :control
+      if token.text =~ /\A(?:\\C-\\M|\\c\\M)/
+        node << EscapeSequence::MetaControl.new(token, active_opts)
+      else
+        node << EscapeSequence::Control.new(token, active_opts)
+      end
+    when :meta_sequence
+      if token.text =~ /\A\\M-\\[Cc]/
+        node << EscapeSequence::MetaControl.new(token, active_opts)
+      else
+        node << EscapeSequence::Meta.new(token, active_opts)
+      end
+    else
+      # treating everything else as a literal
+      # TODO: maybe split this up a bit more in v3.0.0?
+      # E.g. escaped quantifiers or set meta chars are not the same
+      # as stuff that would be a literal even without the backslash.
+      # Right now, they all end up here.
+      node << EscapeSequence::Literal.new(token, active_opts)
+    end
+  end
+  def free_space(token)
+    case token.token
+    when :comment
+      node << Comment.new(token, active_opts)
+    when :whitespace
+      if node.last.is_a?(WhiteSpace)
+        node.last.merge(WhiteSpace.new(token, active_opts))
+      else
+        node << WhiteSpace.new(token, active_opts)
+      end
+    else
+      raise UnknownTokenError.new('FreeSpace', token)
+    end
+  end
+  def keep(token)
+    node << Keep::Mark.new(token, active_opts)
+  end
+  def literal(token)
+    node << Literal.new(token, active_opts)
+  end
+  def meta(token)
+    case token.token
+    when :dot
+      node << CharacterType::Any.new(token, active_opts)
+    when :alternation
+      sequence_operation(Alternation, token)
+    else
+      raise UnknownTokenError.new('Meta', token)
+    end
+  end
+  def sequence_operation(klass, token)
+    unless node.is_a?(klass)
+      operator = klass.new(token, active_opts)
+      sequence = operator.add_sequence(active_opts)
+      sequence.expressions = node.expressions
+      node.expressions = []
+      nest(operator)
+    end
+    node.add_sequence(active_opts)
+  end
   def posixclass(token)
     node << PosixClass.new(token, active_opts)
   end
   include Regexp::Expression::UnicodeProperty
+  UPTokens = Regexp::Syntax::Token::UnicodeProperty
   def property(token)
     case token.token
@@ -314,127 +460,20 @@ class Regexp::Parser
     when :private_use;            node << Codepoint::PrivateUse.new(token, active_opts)
     when :unassigned;             node << Codepoint::Unassigned.new(token, active_opts)
-    when *Token::UnicodeProperty::Age
-      node << Age.new(token, active_opts)
-    when *Token::UnicodeProperty::Derived
-      node << Derived.new(token, active_opts)
-    when *Token::UnicodeProperty::Emoji
-      node << Emoji.new(token, active_opts)
-    when *Token::UnicodeProperty::Script
-      node << Script.new(token, active_opts)
-    when *Token::UnicodeProperty::UnicodeBlock
-      node << Block.new(token, active_opts)
+    when *UPTokens::Age;          node << Age.new(token, active_opts)
+    when *UPTokens::Derived;      node << Derived.new(token, active_opts)
+    when *UPTokens::Emoji;        node << Emoji.new(token, active_opts)
+    when *UPTokens::Script;       node << Script.new(token, active_opts)
+    when *UPTokens::UnicodeBlock; node << Block.new(token, active_opts)
     else
       raise UnknownTokenError.new('UnicodeProperty', token)
     end
   end
-  def anchor(token)
-    case token.token
-    when :bol
-      node << Anchor::BeginningOfLine.new(token, active_opts)
-    when :eol
-      node << Anchor::EndOfLine.new(token, active_opts)
-    when :bos
-      node << Anchor::BOS.new(token, active_opts)
-    when :eos
-      node << Anchor::EOS.new(token, active_opts)
-    when :eos_ob_eol
-      node << Anchor::EOSobEOL.new(token, active_opts)
-    when :word_boundary
-      node << Anchor::WordBoundary.new(token, active_opts)
-    when :nonword_boundary
-      node << Anchor::NonWordBoundary.new(token, active_opts)
-    when :match_start
-      node << Anchor::MatchStart.new(token, active_opts)
-    else
-      raise UnknownTokenError.new('Anchor', token)
-    end
-  end
-  def escape(token)
-    case token.token
-    when :backspace
-      node << EscapeSequence::Backspace.new(token, active_opts)
-    when :escape
-      node << EscapeSequence::AsciiEscape.new(token, active_opts)
-    when :bell
-      node << EscapeSequence::Bell.new(token, active_opts)
-    when :form_feed
-      node << EscapeSequence::FormFeed.new(token, active_opts)
-    when :newline
-      node << EscapeSequence::Newline.new(token, active_opts)
-    when :carriage
-      node << EscapeSequence::Return.new(token, active_opts)
-    when :tab
-      node << EscapeSequence::Tab.new(token, active_opts)
-    when :vertical_tab
-      node << EscapeSequence::VerticalTab.new(token, active_opts)
-    when :hex
-      node << EscapeSequence::Hex.new(token, active_opts)
-    when :octal
-      node << EscapeSequence::Octal.new(token, active_opts)
-    when :codepoint
-      node << EscapeSequence::Codepoint.new(token, active_opts)
-    when :codepoint_list
-      node << EscapeSequence::CodepointList.new(token, active_opts)
-    when :control
-      if token.text =~ /\A(?:\\C-\\M|\\c\\M)/
-        node << EscapeSequence::MetaControl.new(token, active_opts)
-      else
-        node << EscapeSequence::Control.new(token, active_opts)
-      end
-    when :meta_sequence
-      if token.text =~ /\A\\M-\\[Cc]/
-        node << EscapeSequence::MetaControl.new(token, active_opts)
-      else
-        node << EscapeSequence::Meta.new(token, active_opts)
-      end
-    else
-      # treating everything else as a literal
-      node << EscapeSequence::Literal.new(token, active_opts)
-    end
-  end
-  def keep(token)
-    node << Keep::Mark.new(token, active_opts)
-  end
-  def free_space(token)
-    case token.token
-    when :comment
-      node << Comment.new(token, active_opts)
-    when :whitespace
-      if node.last.is_a?(WhiteSpace)
-        node.last.merge(WhiteSpace.new(token, active_opts))
-      else
-        node << WhiteSpace.new(token, active_opts)
-      end
-    else
-      raise UnknownTokenError.new('FreeSpace', token)
-    end
-  end
   def quantifier(token)
-    offset = -1
-    target_node = node.expressions[offset]
-    while target_node.is_a?(FreeSpace)
-      target_node = node.expressions[offset -= 1]
-    end
-    target_node || raise(ArgumentError, 'No valid target found for '\
-                                        "'#{token.text}' ")
+    target_node = node.expressions.reverse.find { |exp| !exp.is_a?(FreeSpace) }
+    target_node or raise ParserError, "No valid target found for '#{token.text}'"
     # in case of chained quantifiers, wrap target in an implicit passive group
     # description of the problem: https://github.com/ammar/regexp_parser/issues/3
@@ -454,7 +493,7 @@ class Regexp::Parser
       new_group.implicit = true
       new_group << target_node
       increase_level(target_node)
-      node.expressions[offset] = new_group
+      node.expressions[node.expressions.index(target_node)] = new_group
       target_node = new_group
     end
@@ -515,100 +554,16 @@ class Regexp::Parser
     target_node.quantify(:interval, text, min.to_i, max.to_i, mode)
   end
-  def group(token)
-    case token.token
-    when :options, :options_switch
-      options_group(token)
-    when :close
-      close_group
-    when :comment
-      node << Group::Comment.new(token, active_opts)
-    else
-      open_group(token)
-    end
-  end
-  MOD_FLAGS = %w[i m x].map(&:to_sym)
-  ENC_FLAGS = %w[a d u].map(&:to_sym)
-  def options_group(token)
-    positive, negative = token.text.split('-', 2)
-    negative ||= ''
-    self.switching_options = token.token.equal?(:options_switch)
-    opt_changes = {}
-    new_active_opts = active_opts.dup
-    MOD_FLAGS.each do |flag|
-      if positive.include?(flag.to_s)
-        opt_changes[flag] = new_active_opts[flag] = true
-      end
-      if negative.include?(flag.to_s)
-        opt_changes[flag] = false
-        new_active_opts.delete(flag)
-      end
-    end
-    if (enc_flag = positive.reverse[/[adu]/])
-      enc_flag = enc_flag.to_sym
-      (ENC_FLAGS - [enc_flag]).each do |other|
-        opt_changes[other] = false if new_active_opts[other]
-        new_active_opts.delete(other)
-      end
-      opt_changes[enc_flag] = new_active_opts[enc_flag] = true
-    end
-    options_stack << new_active_opts
-    options_group = Group::Options.new(token, active_opts)
-    options_group.option_changes = opt_changes
-    nest(options_group)
-  end
-  def open_group(token)
+  def set(token)
     case token.token
-    when :passive
-      exp = Group::Passive.new(token, active_opts)
-    when :atomic
-      exp = Group::Atomic.new(token, active_opts)
-    when :named
-      exp = Group::Named.new(token, active_opts)
-    when :capture
-      exp = Group::Capture.new(token, active_opts)
-    when :absence
-      exp = Group::Absence.new(token, active_opts)
-    when :lookahead
-      exp = Assertion::Lookahead.new(token, active_opts)
-    when :nlookahead
-      exp = Assertion::NegativeLookahead.new(token, active_opts)
-    when :lookbehind
-      exp = Assertion::Lookbehind.new(token, active_opts)
-    when :nlookbehind
-      exp = Assertion::NegativeLookbehind.new(token, active_opts)
+    when :open;         open_set(token)
+    when :close;        close_set
+    when :negate;       negate_set
+    when :range;        range(token)
+    when :intersection; intersection(token)
     else
-      raise UnknownTokenError.new('Group type open', token)
-    end
-    if exp.capturing?
-      exp.number          = total_captured_group_count + 1
-      exp.number_at_level = captured_group_count_at_level + 1
-      count_captured_group
+      raise UnknownTokenError.new('CharacterSet', token)
     end
-    # Push the active options to the stack again. This way we can simply pop the
-    # stack for any group we close, no matter if it had its own options or not.
-    options_stack << active_opts
-    nest(exp)
-  end
-  def close_group
-    options_stack.pop unless switching_options
-    self.switching_options = false
-    decrease_nesting
   end
   def open_set(token)
@@ -631,51 +586,45 @@ class Regexp::Parser
     nest(exp)
   end
-  def close_completed_character_set_range
-    decrease_nesting if node.is_a?(CharacterSet::Range) && node.complete?
-  end
   def intersection(token)
     sequence_operation(CharacterSet::Intersection, token)
   end
-  def sequence_operation(klass, token)
-    unless node.is_a?(klass)
-      operator = klass.new(token, active_opts)
-      sequence = operator.add_sequence(active_opts)
-      sequence.expressions = node.expressions
-      node.expressions = []
-      nest(operator)
+  def type(token)
+    case token.token
+    when :digit;     node << CharacterType::Digit.new(token, active_opts)
+    when :hex;       node << CharacterType::Hex.new(token, active_opts)
+    when :linebreak; node << CharacterType::Linebreak.new(token, active_opts)
+    when :nondigit;  node << CharacterType::NonDigit.new(token, active_opts)
+    when :nonhex;    node << CharacterType::NonHex.new(token, active_opts)
+    when :nonspace;  node << CharacterType::NonSpace.new(token, active_opts)
+    when :nonword;   node << CharacterType::NonWord.new(token, active_opts)
+    when :space;     node << CharacterType::Space.new(token, active_opts)
+    when :word;      node << CharacterType::Word.new(token, active_opts)
+    when :xgrapheme; node << CharacterType::ExtendedGrapheme.new(token, active_opts)
+    else
+      raise UnknownTokenError.new('CharacterType', token)
     end
-    node.add_sequence(active_opts)
-  end
-  def active_opts
-    options_stack.last
-  end
-  def total_captured_group_count
-    captured_group_counts.values.reduce(0, :+)
-  end
-  def captured_group_count_at_level
-    captured_group_counts[node.level]
   end
-  def count_captured_group
-    captured_group_counts[node.level] += 1
+  def close_completed_character_set_range
+    decrease_nesting if node.is_a?(CharacterSet::Range) && node.complete?
   end
-  def assign_effective_number(exp)
-    exp.effective_number =
-      exp.number + total_captured_group_count + (exp.number < 0 ? 1 : 0)
+  def active_opts
+    options_stack.last
   end
+  # Assigns referenced expressions to refering expressions, e.g. if there is
+  # an instance of Backreference::Number, its #referenced_expression is set to
+  # the instance of Group::Capture that it refers to via its number.
   def assign_referenced_expressions
     targets = {}
+    # find all referencable expressions
     root.each_expression do |exp|
       exp.is_a?(Group::Capture) && targets[exp.identifier] = exp
     end
+    # assign them to any refering expressions
     root.each_expression do |exp|
       exp.respond_to?(:reference) &&
         exp.referenced_expression = targets[exp.reference]