RubyGems - regexp_parser - Versions diffs - 0.1.6 → 0.2.0 - Mend

regexp_parser 0.1.6 → 0.2.0

Files changed (84)

checksums.yaml +4 -4
data/ChangeLog +57 -0
data/Gemfile +8 -0
data/LICENSE +1 -1
data/README.md +225 -206
data/Rakefile +9 -3
data/lib/regexp_parser.rb +7 -11
data/lib/regexp_parser/expression.rb +72 -14
data/lib/regexp_parser/expression/classes/alternation.rb +3 -16
data/lib/regexp_parser/expression/classes/conditional.rb +57 -0
data/lib/regexp_parser/expression/classes/free_space.rb +17 -0
data/lib/regexp_parser/expression/classes/keep.rb +7 -0
data/lib/regexp_parser/expression/classes/set.rb +28 -7
data/lib/regexp_parser/expression/methods/strfregexp.rb +113 -0
data/lib/regexp_parser/expression/methods/tests.rb +116 -0
data/lib/regexp_parser/expression/methods/traverse.rb +63 -0
data/lib/regexp_parser/expression/quantifier.rb +10 -0
data/lib/regexp_parser/expression/sequence.rb +45 -0
data/lib/regexp_parser/expression/subexpression.rb +29 -1
data/lib/regexp_parser/lexer.rb +31 -8
data/lib/regexp_parser/parser.rb +118 -45
data/lib/regexp_parser/scanner.rb +1745 -1404
data/lib/regexp_parser/scanner/property.rl +57 -3
data/lib/regexp_parser/scanner/scanner.rl +161 -34
data/lib/regexp_parser/syntax.rb +12 -2
data/lib/regexp_parser/syntax/ruby/1.9.1.rb +3 -3
data/lib/regexp_parser/syntax/ruby/1.9.3.rb +2 -7
data/lib/regexp_parser/syntax/ruby/2.0.0.rb +4 -1
data/lib/regexp_parser/syntax/ruby/2.1.4.rb +13 -0
data/lib/regexp_parser/syntax/ruby/2.1.5.rb +13 -0
data/lib/regexp_parser/syntax/ruby/2.1.rb +2 -2
data/lib/regexp_parser/syntax/ruby/2.2.0.rb +16 -0
data/lib/regexp_parser/syntax/ruby/2.2.rb +8 -0
data/lib/regexp_parser/syntax/tokens.rb +19 -2
data/lib/regexp_parser/syntax/tokens/conditional.rb +22 -0
data/lib/regexp_parser/syntax/tokens/keep.rb +14 -0
data/lib/regexp_parser/syntax/tokens/unicode_property.rb +45 -4
data/lib/regexp_parser/token.rb +23 -8
data/lib/regexp_parser/version.rb +5 -0
data/regexp_parser.gemspec +35 -0
data/test/expression/test_all.rb +6 -1
data/test/expression/test_base.rb +19 -0
data/test/expression/test_conditionals.rb +114 -0
data/test/expression/test_free_space.rb +33 -0
data/test/expression/test_set.rb +61 -0
data/test/expression/test_strfregexp.rb +214 -0
data/test/expression/test_subexpression.rb +24 -0
data/test/expression/test_tests.rb +99 -0
data/test/expression/test_to_h.rb +48 -0
data/test/expression/test_to_s.rb +46 -0
data/test/expression/test_traverse.rb +164 -0
data/test/lexer/test_all.rb +16 -3
data/test/lexer/test_conditionals.rb +101 -0
data/test/lexer/test_keep.rb +24 -0
data/test/lexer/test_literals.rb +51 -51
data/test/lexer/test_nesting.rb +62 -62
data/test/lexer/test_refcalls.rb +18 -20
data/test/parser/test_all.rb +18 -3
data/test/parser/test_alternation.rb +11 -14
data/test/parser/test_conditionals.rb +148 -0
data/test/parser/test_escapes.rb +29 -5
data/test/parser/test_free_space.rb +139 -0
data/test/parser/test_groups.rb +40 -0
data/test/parser/test_keep.rb +21 -0
data/test/scanner/test_all.rb +8 -2
data/test/scanner/test_conditionals.rb +166 -0
data/test/scanner/test_escapes.rb +8 -5
data/test/scanner/test_free_space.rb +133 -0
data/test/scanner/test_groups.rb +28 -0
data/test/scanner/test_keep.rb +33 -0
data/test/scanner/test_properties.rb +4 -0
data/test/scanner/test_scripts.rb +71 -1
data/test/syntax/ruby/test_1.9.3.rb +2 -2
data/test/syntax/ruby/test_2.0.0.rb +38 -0
data/test/syntax/ruby/test_2.2.0.rb +38 -0
data/test/syntax/ruby/test_all.rb +1 -8
data/test/syntax/ruby/test_files.rb +104 -0
data/test/test_all.rb +2 -1
data/test/token/test_all.rb +2 -0
data/test/token/test_token.rb +109 -0
metadata +75 -21
data/VERSION.yml +0 -5
data/lib/regexp_parser/ctype.rb +0 -48
data/test/syntax/ruby/test_2.x.rb +0 -46

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 6ef4ef1e296f8e15fe5316a5603c15e96446cb45
-  data.tar.gz: 0430451d4d0fb874dbdcc123d017d2867856d891
+  metadata.gz: 231a27b00daf24a41710b45ef92fef5b6963dc5a
+  data.tar.gz: 1cd1f75da74654cd20a0ac7716aed8519490fef5
 SHA512:
-  metadata.gz: 05706f3dbe8f1fe9684ea63abf9a0b0e4ff354146bddd488a72fb9bd5352979123bb6d6adc75624dae33c39f1d41e34b76c33cc3706f1ee6f6a989b8e2e259f1
-  data.tar.gz: d427c8ec82b4f955f47b1f27043a23604f654ff25244160fa416b0331c587cca0f2c4f1e9f47ee898934b753391e03cfa1ca6a3eb88b04c34efc255adf6391b8
+  metadata.gz: 620846b89adb5b8d27efe722af58951951ce0e7362f646fcc397c94f5597dc999d79851a507d13338ca7f59b67a25539a205878209f7787f6d4053ba95b2555c
+  data.tar.gz: 2f2400eede6011229f6690c8230ae1c6b3abdc2e83380cf0e7b8b6e1dd72c2c7035e82f87f808f64e0f47442df961ba13dae04164c35c776f41cf92aefa2f515
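
The checksums above record SHA1 and SHA512 digests for the `metadata.gz` and `data.tar.gz` members of the gem archive. As a rough illustration of how such a file could be checked, here is a minimal sketch using only Ruby's standard library. The helper name `checksum_mismatches` and the placeholder file contents are assumptions made for this example; this is not how RubyGems itself performs verification.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'yaml'

# Algorithm names as they appear in checksums.yaml, mapped to stdlib digest classes.
DIGESTS = { 'SHA1' => Digest::SHA1, 'SHA512' => Digest::SHA512 }.freeze

# Hypothetical helper (not part of RubyGems): returns [algorithm, file name]
# pairs whose recorded digest does not match the given file contents.
def checksum_mismatches(checksums_yaml, files)
  YAML.safe_load(checksums_yaml).flat_map do |algorithm, entries|
    digest = DIGESTS.fetch(algorithm)
    entries.reject { |name, expected| digest.hexdigest(files.fetch(name)) == expected.to_s }
           .map { |name, _expected| [algorithm, name] }
  end
end

# Placeholder contents standing in for the real archive members.
files = { 'metadata.gz' => 'example metadata', 'data.tar.gz' => 'example data' }

# Build a checksums document with the same layout as the one in the diff.
checksums_yaml = DIGESTS.each_with_object({}) do |(algorithm, digest), recorded|
  recorded[algorithm] = files.map { |name, bytes| [name, digest.hexdigest(bytes)] }.to_h
end.to_yaml

# Unmodified contents match; a changed member is reported once per algorithm.
tampered = files.merge('data.tar.gz' => 'changed data')
puts checksum_mismatches(checksums_yaml, files).inspect
puts checksum_mismatches(checksums_yaml, tampered).inspect
```

Since both digests change whenever either archive member changes, all four values differ between 0.1.6 and 0.2.0, which is what the hunk shows.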

data/ChangeLog CHANGED

@@ -1,3 +1,60 @@
+Wed Dec 3 05:21:27 2014 Ammar Ali <ammarabuali@gmail.com>
+	* Added expand_members method to CharacterSet, returns traditional
+	  or unicode property forms of shothands (\d, \W, \s, etc.)
+Tue Dec 2 02:42:39 2014 Ammar Ali <ammarabuali@gmail.com>
+	* Improved meaning and output of %t and %T in strfregexp.
+	* Added syntax versions for ruby 2.1.4 and 2.1.5 and updated
+		latest 2.1 version.
+Mon Dec 1 15:52:31 2014 Ammar Ali <ammarabuali@gmail.com>
+	* Added to_h methods to Expression, Subexpression, and Quantifier.
+Tue Oct 21 19:14:03 2014 Ammar Ali <ammarabuali@gmail.com>
+	* Added traversal methods; traverse, each_expression, and map.
+	* Added token/type test methods; type?, is?, and one_of?
+	* Added printing method strfregexp, inspired by strftime.
+Mon Oct 20 01:03:46 2014 Ammar Ali <ammarabuali@gmail.com>
+	* Added scanning and parsing of free spacing (x mode) expressions.
+	* Improved handling of inline options (?mixdau:...)
+Fri Oct 18 14:09:38 2014 Ammar Ali <ammarabuali@gmail.com>
+	* Added conditional expressions. Ruby 2.0.
+	* Added keep (\K) markers. Ruby 2.0.
+	* Added d, a, and u options. Ruby 2.0.
+	* Added missing meta sequences to the parser. They were supported
+	  by the scanner only.
+	* Renamed Lexer's method to lex, added an alias to the old name (scan)
+	* Use #map instead of #each to run the block in Lexer.lex.
+	* Replaced VERSION.yml file with a constant.
+	* Updated README
+Fri Oct 10 11:49:38 2014 Ammar Ali <ammarabuali@gmail.com>
+	* Update tokens and scanner with new additions in Unicode 7.0.
+Mon Oct 6 04:30:24 2014 Ammar Ali <ammarabuali@gmail.com>
+	* Released version 0.1.6
 Sun Oct 5 19:58:17 2014 Ammar Ali <ammarabuali@gmail.com>
 	* Fixed test and gem building rake tasks and extracted the gem
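
One entry above notes that the block in `Lexer.lex` is now run with `#map` instead of `#each`. The practical difference is general Ruby behavior and can be sketched without the gem: `each` returns its receiver and discards the block's results, while `map` returns them. The token strings below are made up for illustration.

```ruby
# Made-up token texts; any array behaves the same way.
tokens = ['a', '?', '(b)', '*']

# each runs the block for its side effects and returns the receiver itself.
with_each = tokens.each { |text| text.length }

# map collects the value of the block for every element.
with_map = tokens.map { |text| text.length }

puts with_each.inspect
puts with_map.inspect
```

This is consistent with the README's statement that the result of the block is returned.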

data/Gemfile ADDED

@@ -0,0 +1,8 @@
+source 'https://rubygems.org'
+gemspec
+group :development, :test do
+  gem 'rake'
+  gem 'test-unit'
+end

data/LICENSE CHANGED

@@ -1,4 +1,4 @@
-Copyright (c) 2010 Ammar Ali
+Copyright (c) 2010, 2012-2014,  Ammar Ali
 Permission is hereby granted, free of charge, to any person
 obtaining a copy of this software and associated documentation

data/README.md CHANGED

@@ -1,57 +1,87 @@
-# Regexp::Parser [![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://secure.travis-ci.org/ammar/regexp_parser.png?branch=master)](http://travis-ci.org/ammar/regexp_parser) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.png)](https://codeclimate.com/github/ammar/regexp_parser/badges)
+# Regexp::Parser
-A ruby library to help with lexing, parsing, and transforming regular expressions.
+[![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://secure.travis-ci.org/ammar/regexp_parser.png?branch=master)](http://travis-ci.org/ammar/regexp_parser) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.png)](https://codeclimate.com/github/ammar/regexp_parser/badges)
+A ruby gem for tokenizing, parsing, and transforming regular expressions.
 * Multilayered
-  * A scanner based on [ragel](http://www.complang.org/ragel/)
-  * A lexer that produces a "stream" of tokens
-  * A parser that produces a "tree" of Regexp::Expression objects (OO API)
-* Supports ruby 1.8, 1.9, and all but one of the 2.x expressions [See Scanner Syntax](#scanner-syntax)
-* Supports ruby 1.8, 1.9, 2.0, and 2.1 runtimes.
+  * A scanner/tokenizer based on [ragel](http://www.colm.net/open-source/ragel/)
+  * A lexer that produces a "stream" of token objects.
+  * A parser that produces a "tree" of Expression objects (OO API)
+* Runs on ruby 1.8, 1.9, 2.x, and jruby (1.9 mode) runtimes.
+* Recognizes ruby 1.8, 1.9, and 2.x regular expressions [See Scanner Syntax](#scanner-syntax)
 _For an example of regexp_parser in use, see the [meta_re project](https://github.com/ammar/meta_re)_
 ---
 ## Requirements
-* ruby '1.8.7'..'2.1.3'
-* ragel, but only if you want to build the gem or work on the scanner
+* Ruby >= 1.8.7
+* Ragel >= 6.0, but only if you want to build the gem or work on the scanner.
 _Note: See the .travis.yml file for covered versions._
 ---
 ## Install
+Install the gem with:
   `gem install regexp_parser`
+Or, add it to your project's `Gemfile`:
+```gem 'regexp_parser', '~> X.Y.Z'```
+See rubygems for the [latest version number](https://rubygems.org/gems/regexp_parser)
 ---
 ## Usage
+The three main modules are **Scanner**, **Lexer**, and **Parser**. Each of them
+provides a single method that takes a regular expression (as a Regexp object or
+a string) and returns its results. The **Lexer** and the **Parser** accept an
+optional second argument that specifies the syntax version, like 'ruby/2.0',
+which defaults to the host ruby version (using RUBY_VERSION).
+Here are the basic usage examples:
 ```ruby
-# require the gem, then call one of:
 require 'regexp_parser'
-# The Scanner
-Regexp::Scanner.scan regexp
+Regexp::Scanner.scan(regexp)
-# The Lexer
-Regexp::Lexer.scan regexp
+Regexp::Lexer.lex(regexp)
-# Or the Parser
-Regexp::Parser.parse regexp
+Regexp::Parser.parse(regexp)
 ```
-_All three can either return their results or take a block to perform further handling._
+All three methods accept a block as the last argument, which, if given, gets
+called with the results as follows:
+* **Scanner**: the block gets passed the results as they are scanned. See the
+  example in the next section for details.
+* **Lexer**: after completion, the block gets passed the tokens one by one.
+  _The result of the block is returned._
+* **Parser**: after completion, the block gets passed the root expression.
+  _The result of the block is returned._
 ---
 ## Components
 ### Scanner
-A ragel generated scanner that recognizes the cumulative syntax of both
-supported flavors. Breaks the expression's text into tokens, including
-their type, token, text, and start/end offsets within the original
-pattern.
+A ragel generated scanner that recognizes the cumulative syntax of all
+supported syntax versions. It breaks a given expression's text into the
+smallest parts, and identifies their type, token, text, and start/end
+offsets within the pattern.
 #### Example
 The following scans the given pattern and prints out the type, token, text and
@@ -79,7 +109,8 @@ end
 # type: group, token: close, text: ')' [15..16]
 ```
-A one-liner that returns an array of the textual parts of the given pattern:
+A one-liner that uses map on the result of the scan to return the textual
+parts of the pattern:
 ```ruby
 Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]}
@@ -90,17 +121,18 @@ Regexp::Scanner.scan( /(cat?([bhm]at)){3,5}/ ).map {|token| token[2]}
 #### Notes
   * The scanner performs basic syntax error checking, like detecting missing
     balancing punctuation and premature end of pattern. Flavor validity checks
-    are performed in the lexer.
+    are performed in the lexer, which uses a syntax object.
-  * If the input is a ruby Regexp object, the scanner calls #source on it to
+  * If the input is a ruby **Regexp** object, the scanner calls #source on it to
     get its string representation. #source does not include the options of
-    expression (m, i, and x) To include the options the scan, #to_s should
-    be called on the Regexp before passing it to the scanner, or any of the
-    higher layers.
+    the expression (m, i, and x). To include the options in the scan, #to_s
+    should be called on the **Regexp** before passing it to the scanner or any
+    of the other modules.
   * To keep the scanner simple(r) and fairly reusable for other purposes, it
     does not perform lexical analysis on the tokens, sticking to the task
-    of tokenizing and leaving lexical analysis upto to the lexer.
+    of identifying the smallest possible tokens and leaving lexical analysis
+    to the lexer.
 ---
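
The note about `#source` and `#to_s` relies only on core Ruby, so it can be checked without the gem. A minimal sketch, reusing the pattern from the old README example:

```ruby
regex = /a?(b)*[c]+/m

# #source returns the pattern text only; the m option is lost.
source_text = regex.source

# #to_s wraps the pattern in an options group, so the m option survives.
with_options = regex.to_s

puts source_text   # a?(b)*[c]+
puts with_options  # (?m-ix:a?(b)*[c]+)
```

Passing the `#to_s` form to the scanner therefore adds an enclosing options group to the results, which is why the old example output contained a `Group::Options` node.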
@@ -110,28 +142,36 @@ flavor). Syntax classes act as lookup tables, and are layered to create
 flavor variations. Syntax only comes into play in the lexer.
 #### Example
-The following instantiates the syntax for Ruby 1.9 and checks a couple of its
-implementations features, and then does the same for Ruby 1.8:
+The following instantiates syntax objects for Ruby 2.0, 1.9, 1.8, and
+checks a few of their implementation features.
 ```ruby
 require 'regexp_parser'
+ruby_20 = Regexp::Syntax.new 'ruby/2.0'
+ruby_20.implements? :quantifier,  :zero_or_one             # => true
+ruby_20.implements? :quantifier,  :zero_or_one_reluctant   # => true
+ruby_20.implements? :quantifier,  :zero_or_one_possessive  # => true
+ruby_20.implements? :conditional, :condition               # => true
 ruby_19 = Regexp::Syntax.new 'ruby/1.9'
-ruby_19.implements? :quantifier, :zero_or_one             # => true
-ruby_19.implements? :quantifier, :zero_or_one_reluctant   # => true
-ruby_19.implements? :quantifier, :zero_or_one_possessive  # => true
+ruby_19.implements? :quantifier,  :zero_or_one             # => true
+ruby_19.implements? :quantifier,  :zero_or_one_reluctant   # => true
+ruby_19.implements? :quantifier,  :zero_or_one_possessive  # => true
+ruby_19.implements? :conditional, :condition               # => false
 ruby_18 = Regexp::Syntax.new 'ruby/1.8'
-ruby_18.implements? :quantifier, :zero_or_one             # => true
-ruby_18.implements? :quantifier, :zero_or_one_reluctant   # => true
-ruby_18.implements? :quantifier, :zero_or_one_possessive  # => false
+ruby_18.implements? :quantifier,  :zero_or_one             # => true
+ruby_18.implements? :quantifier,  :zero_or_one_reluctant   # => true
+ruby_18.implements? :quantifier,  :zero_or_one_possessive  # => false
+ruby_18.implements? :conditional, :condition               # => false
 ```
 #### Notes
-  * Variatiions on a token, for example a named group with < and > vs one with a
-    pair of single quotes, are specified with an underscore followed by two
-    characters appended to the base token. In the previous named group example,
+  * Variations on a token, for example a named group with angle brackets (< and >)
+    vs one with a pair of single quotes, are specified with an underscore followed
+    by two characters appended to the base token. In the previous named group example,
     the tokens would be :named_ab (angle brackets) and :named_sq (single quotes).
     These variations are normalized by the syntax to :named.
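
To make the naming convention concrete, here is a minimal sketch of how a two-character variation suffix could be stripped from a token. The method name `normalize_token` and the suffix list are assumptions based only on the two suffixes named in the note (`_ab` and `_sq`); the gem's syntax classes may do this differently.

```ruby
# Hypothetical list of variation suffixes, limited to the two named in the note:
# _ab (angle brackets) and _sq (single quotes).
VARIATION_SUFFIX = /_(ab|sq)\z/

# Hypothetical helper: maps a token variation to its base token.
def normalize_token(token)
  token.to_s.sub(VARIATION_SUFFIX, '').to_sym
end

puts normalize_token(:named_ab).inspect  # :named
puts normalize_token(:named_sq).inspect  # :named
puts normalize_token(:capture).inspect   # :capture
```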
@@ -139,22 +179,23 @@ ruby_18.implements? :quantifier, :zero_or_one_possessive  # => false
 ---
 ### Lexer
 Sits on top of the scanner and performs lexical analysis on the tokens that
-it emits. Among its tasks are breaking quantified literal runs, collecting the
-emitted token structures into an array of Token objects, calculating their
-nesting depth, normalizing tokens for the parser, and checkng if the tokens
-are implemented by the given syntax flavor.
+it emits. Among its tasks are: breaking quantified literal runs, collecting the
+emitted token attributes into Token objects, calculating their nesting depth,
+normalizing tokens for the parser, and checking if the tokens are implemented by
+the given syntax version.
+See the [Token Objects](https://github.com/ammar/regexp_parser/wiki/Token-Objects)
+wiki page for more information on Token objects.
-Tokens are Struct objects, with a few helper methods; #next, #previous, #offsets
-and #length.
 #### Example
-The following example scans the given pattern, checks it against the ruby 1.8
-syntax, and prints the token objects' text.
+The following example lexes the given pattern, checks it against the ruby 1.9
+syntax, and prints the token objects' text indented to their level.
 ```ruby
 require 'regexp_parser'
-Regexp::Lexer.scan /a?(b(c))*[d]+/ do |token|
+Regexp::Lexer.scan /a?(b(c))*[d]+/, 'ruby/1.9' do |token|
   puts "#{'  ' * token.level}#{token.text}"
 end
@@ -175,8 +216,9 @@ end
 ```
 A one-liner that returns an array of the textual parts of the given pattern.
-Compare the output with that of the one-liner example of the Scanner; notably
-how the sequence 'cat' is treated.
+Compare the output with that of the one-liner example of the **Scanner**; notably
+how the sequence 'cat' is treated. The 't' is separated because it's followed
+by a quantifier that only applies to it.
 ```ruby
 Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text}
@@ -184,50 +226,70 @@ Regexp::Lexer.scan( /(cat?([b]at)){3,5}/ ).map {|token| token.text}
 ```
 #### Notes
-  * The default syntax is that of the latest released version of ruby.
+  * The syntax argument is optional. It defaults to the version of the ruby
+    interpreter in use, as returned by RUBY_VERSION.
-  * The lexer performs some basic parsing to determine the depth of the
-    emitted tokens. This responsibility might be relegated to the scanner
-    in a future release.
+  * The lexer normalizes some tokens, as noted in the Syntax section above.
 ---
 ### Parser
 Sits on top of the lexer and transforms the "stream" of Token objects emitted
 by it into a tree of Expression objects represented by an instance of the
-Expression::Root class. See Expression below for more information.
+Expression::Root class.
+See the [Expression Objects](https://github.com/ammar/regexp_parser/wiki/Expression-Objects)
+wiki page for attributes and methods.
 #### Example
 ```ruby
 require 'regexp_parser'
-regex = /a?(b)*[c]+/m
+regex = /a?(b+(c)d)*(?<name>[0-9]+)/
-# using #to_s on the Regexp object to include options. Note that this turns the
-# expression into '(?m-ix:a?(b)*[c]+)', thus the Group::Options in the output
-root = Regexp::Parser.parse( regex.to_s, 'ruby/2.1')
+tree = Regexp::Parser.parse( regex, 'ruby/2.1' )
-root.multiline?         # => true (aliased as m?)
-root.case_insensitive?  # => false (aliased as i?)
+tree.traverse do |event, exp|
+  puts "#{event}: #{exp.type} `#{exp.to_s}`"
+end
-# simple tree walking method (depth-first, pre-order)
-def walk(e, depth = 0)
-  puts "#{'  ' * depth}> #{e.class}"
+# Output
+# visit: literal `a?`
+# enter: group `(b+(c)d)*`
+# visit: literal `b+`
+# enter: group `(c)`
+# visit: literal `c`
+# exit: group `(c)`
+# visit: literal `d`
+# exit: group `(b+(c)d)*`
+# enter: group `(?<name>[0-9]+)`
+# visit: set `[0-9]+`
+# exit: group `(?<name>[0-9]+)`
+```
-  if e.respond_to?(:expressions)
-    e.each {|s| walk(s, depth+1) }
-  end
-end
+Another example, using each_expression and strfregexp to print the object tree.
+_See the traverse.rb and strfregexp.rb files under `lib/regexp_parser/expression/methods`
+for more information on these methods._
-walk(root)
+```ruby
+include_root  = true
+indent_offset = include_root ? 1 : 0
-# output
+tree.each_expression(include_root) do |exp, level_index|
+  puts exp.strfregexp("%>> %c", indent_offset)
+end
+# Output
 # > Regexp::Expression::Root
-#   > Regexp::Expression::Group::Options
+#   > Regexp::Expression::Literal
+#   > Regexp::Expression::Group::Capture
 #     > Regexp::Expression::Literal
 #     > Regexp::Expression::Group::Capture
 #       > Regexp::Expression::Literal
+#     > Regexp::Expression::Literal
+#   > Regexp::Expression::Group::Named
 #     > Regexp::Expression::CharacterSet
 ```
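
The enter/visit/exit event order in the `traverse` output above can be reproduced with a small stand-alone model. The sketch below does not use the gem: `Node`, `node`, and this `traverse` are hypothetical stand-ins, and it simplifies by treating any node without children as a terminal.

```ruby
# Hypothetical stand-in for expression objects: a type, source text, and children.
Node = Struct.new(:type, :text, :children)

def node(type, text, children = [])
  Node.new(type, text, children)
end

# Calls the block with :enter and :exit around nodes that have children,
# and with :visit for terminal nodes. The root itself is not reported.
def traverse(parent, &block)
  parent.children.each do |child|
    if child.children.empty?
      block.call(:visit, child)
    else
      block.call(:enter, child)
      traverse(child, &block)
      block.call(:exit, child)
    end
  end
end

# Hand-built tree mirroring the README pattern /a?(b+(c)d)*(?<name>[0-9]+)/.
tree = node(:expression, 'a?(b+(c)d)*(?<name>[0-9]+)', [
  node(:literal, 'a?'),
  node(:group, '(b+(c)d)*', [
    node(:literal, 'b+'),
    node(:group, '(c)', [node(:literal, 'c')]),
    node(:literal, 'd')
  ]),
  node(:group, '(?<name>[0-9]+)', [node(:set, '[0-9]+')])
])

events = []
traverse(tree) { |event, exp| events << "#{event}: #{exp.type} `#{exp.text}`" }
puts events
```

Run as written, the sketch prints the same eleven lines as the README's `# Output` listing, which shows that the documented order is a depth-first walk with paired enter and exit events.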
@@ -236,122 +298,84 @@ Expression class. See the next section for details._
 ---
-### Expression
-The base class of all objects returned by the parser, implements most of the
-functions that are common to all expression classes.
-Each Expression object contains the following members:
-  * **quantifier**: an instance of Expression::Quantifier that holds the details
-    of repetition for the Expression. Has a nil value if the expression is not
-    quantified.
-  * **expressions**: an array, holds the sub-expressions for the expression if it
-    is a group or alternation expression. Empty if the expression doesn't have
-    sub-expressions.
-  * **options**: a hash, holds the keys :i, :m, and :x with a boolean value that
-    indicates if the expression has a given option.
-Expressions also contain the following members from the scanner/lexer:
-  * **type**: a symbol, denoting the expression type, such as :group, :quantifier
-  * **token**: a symbol, for the object's token, or opening token (in the case of
-    groups and sets)
-  * **text**: a string, the text of the expression (same as token for nesting expressions)
-Every expression also has the following methods:
-  * **to_s**: returns the string representation of the expression.
-  * **<<**: adds sub-expresions to the expression.
-  * **each**: iterates over the expressions sub-expressions, if any.
-  * **[]**: access sub-expressions by index.
-  * **quantified?**: return true if the expression was followed by a quantifier.
-  * **quantity**: returns an array of the expression's min and max repetitions.
-  * **greedy?**: returns true if the expression's quantifier is greedy.
-  * **reluctant?** or **lazy?**: returns true if the expression's quantifier is
-    reluctant.
-  * **possessive?**: returns true if the expression's quantifier is possessive.
-  * **multiline?** or **m?**: returns true if the expression has the m option
-  * **case_insensitive?** or **ignore_case?** or **i?**: returns true if the expression
-    has the i option
-  * **free_spacing?** or **extended?** or **x?**: returns true if the expression has the x
-    option
-A special expression class **Expression::Sequence** is used to hold the
-expressions of a branch within an **Expression::Alternation** expression. For
-example, the expression 'bat|cat|hat' would result in an alternation with 3
-sequences, one for each possible alternative.
-## Scanner Syntax
-The following syntax elements are supported by the scanner.
-- Alternation: a|b|c, etc.
-- Anchors: ^, $, \b, etc.
-- Character Classes _(aka Sets)_: [abc], [^\]]
-- Character Types: \d, \H, \s, etc.
-- Escape Sequences: \t, \+, \?, etc.
-- Grouped Expressions
-  - Assertions
-    - Lookahead: (?=abc)
-    - Negative Lookahead: (?!abc)
-    - Lookabehind: (?<=abc)
-    - Negative Lookbehind: (?<\!abc)
-  - Atomic: (?>abc)
-  - Back-references:
-    - Named: \k<name>
-    - Nest Level: \k<n-1>
-    - Numbered: \k<1>
-    - Relative: \k<-2>
-  - Capturing: (abc)
-  - Comment: (?# comment)
-  - Named: (?<name>abc)
-  - Options: (?mi-x:abc)
-  - Passive: (?:abc)
-  - Sub-expression Calls: \g<name>, \g<1>
-- Literals: abc, def?, etc.
-- POSIX classes: [:alpha:], [:print:], etc.
-- Quantifiers
-  - Greedy: ?, *, +, {m,M}
-  - Reluctant: ??, *?, +?, {m,M}?
-  - Possessive: ?+, *+, ++, {m,M}+
-- String Escapes
-  - Control: \C-C, \cD, etc.
-  - Hex: \x20, \x{701230}, etc.
-  - Meta: \M-c, \M-\C-C etc.
-  - Octal: \0, \01, \012
-  - Unicode: \uHHHH, \u{H+ H+}
-- Traditional Back-references: \1 thru \9
-- Unicode Properties:
-  - Age: \p{Age=2.1}, \P{age=5.2}, etc.
-  - Classes: \p{Alpha}, \P{Space}, etc.
-  - Derived Properties: \p{Math}, \P{Lowercase}, etc.
-  - General Categories: \p{Lu}, \P{Cs}, etc.
-  - Scripts: \p{Arabic}, \P{Hiragana}, etc.
-  - Simple Properties: \p{Dash}, \p{Extender}, etc.
-### Missing Features
-The following were added by the Onigmo regular expression library used by
-ruby 2.x and are not currently recognized by the scanner:
-- Planned for support
-  - Conditional Expressions: (?(cond)yes-subexp), (?(cond)yes-subexp|no-subexp)
-  - Negative POSIX Brackets: [:^alpha:], [:^digit:]
-  - New Character Set Options: d, a, and u _[see](https://github.com/k-takata/Onigmo/blob/master/doc/RE#L234)_
-- Not planned for support
-  - Keep: \K _(not enabled for ruby syntax)_
-  - Quotes: \Q...\E _(perl and java syntax only) [see](https://github.com/k-takata/Onigmo/blob/master/doc/RE#L452)_
-  - Capture History: (?@...), (?@<name>...) _(not enabled for ruby syntax) [see](https://github.com/k-takata/Onigmo/blob/master/doc/RE#L499)_
+## Supported Syntax
+The three modules support all the regular expression syntax features of Ruby 1.8,
+1.9, and 2.x:
+_Note that not all of these are available in all versions of Ruby_
+| Syntax Feature                        | Examples                                                | &#x22ef; |
+| ------------------------------------- | ------------------------------------------------------- |:--------:|
+| **Alternation**                       | `a|b|c`                                                 | &#x2713; |
+| **Anchors**                           | `^`, `$`, `\b`                                          | &#x2713; |
+| **Character Classes**                 | `[abc]`, `[^\\]`, `[a-d&&g-h]`, `[a=e=b]`               | &#x2713; |
+| **Character Types**                   | `\d`, `\H`, `\s`                                        | &#x2713; |
+| **Conditional Exps.**                 | `(?(cond)yes-subexp)`, `(?(cond)yes-subexp|no-subexp)`  | &#x2713; |
+| **Escape Sequences**                  | `\t`, `\\+`, `\?`                                       | &#x2713; |
+| **Free Space**                        | whitespace and `# Comments` _(x modifier)_              | &#x2713; |
+| **Grouped Exps.**                     |                                                         | &#x22f1; |
+| &emsp;&nbsp;_**Assertions**_          |                                                         | &#x22f1; |
+| &emsp;&emsp;_Lookahead_               | `(?=abc)`                                               | &#x2713; |
+| &emsp;&emsp;_Negative Lookahead_      | `(?!abc)`                                               | &#x2713; |
+| &emsp;&emsp;_Lookbehind_              | `(?<=abc)`                                              | &#x2713; |
+| &emsp;&emsp;_Negative Lookbehind_     | `(?<!abc)`                                              | &#x2713; |
+| &emsp;&nbsp;_**Atomic**_              | `(?>abc)`                                               | &#x2713; |
+| &emsp;&nbsp;_**Back-references**_     |                                                         | &#x22f1; |
+| &emsp;&emsp;_Named_                   | `\k<name>`                                              | &#x2713; |
+| &emsp;&emsp;_Nest Level_              | `\k<n-1>`                                               | &#x2713; |
+| &emsp;&emsp;_Numbered_                | `\k<1>`                                                 | &#x2713; |
+| &emsp;&emsp;_Relative_                | `\k<-2>`                                                | &#x2713; |
+| &emsp;&emsp;_Traditional_             | `\1` thru `\9`                                          | &#x2713; |
+| &emsp;&nbsp;_**Capturing**_           | `(abc)`                                                 | &#x2713; |
+| &emsp;&nbsp;_**Comments**_            | `(?# comment text)`                                     | &#x2713; |
+| &emsp;&nbsp;_**Named**_               | `(?<name>abc)`, `(?'name'abc)`                          | &#x2713; |
+| &emsp;&nbsp;_**Options**_             | `(?mi-x:abc)`, `(?a:\s\w+)`                             | &#x2713; |
+| &emsp;&nbsp;_**Passive**_             | `(?:abc)`                                               | &#x2713; |
+| &emsp;&nbsp;_**Subexp. Calls**_       | `\g<name>`, `\g<1>`                                     | &#x2713; |
+| **Keep**                              | `\K`, `(ab\Kc|d\Ke)f`                                   | &#x2713; |
+| **Literals** _(utf-8)_                | `Ruby`, `ルビー`, `روبي`                                | &#x2713; |
+| **POSIX Classes**                     | `[:alpha:]`, `[:^digit:]`                               | &#x2713; |
+| **Quantifiers**                       |                                                         | &#x22f1; |
+| &emsp;&nbsp;_**Greedy**_              | `?`, `*`, `+`, `{m,M}`                                  | &#x2713; |
+| &emsp;&nbsp;_**Reluctant** (Lazy)_    | `??`, `*?`, `+?`, `{m,M}?`                              | &#x2713; |
+| &emsp;&nbsp;_**Possessive**_          | `?+`, `*+`, `++`, `{m,M}+`                              | &#x2713; |
+| **String Escapes**                    |                                                         | &#x22f1; |
+| &emsp;&nbsp;_**Control**_             | `\C-C`, `\cD`                                           | &#x2713; |
+| &emsp;&nbsp;_**Hex**_                 | `\x20`, `\x{701230}`                                    | &#x2713; |
+| &emsp;&nbsp;_**Meta**_                | `\M-c`, `\M-\C-C`                                       | &#x2713; |
+| &emsp;&nbsp;_**Octal**_               | `\0`, `\01`, `\012`                                     | &#x2713; |
+| &emsp;&nbsp;_**Unicode**_             | `\uHHHH`, `\u{H+ H+}`                                   | &#x2713; |
+| **Unicode Properties**                | _<sub>([Unicode 7.0.0](http://www.unicode.org/versions/Unicode7.0.0/))</sub>_ | &#x22f1; |
+| &emsp;&nbsp;_**Age**_                 | `\p{Age=5.2}`, `\P{age=7.0}`                            | &#x2713; |
+| &emsp;&nbsp;_**Classes**_             | `\p{Alpha}`, `\P{Space}`                                | &#x2713; |
+| &emsp;&nbsp;_**Derived**_             | `\p{Math}`, `\P{Lowercase}`                             | &#x2713; |
+| &emsp;&nbsp;_**General Categories**_  | `\p{Lu}`, `\P{Cs}`                                      | &#x2713; |
+| &emsp;&nbsp;_**Scripts**_             | `\p{Arabic}`, `\P{Hiragana}`                            | &#x2713; |
+| &emsp;&nbsp;_**Simple**_              | `\p{Dash}`, `\p{Extender}`                              | &#x2713; |
+<br/>
+##### Inapplicable Features
+Some modifiers, like `o` and `s`, apply to the **Regexp** object itself and do not
+appear in its source. Other such modifiers include the encoding modifiers `e` and `n`
+[See](http://www.ruby-doc.org/core-2.1.3/Regexp.html#class-Regexp-label-Encoding).
+These are not seen by the scanner.
+The following features are not currently enabled for Ruby by its regular
+expressions library (Onigmo). They are not supported by the scanner.
+  - **Quotes**: `\Q...\E` _<a href="https://github.com/k-takata/Onigmo/blob/master/doc/RE#L452/" title="Links to master branch, may change">[See]</a>_
+  - **Capture History**: `(?@...)`, `(?@<name>...)` _<a href="https://github.com/k-takata/Onigmo/blob/master/doc/RE#L499" title="Links to master branch, may change">[See]</a>_
 See something else missing? Please submit an [issue](https://github.com/ammar/regexp_parser/issues)
-_**Note**: Attempting to process expressions with any of the missing syntax features will
-cause an error._
+_**Note**: Attempting to process expressions with unsupported syntax features can raise an error,
+or incorrectly return tokens/objects as literals._
 ## Testing
@@ -366,38 +390,44 @@ tasks, which only run the tests for one component at a time. These are:
 * test:expression
 * test:syntax
-_A special task 'test:full' generatees the scanner's code from the ragel source files and
-runs all the tests. This requires ragel to be installed._
+_A special task 'test:full' generates the scanner's code from the ragel source files and
+runs all the tests. This task requires ragel to be installed._
-The tests use ruby's test_unit, so they can also be run with:
+The tests use ruby's test/unit, so they can also be run with:
 ```
-ruby test/test_all.rb
+ruby -Ilib test/test_all.rb
 ```
 This is useful when there is a need to focus on specific test files, for example:
 ```
-ruby test/scanner/test_properties.rb
+ruby -Ilib test/scanner/test_properties.rb
+```
+It is sometimes helpful during development to focus on a specific test case, for example:
+```
+ruby -Ilib test/expression/test_base.rb -n test_expression_to_re
 ```
 ## Building
-Building the scanner and the gem requires [ragel](http://www.complang.org/ragel/) to be
+Building the scanner and the gem requires [ragel](http://www.colm.net/open-source/ragel/) to be
 installed. The build tasks will automatically invoke the 'ragel:rb' task to generate the
 ruby scanner code.
-The project uses the standard rubygems package tasks:
+The project uses the standard rubygems package tasks, so:
-To build, run:
+To build the gem, run:
 ```
 rake build
 ```
-To install, run:
+To install the gem from the cloned project, run:
 ```
 rake install
 ```
@@ -408,14 +438,15 @@ Documentation and books used while working on this project.
 #### Ruby Flavors
-* Oniguruma Regular Expressions [link](http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt)
-* Read Ruby > Regexps [link](https://github.com/runpaint/read-ruby/blob/master/src/regexps.xml)
+* Oniguruma Regular Expressions (Ruby 1.9.x) [link](http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt)
+* Onigmo Regular Expressions (Ruby >= 2.0) [link](https://github.com/k-takata/Onigmo/blob/master/doc/RE)
 #### Regular Expressions
 * Mastering Regular Expressions, By Jeffrey E.F. Friedl (2nd Edition) [book](http://oreilly.com/catalog/9781565922570/)
 * Regular Expression Flavor Comparison [link](http://www.regular-expressions.info/refflavors.html)
 * Enumerating the strings of regular languages [link](http://www.cs.dartmouth.edu/~doug/nfa.ps.gz)
+* Stack Overflow Regular Expressions FAQ [link](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075)
 #### Unicode
@@ -425,18 +456,6 @@ Documentation and books used while working on this project.
 * Unicode Regular Expressions [link](http://www.unicode.org/reports/tr18/)
 * Unicode Standard Annex #44 [link](http://www.unicode.org/reports/tr44/)
-## Thanks
-This work is based on and inspired by the hard work and ideas of many people,
-directly or indirectly. The following are only a few of those that should be
-thanked.
-* Adrian Thurston, for developing [ragel](http://www.complang.org/ragel/).
-* Caleb Clausen, for feedback, which inspired this, valuable insights on structuring the parser,
-  and lots of [cool code](http://github.com/coatl).
-* Jan Goyvaerts, for his [excellent resource](http://www.regular-expressions.info) on regular expressions.
-* Run Paint Run Run, for his work on [Read Ruby](https://github.com/runpaint/read-ruby)
-* Yukihiro Matsumoto, of course! For "The Ruby", of course!
 ---
 ##### Copyright