RubyGems - regexp-examples - Versions diffs - 1.0.1 → 1.0.2 - Mend

regexp-examples 1.0.1 → 1.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

checksums.yaml +4 -4
data/README.md +20 -9
data/lib/regexp-examples/parser.rb +98 -72
data/lib/regexp-examples/version.rb +1 -1
data/spec/regexp-examples_spec.rb +5 -2
metadata +3 -3

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: fc845182adb1adaeed70de6139d27711a69dc81f
-  data.tar.gz: 3d1850382acaf7ee4c96c9acf924a585cec6939b
+  metadata.gz: e4648ff5cf5c73b7916f58099a989ad58619e2d1
+  data.tar.gz: 9a9d53ceaf5a89f1f363124fad033b46b1489774
 SHA512:
-  metadata.gz: 0b2a8ff8619ba8bc4186a27491dac7140ff8e0d7e4cb87ccfd8e9047b0f392c7152e2988666edaf4df6a1d2db0961ac1dfdc0af32b9d50c3ddbda6ff60814c97
-  data.tar.gz: f3690a8f6d2089b57a57246d9ec2b25252a2bee5972536ef095bf45358348ce1c322e3d3b04b69121e8530e155c2c9c4542f5f48cdc6a83ea6366eb246af3f94
+  metadata.gz: 77997419f70d44cde2181c9a61f81a7e34456de573ab0bfbe46d1dcb35a6350472b3f09b4f9853aa657bca17af8b220d6179649227fda29e69a203f4a467b668
+  data.tar.gz: 5c830560b6485f7a02bb9ad41fdc0f316ca4b0685d079ea8706cf170a4298e88f113b8bb8eb9bfc0cce18e22e733e235f03ab42120c60bd7828628ae779f5666
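The `checksums.yaml` hunk records SHA1 and SHA512 digests of the two archives packed inside the `.gem` file. Here is a rough sketch of recomputing the same kind of digests with Ruby's standard `digest` library after unpacking the gem (for example with `tar -xf regexp-examples-1.0.2.gem`). The helper name `gem_checksums` and the command-line usage are illustrative and are not part of RubyGems.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: returns { "file name" => { "SHA1" => hex, "SHA512" => hex } }
# for each path, hashing the files exactly as they are stored on disk.
def gem_checksums(paths)
  paths.each_with_object({}) do |path, result|
    result[File.basename(path)] = {
      'SHA1'   => Digest::SHA1.file(path).hexdigest,
      'SHA512' => Digest::SHA512.file(path).hexdigest
    }
  end
end

# Assumed usage: ruby gem_checksums.rb metadata.gz data.tar.gz
# (prints nothing when no paths are given)
gem_checksums(ARGV).each do |name, sums|
  puts "#{name}:"
  sums.each { |algo, hex| puts "  #{algo}: #{hex}" }
end
```

Comparing the output against the `+` lines above is one way to check a downloaded copy of 1.0.2. This assumes the archives are hashed as stored in the `.gem`, without decompressing them.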

data/README.md CHANGED

@@ -5,7 +5,7 @@
 Extends the Regexp class with the method: Regexp#examples
-This method generates a list of (some\*) strings that will match the given regular expression
+This method generates a list of (some\*) strings that will match the given regular expression.
 \* If the regex has an infinite number of possible srings that match it, such as `/a*b+c{2,}/`,
 or a huge number of possible matches, such as `/.\w/`, then only a subset of these will be listed.
@@ -22,9 +22,15 @@ For more detail on this, see [configuration options](#configuration-options).
   # 'http://www.github.com', 'https://github.com', 'https://www.github.com']
 /(I(N(C(E(P(T(I(O(N)))))))))*/.examples #=> ["", "INCEPTION", "INCEPTIONINCEPTION"]
 /\x74\x68\x69\x73/.examples #=> ["this"]
-/\u6829/.examples #=> ["栩"]
 /what about (backreferences\?) \1/.examples
   #=> ['what about backreferences? backreferences?']
+/
+  \u{28}\u2022\u{5f}\u2022\u{29}
+  |
+  \u{28}\u{20}\u2022\u{5f}\u2022\u{29}\u{3e}\u2310\u25a0\u{2d}\u25a0\u{20}
+  |
+  \u{28}\u2310\u25a0\u{5f}\u25a0\u{29}
+/x.examples #=> ["(•_•)", "( •_•)>⌐■-■ ", "(⌐■_■)"]
 ```
 ## Installation
@@ -45,6 +51,10 @@ Or install it yourself as:
 ## Supported syntax
+Short answer: **Everything** is supported, apart from "irregular" aspects of the regexp language -- see [impossible features](#impossible-features-illegal-syntax)
+Long answer:
 * All forms of repeaters (quantifiers), e.g. `/a*/`, `/a+/`, `/a?/`, `/a{1,4}/`, `/a{3,}/`, `/a{,2}/`
   * Reluctant and possissive repeaters work fine, too, e.g. `/a*?/`, `/a*+/`
 * Boolean "Or" groups, e.g. `/a|b|c/`
@@ -57,8 +67,9 @@ Or install it yourself as:
 * Escaped characters, e.g. `/\n/`, `/\w/`, `/\D/` (and so on...)
 * Capture groups, e.g. `/(group)/`
   * Including named groups, e.g. `/(?<name>group)/`
-  * ...And backreferences(!!!), e.g. `/(this|that) \1/` `/(?<name>foo) \k<name>/`
-  * Groups work fine, even if nested or optional, e.g. `/(even(this(works?))) \1 \2 \3/`, `/what about (this)? \1/`
+  * And backreferences(!!!), e.g. `/(this|that) \1/` `/(?<name>foo) \k<name>/`
+  * ...even for the more "obscure" syntax, e.g. `/(?<future>the) \k'future'/`, `/(a)(b) \k<-1>/``
+  * ...and even if nested or optional, e.g. `/(even(this(works?))) \1 \2 \3/`, `/what about (this)? \1/`
   * Non-capture groups, e.g. `/(?:foo)/`
   * Comment groups, e.g. `/foo(?#comment)bar/`
 * Control characters, e.g. `/\ca/`, `/\cZ/`, `/\C-9/`
@@ -66,7 +77,7 @@ Or install it yourself as:
 * Unicode characters, e.g. `/\u0123/`, `/\uabcd/`, `/\u{789}/`
 * Octal characters, e.g. `/\10/`, `/\177/`
 * Named properties, e.g. `/\p{L}/` ("Letter"), `/\p{Arabic}/` ("Arabic character")
-, `/\p{^Ll}/` ("Not a lowercase letter"), `\P{^Canadian_Aboriginal}` ("Not not a Canadian aboriginal character")
+, `/\p{^Ll}/` ("Not a lowercase letter"), `/\P{^Canadian_Aboriginal}/` ("Not not a Canadian aboriginal character")
 * **Arbitrarily complex combinations of all the above!**
 * Regexp options can also be used:
@@ -77,13 +88,13 @@ Or install it yourself as:
 ## Bugs and Not-Yet-Supported syntax
-* There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy! However, patterns like this are highly unusual...
+* There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy! However, patterns like this are highly unusual...)
 * Some named properties, e.g. `/\p{Arabic}/`, list non-matching examples for ruby 2.0/2.1 (as the definitions changed in ruby 2.2). This will be fixed in version 1.1.0 (see the pending pull request)!
-There are also some various (increasingly obscure) unsupported bits of syntax; some of which I haven't yet investigated. Much of this is not even mentioned in the ruby docs! Full documentation on all the intricate obscurities in the ruby (version 2.x) regexp parser can be found [here](https://raw.githubusercontent.com/k-takata/Onigmo/master/doc/RE). To name a few:
+Since the Regexp language is so vast, it's quite likely I've missed something (please raise an issue if you find something)! The only missing feature that I'm currently aware of is:
 * Conditional capture groups, e.g. `/(group1)? (?(1)yes|no)/.examples` (which *should* return: `["group1 yes", " no"]`)
-* Back reference by relative group number, e.g. `/(a)(b)(c)(d) \k<-2>/.examples` (which *should* return: `["abcd c"]`)
-* Back reference using single quotes, and for group numbers, e.g. `/(a) \k'1'/.examples` (which is really just alternative syntax for `/(a) \1/`!)
+Some of the most obscure regexp features are not even mentioned in the ruby docs! However, full documentation on all the intricate obscurities in the ruby (version 2.x) regexp parser can be found [here](https://raw.githubusercontent.com/k-takata/Onigmo/master/doc/RE).
 ## Impossible features ("illegal syntax")
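The README now documents single-quoted and relative backreferences, and it still lists conditional groups as unsupported. The snippet below is a small check that uses Ruby's built-in regexp engine rather than the regexp-examples gem, so it says nothing about what `#examples` returns. It filters a few hand-picked candidate strings (the `SAMPLES` table is purely illustrative) to show which ones each pattern accepts as a whole-string match.

```ruby
# Ruby's built-in engine (Onigmo) is used here, not regexp-examples.
# Each pattern is anchored with \A and \z so only whole-string matches count.
SAMPLES = {
  # \k'1' is group 1; \k<-1> is the most recent group before it (group 2)
  /\A(ref1) (ref2) \k'1' \k<-1>\z/ => ["ref1 ref2 ref1 ref2", "ref1 ref2 ref2 ref1"],
  # \k<-2> counts back two groups from the end, i.e. group 3, "(c)"
  /\A(a)(b)(c)(d) \k<-2>\z/        => ["abcd c", "abcd d"],
  # (?(1)yes|no) requires "yes" if group 1 matched, otherwise "no"
  /\A(group1)? (?(1)yes|no)\z/     => ["group1 yes", " no", "group1 no", " yes"]
}

SAMPLES.each do |regexp, candidates|
  accepted = candidates.select { |s| s =~ regexp }
  puts "#{regexp.inspect} accepts #{accepted.inspect}"
end
```

For the conditional pattern, the accepted strings are the same two that the README says `#examples` *should* return.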

data/lib/regexp-examples/parser.rb CHANGED

@@ -46,31 +46,60 @@ module RegexpExamples
       when '\\'
         group = parse_after_backslash_group
       when '^'
-        if @current_position == 0
-          group = PlaceHolderGroup.new # Ignore the "illegal" character
-        else
-          raise IllegalSyntaxError, "Anchors ('#{next_char}') cannot be supported, as they are not regular"
-        end
+        group = parse_caret
       when '$'
-        if @current_position == (regexp_string.length - 1)
-          group = PlaceHolderGroup.new # Ignore the "illegal" character
-        else
-          raise IllegalSyntaxError, "Anchors ('#{next_char}') cannot be supported, as they are not regular"
-        end
+        group = parse_dollar
       when /[#\s]/
-        if @extended
-          parse_extended_whitespace
-          group = PlaceHolderGroup.new # Ignore the whitespace/comment
-        else
-          group = parse_single_char_group(next_char)
-        end
+        group = parse_extended_whitespace
       else
         group = parse_single_char_group(next_char)
       end
       group
     end
+    def parse_repeater(group)
+      case next_char
+      when '*'
+        repeater = parse_star_repeater(group)
+      when '+'
+        repeater = parse_plus_repeater(group)
+      when '?'
+        repeater = parse_question_mark_repeater(group)
+      when '{'
+        repeater = parse_range_repeater(group)
+      else
+        repeater = parse_one_time_repeater(group)
+      end
+      repeater
+    end
+    def parse_caret
+      if @current_position == 0
+        return PlaceHolderGroup.new # Ignore the "illegal" character
+      else
+        raise_anchors_exception!
+      end
+    end
+    def parse_dollar
+      if @current_position == (regexp_string.length - 1)
+        return PlaceHolderGroup.new # Ignore the "illegal" character
+      else
+        raise_anchors_exception!
+      end
+    end
     def parse_extended_whitespace
+      if @extended
+        skip_whitespace
+        group = PlaceHolderGroup.new # Ignore the whitespace/comment
+      else
+        group = parse_single_char_group(next_char)
+      end
+      group
+    end
+    def skip_whitespace
       whitespace_chars = rest_of_string.match(/#.*|\s+/)[0]
       @current_position += whitespace_chars.length - 1
     end
@@ -81,9 +110,11 @@ module RegexpExamples
       when rest_of_string =~ /\A(\d{1,3})/
         @current_position += ($1.length - 1) # In case of 10+ backrefs!
         group = parse_backreference_group($1)
-      when rest_of_string =~ /\Ak<([^>]+)>/ # Named capture group
+      when rest_of_string =~ /\Ak['<]([\w-]+)['>]/ # Named capture group
         @current_position += ($1.length + 2)
-        group = parse_backreference_group($1)
+        # Check for RELATIVE group number, e.g. /(a)(b)(c)(d) \k<-2>/
+        group_id = ($1.to_i < 0) ? (@num_groups + $1.to_i + 1) : $1
+        group = parse_backreference_group(group_id)
       when BackslashCharMap.keys.include?(next_char)
         group = CharGroup.new(
           BackslashCharMap[next_char].dup,
@@ -117,18 +148,18 @@ module RegexpExamples
       when next_char == 'g' # Subexpression call
         raise IllegalSyntaxError, "Subexpression calls (\\g) cannot be supported, as they are not regular"
       when next_char =~ /[bB]/ # Anchors
-        raise IllegalSyntaxError, "Anchors ('\\#{next_char}') cannot be supported, as they are not regular"
+        raise_anchors_exception!
       when next_char =~ /[AG]/ # Start of string
         if @current_position == 1
           group = PlaceHolderGroup.new
         else
-          raise IllegalSyntaxError, "Anchors ('\\#{next_char}') cannot be supported, as they are not regular"
+          raise_anchors_exception!
         end
       when next_char =~ /[zZ]/ # End of string
         if @current_position == (regexp_string.length - 1)
           group = PlaceHolderGroup.new
         else
-          raise IllegalSyntaxError, "Anchors ('\\#{next_char}') cannot be supported, as they are not regular"
+          raise_anchors_exception!
         end
       else
         group = parse_single_char_group( next_char )
@@ -136,31 +167,13 @@ module RegexpExamples
       group
     end
-    def parse_repeater(group)
-      case next_char
-      when '*'
-        repeater = parse_star_repeater(group)
-      when '+'
-        repeater = parse_plus_repeater(group)
-      when '?'
-        repeater = parse_question_mark_repeater(group)
-      when '{'
-        repeater = parse_range_repeater(group)
-      else
-        repeater = parse_one_time_repeater(group)
-      end
-      repeater
-    end
     def parse_multi_group
       @current_position += 1
       @num_groups += 1
-      group_id = nil # init
-      previous_ignorecase = @ignorecase
-      previous_multiline = @multiline
-      previous_extended = @extended
-      rest_of_string.match(
-        /
+      remember_old_regexp_options do
+        group_id = nil # init
+        rest_of_string.match(
+          /
           \A
           (\?)?               # Is it a "special" group, i.e. starts with a "?"?
             (
@@ -175,39 +188,48 @@ module RegexpExamples
                 |[^>]+        # Named capture
               )
               |[mix]*-?[mix]* # Option toggle
-          )?
-        /x
-      ) do |match|
-        case
-        when match[1].nil? # e.g. /(normal)/
-          group_id = @num_groups.to_s
-        when match[2] == ':' # e.g. /(?:nocapture)/
-          @current_position += 2
-        when match[2] == '#' # e.g. /(?#comment)/
-          comment_group = rest_of_string.match(/.*?[^\\](?:\\{2})*\)/)[0]
-          @current_position += comment_group.length
-        when match[2] =~ /\A(?=[mix-]+)([mix]*)-?([mix]*)/ # e.g. /(?i-mx)/
-          regexp_options_toggle($1, $2)
-          @current_position += $&.length + 1
-          if next_char == ':' # e.g. /(?i:subexpr)/
-            @current_position += 1
-          else
-            return PlaceHolderGroup.new
+            )?
+          /x
+        ) do |match|
+          case
+          when match[1].nil? # e.g. /(normal)/
+            group_id = @num_groups.to_s
+          when match[2] == ':' # e.g. /(?:nocapture)/
+            @current_position += 2
+          when match[2] == '#' # e.g. /(?#comment)/
+            comment_group = rest_of_string.match(/.*?[^\\](?:\\{2})*\)/)[0]
+            @current_position += comment_group.length
+          when match[2] =~ /\A(?=[mix-]+)([mix]*)-?([mix]*)/ # e.g. /(?i-mx)/
+            regexp_options_toggle($1, $2)
+            @num_groups -= 1 # Toggle "groups" should not increase backref group count
+            @current_position += $&.length + 1
+            if next_char == ':' # e.g. /(?i:subexpr)/
+              @current_position += 1
+            else
+              return PlaceHolderGroup.new
+            end
+          when %w(! =).include?(match[2]) # e.g. /(?=lookahead)/, /(?!neglookahead)/
+            raise IllegalSyntaxError, "Lookaheads are not regular; cannot generate examples"
+          when %w(! =).include?(match[3]) # e.g. /(?<=lookbehind)/, /(?<!neglookbehind)/
+            raise IllegalSyntaxError, "Lookbehinds are not regular; cannot generate examples"
+          else # e.g. /(?<name>namedgroup)/
+            @current_position += (match[3].length + 3)
+            group_id = match[3]
           end
-        when %w(! =).include?(match[2]) # e.g. /(?=lookahead)/, /(?!neglookahead)/
-          raise IllegalSyntaxError, "Lookaheads are not regular; cannot generate examples"
-        when %w(! =).include?(match[3]) # e.g. /(?<=lookbehind)/, /(?<!neglookbehind)/
-          raise IllegalSyntaxError, "Lookbehinds are not regular; cannot generate examples"
-        else # e.g. /(?<name>namedgroup)/
-          @current_position += (match[3].length + 3)
-          group_id = match[3]
         end
+        MultiGroup.new(parse, group_id)
       end
-      groups = parse
+    end
+    def remember_old_regexp_options
+      previous_ignorecase = @ignorecase
+      previous_multiline = @multiline
+      previous_extended = @extended
+      group = yield
       @ignorecase = previous_ignorecase
       @multiline = previous_multiline
       @extended = previous_extended
-      MultiGroup.new(groups, group_id)
+      group
     end
     def regexp_options_toggle(on, off)
@@ -246,8 +268,8 @@ module RegexpExamples
       SingleCharGroup.new(char, @ignorecase)
     end
-    def parse_backreference_group(match)
-      BackReferenceGroup.new(match)
+    def parse_backreference_group(group_id)
+      BackReferenceGroup.new(group_id)
     end
     def parse_control_character(char)
@@ -308,6 +330,10 @@ module RegexpExamples
         repeater
     end
+    def raise_anchors_exception!
+      raise IllegalSyntaxError, "Anchors ('#{next_char}') cannot be supported, as they are not regular"
+    end
     def parse_one_time_repeater(group)
       OneTimeRepeater.new(group)
     end
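The new `/\Ak['<]([\w-]+)['>]/` branch turns a negative group number into an absolute one with `@num_groups + $1.to_i + 1`, where `@num_groups` is the number of groups opened so far. The standalone sketch below reproduces that arithmetic outside the parser. `resolve_backref_id` and `backref_id_for` are hypothetical helpers. Returning the absolute id as a String, to match `@num_groups.to_s` elsewhere in the parser, is an assumption of this sketch: the gem's code passes an Integer in the relative case.

```ruby
# Same pattern as the 1.0.2 parser: \k<...> or \k'...' with a name or number.
BACKREF = /\Ak['<]([\w-]+)['>]/

# Hypothetical helper mirroring the relative-backreference arithmetic.
# -1 is the most recently opened group, -2 the one before it, and so on.
def resolve_backref_id(raw_id, num_groups)
  n = raw_id.to_i # names such as "name" convert to 0, so they pass through
  n < 0 ? (num_groups + n + 1).to_s : raw_id
end

# Hypothetical helper: parse the text after the backslash and resolve its id.
def backref_id_for(fragment, num_groups)
  match = fragment.match(BACKREF)
  match && resolve_backref_id(match[1], num_groups)
end

puts backref_id_for("k<-1>", 2)   # relative: 2 + (-1) + 1
puts backref_id_for("k<-2>", 4)   # relative: 4 + (-2) + 1
puts backref_id_for("k'1'", 2)    # absolute group number
puts backref_id_for("k<name>", 1) # named group
```

With four groups opened, `\k<-2>` resolves to group 3, which is why the README's `/(a)(b)(c)(d) \k<-2>/` example should produce `"abcd c"`.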

data/lib/regexp-examples/version.rb CHANGED

@@ -1,3 +1,3 @@
 module RegexpExamples
-  VERSION = '1.0.1'
+  VERSION = '1.0.2'
 end
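The bump from `1.0.1` to `1.0.2` is a patch-level change. RubyGems' `Gem::Version` and `Gem::Requirement` classes can confirm how the new version orders and which pessimistic (`~>`) constraints it satisfies. The constraint string below is only an example.

```ruby
require 'rubygems'

old_version = Gem::Version.new('1.0.1')
new_version = Gem::Version.new('1.0.2')

# Versions compare segment by segment, so 1.0.2 sorts after 1.0.1.
puts new_version > old_version

# "~> 1.0.1" means ">= 1.0.1, < 1.1": any later 1.0.x release qualifies.
requirement = Gem::Requirement.new('~> 1.0.1')
puts requirement.satisfied_by?(new_version)
puts requirement.satisfied_by?(Gem::Version.new('1.1.0'))
```

So a dependency declared as `'~> 1.0.1'` is satisfied by this release, but not by a future 1.1.0.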

data/spec/regexp-examples_spec.rb CHANGED

@@ -98,7 +98,8 @@ RSpec.describe Regexp, "#examples" do
         /(normal)/,
         /(?:nocapture)/,
         /(?<name>namedgroup)/,
-        /(?<name>namedgroup) \k<name>/
+        /(?<name>namedgroup) \k<name>/,
+        /(?<name>namedgroup) \k'name'/
       )
     end
@@ -124,7 +125,8 @@ RSpec.describe Regexp, "#examples" do
         /(a?(b?(c?(d?(e?)))))/,
         /(a)? \1/,
         /(a|(b)) \2/,
-        /([ab]){2} \1/ # \1 should always be the LAST result of the capture group
+        /([ab]){2} \1/, # \1 should always be the LAST result of the capture group
+        /(ref1) (ref2) \k'1' \k<-1>/, # RELATIVE backref!
       )
     end
@@ -326,6 +328,7 @@ RSpec.describe Regexp, "#examples" do
           it { expect(/a(?i)b(?-i)c/.examples).to eq %w{abc aBc}}
           it { expect(/a(?x)   b(?-x) c/.examples).to eq %w{ab\ c}}
           it { expect(/(?m)./.examples(max_group_results: 999)).to include "\n" }
+          it { expect(/(?i)(a)-\1/.examples).to eq %w{a-a A-A}} # Toggle "groups" should not increase backref group count
         end
         context "subexpression" do
           it { expect(/a(?i:b)c/.examples).to eq %w{abc aBc}}
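The new spec guards the `@num_groups -= 1` line in `parse_multi_group`. An inline option toggle like `(?i)` goes through the same code path as a group, but it does not create a capture. Ruby's own engine, used here instead of the gem, confirms that `\1` in this pattern refers to `(a)`, because the match data contains only one capture.

```ruby
# Ruby's built-in regexp engine, not regexp-examples.
toggle = /(?i)(a)-\1/

match = toggle.match("A-A")
# MatchData#size counts the whole match plus each capture group,
# so 2 means (?i) contributed no group of its own.
puts match.size
puts match[1]

# For contrast, a real capture group before (a) shifts \1 to that group.
shifted = /(x)(a)-\1/
puts shifted.match("xa-x")[1]
```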

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: regexp-examples
 version: !ruby/object:Gem::Version
-  version: 1.0.1
+  version: 1.0.2
 platform: ruby
 authors:
 - Tom Lord
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-03-04 00:00:00.000000000 Z
+date: 2015-03-07 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -85,7 +85,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.2.2
+rubygems_version: 2.4.5
 signing_key:
 specification_version: 4
 summary: Extends the Regexp class with '#examples'