regexp_parser 1.7.1 → 2.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: dd872b22bf04a288790ef0f73df9041f14fb88a08c2a03852d9dbbc238b452d6
4
- data.tar.gz: 4641097a24b5fa0f7b0c8e5aacc152587fe8b15d30f3f78bbec8157887b8b897
3
+ metadata.gz: 4d4ee1ebabfe19761461dc33344c1d5928be3d1f47b3064b5bf37206984ec43e
4
+ data.tar.gz: d4d0fae95d08fecedfe67d60849564fbe8fb971dafe1a8039e8b646eab23d765
5
5
  SHA512:
6
- metadata.gz: 858570df4a7047a2d8b09555b56de28a66ca4f8022e596c249900f5312f8e7fb9376384ca816bc3c08f3e324930702ad410a28b5be680adea6867e1f8075441e
7
- data.tar.gz: 0d70e7b4f18739826bb334fb305e335e44a354ae302214ca3c1884f66ace8680e48a9e4c64b890b220b82056da761084413c8b9b8c5e363382f5cf165b3d3448
6
+ metadata.gz: a78da1d206611573a47328e7904b0aba69203e00b9d33afb65a0fec1d22498cf1d16c761dbda6cc3af930c3fdb4fcc35932126e0fc048a8c6047c17485ce62ec
7
+ data.tar.gz: 3bc8081a187746c76fe5cb7d69519638e03f690533fe221c8b8a9285d537c95afcecb1aebc861ceea1252e6af55a117004f063dd319b0a402c503ae95fb5e0c7
@@ -1,5 +1,88 @@
1
1
  ## [Unreleased]
2
2
 
3
+ ## [2.0.1] - 2020-12-20 - [Janosch Müller](mailto:janosch84@gmail.com)
4
+
5
+ ### Fixed
6
+
7
+ - fixed error when scanning some group names
8
+ * this affected names containing hyphens, digits or multibyte chars, e.g. `/(?<a1>a)/`
9
+ * thanks to [Daniel Gollahon](https://github.com/dgollahon) for the report
10
+ - fixed error when scanning hex escapes with just one hex digit
11
+ * e.g. `/\x0A/` was scanned correctly, but the equivalent `/\xA/` was not
12
+ * thanks to [Daniel Gollahon](https://github.com/dgollahon) for the report
13
+
14
+ ## [2.0.0] - 2020-11-25 - [Janosch Müller](mailto:janosch84@gmail.com)
15
+
16
+ ### Changed
17
+
18
+ - some methods that used to return byte-based indices now return char-based indices
19
+ * the returned values have only changed for Regexps that contain multibyte chars
20
+ * this is only a breaking change if you used such methods directly AND relied on them pointing to bytes
21
+ * affected methods:
22
+ * `Regexp::Token` `#length`, `#offset`, `#te`, `#ts`
23
+ * `Regexp::Expression::Base` `#full_length`, `#offset`, `#starts_at`, `#te`, `#ts`
24
+ * thanks to [Akinori MUSHA](https://github.com/knu) for the report
25
+ - removed some deprecated methods/signatures
26
+ * these are rarely used and have been showing deprecation warnings for a long time
27
+ * `Regexp::Expression::Subexpression.new` with 3 arguments
28
+ * `Regexp::Expression::Root.new` without a token argument
29
+ * `Regexp::Expression.parsed`
30
+
31
+ ### Added
32
+
33
+ - `Regexp::Expression::Base#base_length`
34
+ * returns the character count of an expression body, ignoring any quantifier
35
+ - pragmatic, experimental support for chained quantifiers
36
+ * e.g.: `/^a{10}{4,6}$/` matches exactly 40, 50 or 60 `a`s
37
+ * successive quantifiers used to be silently dropped by the parser
38
+ * they are now wrapped with passive groups as if they were written `(?:a{10}){4,6}`
39
+ * thanks to [calfeld](https://github.com/calfeld) for reporting this a while back
40
+
41
+ ### Fixed
42
+
43
+ - incorrect encoding output for non-ascii comments
44
+ * this led to a crash when calling `#to_s` on parse results containing such comments
45
+ * thanks to [Michael Glass](https://github.com/michaelglass) for the report
46
+ - some crashes when scanning contrived patterns such as `'\😋'`
47
+
48
+ ### [1.8.2] - 2020-10-11 - [Janosch Müller](mailto:janosch84@gmail.com)
49
+
50
+ ### Fixed
51
+
52
+ - fix `FrozenError` in `Expression::Base#repetitions` on Ruby 3.0
53
+ * thanks to [Thomas Walpole](https://github.com/twalpole)
54
+ - removed "unknown future version" warning on Ruby 3.0
55
+
56
+ ### [1.8.1] - 2020-09-28 - [Janosch Müller](mailto:janosch84@gmail.com)
57
+
58
+ ### Fixed
59
+
60
+ - fixed scanning of comment-like text in normal mode
61
+ * this was an old bug, but had become more prevalent in v1.8.0
62
+ * thanks to [Tietew](https://github.com/Tietew) for the report
63
+ - specified correct minimum Ruby version in gemspec
64
+ * it said 1.9 but really required 2.0 as of v1.8.0
65
+
66
+ ### [1.8.0] - 2020-09-20 - [Janosch Müller](mailto:janosch84@gmail.com)
67
+
68
+ ### Changed
69
+
70
+ - dropped support for running on Ruby 1.9.x
71
+
72
+ ### Added
73
+
74
+ - regexp flags can now be passed when parsing a `String` as regexp body
75
+ * see the [README](/README.md#usage) for details
76
+ * thanks to [Owen Stephens](https://github.com/owst)
77
+ - bare occurrences of `\g` and `\k` are now allowed and scanned as literal escapes
78
+ * matches Onigmo behavior
79
+ * thanks for the report to [Marc-André Lafortune](https://github.com/marcandre)
80
+
81
+ ### Fixed
82
+
83
+ - fixed parsing comments without preceding space or trailing newline in x-mode
84
+ * thanks to [Owen Stephens](https://github.com/owst)
85
+
3
86
  ### [1.7.1] - 2020-06-07 - [Ammar Ali](mailto:ammarabuali@gmail.com)
4
87
 
5
88
  ### Fixed
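
Reviewer note on the 2.0.0 "char-based indices" entry above: byte-based and char-based offsets only differ when a pattern contains multibyte characters. The following is a minimal plain-Ruby sketch of that difference. It does not call regexp_parser, and the sample pattern text and variable names are made up for illustration.

```ruby
# Plain Ruby only; the pattern text below is an illustrative assumption.
source = "\u00E4+b" # the three characters "ä", "+", "b"

# Char-based offset of "b": counts characters.
char_offset = source.index('b')

# Byte-based offset of "b": counts the UTF-8 bytes that precede it.
# "ä" takes two bytes in UTF-8, so this is one larger than char_offset.
byte_offset = source[0...char_offset].bytesize

puts "chars: #{source.length}, bytes: #{source.bytesize}"
puts "char offset of b: #{char_offset}, byte offset of b: #{byte_offset}"
```

For an ASCII-only pattern both offsets are identical, which is consistent with the changelog's statement that returned values only changed for Regexps containing multibyte chars.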
data/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Regexp::Parser
2
2
 
3
- [![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://secure.travis-ci.org/ammar/regexp_parser.svg?branch=master)](http://travis-ci.org/ammar/regexp_parser) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
3
+ [![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://github.com/ammar/regexp_parser/workflows/tests/badge.svg)](https://github.com/ammar/regexp_parser/actions) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
4
4
 
5
5
  A Ruby gem for tokenizing, parsing, and transforming regular expressions.
6
6
 
@@ -8,8 +8,8 @@ A Ruby gem for tokenizing, parsing, and transforming regular expressions.
8
8
  * A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
9
9
  * A lexer that produces a "stream" of token objects.
10
10
  * A parser that produces a "tree" of Expression objects (OO API)
11
- * Runs on Ruby 1.9, 2.x, and JRuby (1.9 mode) runtimes.
12
- * Recognizes Ruby 1.8, 1.9, and 2.x regular expressions [See Supported Syntax](#supported-syntax)
11
+ * Runs on Ruby 2.x, 3.x and JRuby runtimes
12
+ * Recognizes Ruby 1.8, 1.9, 2.x and 3.x regular expressions [See Supported Syntax](#supported-syntax)
13
13
 
14
14
 
15
15
  _For examples of regexp_parser in use, see [Example Projects](#example-projects)._
@@ -18,13 +18,10 @@ _For examples of regexp_parser in use, see [Example Projects](#example-projects)
18
18
  ---
19
19
  ## Requirements
20
20
 
21
- * Ruby >= 1.9
21
+ * Ruby >= 2.0
22
22
  * Ragel >= 6.0, but only if you want to build the gem or work on the scanner.
23
23
 
24
24
 
25
- _Note: See the .travis.yml file for covered versions._
26
-
27
-
28
25
  ---
29
26
  ## Install
30
27
 
@@ -72,6 +69,17 @@ called with the results as follows:
72
69
  * **Parser**: after completion, the block gets passed the root expression.
73
70
  _The result of the block is returned._
74
71
 
72
+ All three methods accept either a `Regexp` or `String` (containing the pattern)
73
+ - if a String is passed, `options` can be supplied:
74
+
75
+ ```ruby
76
+ require 'regexp_parser'
77
+
78
+ Regexp::Parser.parse(
79
+ "a+ # Recognises a and A...",
80
+ options: ::Regexp::EXTENDED | ::Regexp::IGNORECASE
81
+ )
82
+ ```
75
83
 
76
84
  ---
77
85
  ## Components
@@ -306,7 +314,7 @@ Expression class. See the next section for details._
306
314
 
307
315
  ## Supported Syntax
308
316
  The three modules support all the regular expression syntax features of Ruby 1.8,
309
- 1.9, and 2.x:
317
+ 1.9, 2.x and 3.x:
310
318
 
311
319
  _Note that not all of these are available in all versions of Ruby_
312
320
 
@@ -429,13 +437,17 @@ rake install
429
437
  ## Example Projects
430
438
  Projects using regexp_parser.
431
439
 
440
+ - [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool that uses regexp_parser to convert Regexps to css/xpath selectors.
441
+
442
+ - [js_regex](https://github.com/janosch-x/js_regex) converts Ruby regular expressions to JavaScript-compatible regular expressions.
443
+
432
444
  - [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor with alias support.
433
445
 
434
446
  - [mutant](https://github.com/mbj/mutant) (before v0.9.0) manipulates your regular expressions (amongst others) to see if your tests cover their behavior.
435
447
 
436
- - [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) uses regexp_parser to generate examples of postal codes.
448
+ - [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that uses regexp_parser to lint Regexps.
437
449
 
438
- - [js_regex](https://github.com/janosch-x/js_regex) converts Ruby regular expressions to JavaScript-compatible regular expressions.
450
+ - [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper that uses regexp_parser to generate examples of postal codes.
439
451
 
440
452
 
441
453
  ## References
@@ -464,4 +476,4 @@ Documentation and books used while working on this project.
464
476
 
465
477
  ---
466
478
  ##### Copyright
467
- _Copyright (c) 2010-2019 Ammar Ali. See LICENSE file for details._
479
+ _Copyright (c) 2010-2020 Ammar Ali. See LICENSE file for details._
@@ -34,6 +34,10 @@ module Regexp::Expression
34
34
 
35
35
  alias :starts_at :ts
36
36
 
37
+ def base_length
38
+ to_s(:base).length
39
+ end
40
+
37
41
  def full_length
38
42
  to_s.length
39
43
  end
@@ -80,8 +84,12 @@ module Regexp::Expression
80
84
  return 1..1 unless quantified?
81
85
  min = quantifier.min
82
86
  max = quantifier.max < 0 ? Float::INFINITY : quantifier.max
83
- # fix Range#minmax - https://bugs.ruby-lang.org/issues/15807
84
- (min..max).tap { |r| r.define_singleton_method(:minmax) { [min, max] } }
87
+ range = min..max
88
+ # fix Range#minmax on old Rubies - https://bugs.ruby-lang.org/issues/15807
89
+ if RUBY_VERSION.to_f < 2.7
90
+ range.define_singleton_method(:minmax) { [min, max] }
91
+ end
92
+ range
85
93
  end
86
94
 
87
95
  def greedy?
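
Reviewer note on the hunk above: the `Range#minmax` workaround is now applied only on Rubies older than 2.7, which also avoids calling `define_singleton_method` on a Range under Ruby 3.0, where Range objects are frozen (compare the `FrozenError` fix listed for 1.8.2). A standalone sketch of the same guard follows; `repetition_range` is a hypothetical helper written for this note, not part of the gem.

```ruby
# Hypothetical helper for illustration; a negative max stands for "unbounded",
# as in the hunk above.
def repetition_range(min, max)
  upper = max < 0 ? Float::INFINITY : max
  range = min..upper
  # Only old Rubies need the override (https://bugs.ruby-lang.org/issues/15807).
  # Skipping it elsewhere means a frozen Range is never modified.
  if RUBY_VERSION.to_f < 2.7
    range.define_singleton_method(:minmax) { [min, upper] }
  end
  range
end

p repetition_range(1, 3).minmax
p repetition_range(2, -1).minmax
```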
@@ -114,23 +122,6 @@ module Regexp::Expression
114
122
  alias :to_h :attributes
115
123
  end
116
124
 
117
- def self.parsed(exp)
118
- warn('WARNING: Regexp::Expression::Base.parsed is buggy and '\
119
- 'will be removed in 2.0.0. Use Regexp::Parser.parse instead.')
120
- case exp
121
- when String
122
- Regexp::Parser.parse(exp)
123
- when Regexp
124
- Regexp::Parser.parse(exp.source) # <- causes loss of root options
125
- when Regexp::Expression # <- never triggers
126
- exp
127
- else
128
- raise ArgumentError, 'Expression.parsed accepts a String, Regexp, or '\
129
- 'a Regexp::Expression as a value for exp, but it '\
130
- "was given #{exp.class.name}."
131
- end
132
- end
133
-
134
125
  end # module Regexp::Expression
135
126
 
136
127
  require 'regexp_parser/expression/quantifier'
@@ -10,9 +10,24 @@ module Regexp::Expression
10
10
  def comment?; false end
11
11
  end
12
12
 
13
- class Atomic < Group::Base; end
14
- class Passive < Group::Base; end
13
+ class Passive < Group::Base
14
+ attr_writer :implicit
15
+
16
+ def to_s(format = :full)
17
+ if implicit?
18
+ "#{expressions.join}#{quantifier_affix(format)}"
19
+ else
20
+ super
21
+ end
22
+ end
23
+
24
+ def implicit?
25
+ @implicit ||= false
26
+ end
27
+ end
28
+
15
29
  class Absence < Group::Base; end
30
+ class Atomic < Group::Base; end
16
31
  class Options < Group::Base
17
32
  attr_accessor :option_changes
18
33
  end
@@ -1,24 +1,12 @@
1
1
  module Regexp::Expression
2
2
 
3
3
  class Root < Regexp::Expression::Subexpression
4
- # TODO: this override is here for backwards compatibility, remove in 2.0.0
5
- def initialize(*args)
6
- unless args.first.is_a?(Regexp::Token)
7
- warn('WARNING: Root.new without a Token argument is deprecated and '\
8
- 'will be removed in 2.0.0. Use Root.build for the old behavior.')
9
- return super(self.class.build_token, *args)
10
- end
11
- super
4
+ def self.build(options = {})
5
+ new(build_token, options)
12
6
  end
13
7
 
14
- class << self
15
- def build(options = {})
16
- new(build_token, options)
17
- end
18
-
19
- def build_token
20
- Regexp::Token.new(:expression, :root, '', 0)
21
- end
8
+ def self.build_token
9
+ Regexp::Token.new(:expression, :root, '', 0)
22
10
  end
23
11
  end
24
12
  end
@@ -40,5 +40,14 @@ module Regexp::Expression
40
40
  RUBY
41
41
  end
42
42
  alias :lazy? :reluctant?
43
+
44
+ def ==(other)
45
+ other.class == self.class &&
46
+ other.token == token &&
47
+ other.mode == mode &&
48
+ other.min == min &&
49
+ other.max == max
50
+ end
51
+ alias :eq :==
43
52
  end
44
53
  end
@@ -7,16 +7,6 @@ module Regexp::Expression
7
7
  # Used as the base class for the Alternation alternatives, Conditional
8
8
  # branches, and CharacterSet::Intersection intersected sequences.
9
9
  class Sequence < Regexp::Expression::Subexpression
10
- # TODO: this override is here for backwards compatibility, remove in 2.0.0
11
- def initialize(*args)
12
- if args.count == 3
13
- warn('WARNING: Sequence.new without a Regexp::Token argument is '\
14
- 'deprecated and will be removed in 2.0.0.')
15
- return self.class.at_levels(*args)
16
- end
17
- super
18
- end
19
-
20
10
  class << self
21
11
  def add_to(subexpression, params = {}, active_opts = {})
22
12
  sequence = at_levels(
@@ -11,11 +11,11 @@ class Regexp::Lexer
11
11
 
12
12
  CLOSING_TOKENS = [:close].freeze
13
13
 
14
- def self.lex(input, syntax = "ruby/#{RUBY_VERSION}", &block)
15
- new.lex(input, syntax, &block)
14
+ def self.lex(input, syntax = "ruby/#{RUBY_VERSION}", options: nil, &block)
15
+ new.lex(input, syntax, options: options, &block)
16
16
  end
17
17
 
18
- def lex(input, syntax = "ruby/#{RUBY_VERSION}", &block)
18
+ def lex(input, syntax = "ruby/#{RUBY_VERSION}", options: nil, &block)
19
19
  syntax = Regexp::Syntax.new(syntax)
20
20
 
21
21
  self.tokens = []
@@ -25,7 +25,7 @@ class Regexp::Lexer
25
25
  self.shift = 0
26
26
 
27
27
  last = nil
28
- Regexp::Scanner.scan(input) do |type, token, text, ts, te|
28
+ Regexp::Scanner.scan(input, options: options) do |type, token, text, ts, te|
29
29
  type, token = *syntax.normalize(type, token)
30
30
  syntax.check! type, token
31
31
 
@@ -96,10 +96,10 @@ class Regexp::Lexer
96
96
 
97
97
  tokens.pop
98
98
  tokens << Regexp::Token.new(:literal, :literal, lead,
99
- token.ts, (token.te - last.bytesize),
99
+ token.ts, (token.te - last.length),
100
100
  nesting, set_nesting, conditional_nesting)
101
101
  tokens << Regexp::Token.new(:literal, :literal, last,
102
- (token.ts + lead.bytesize), token.te,
102
+ (token.ts + lead.length), token.te,
103
103
  nesting, set_nesting, conditional_nesting)
104
104
  end
105
105
 
@@ -18,12 +18,12 @@ class Regexp::Parser
18
18
  end
19
19
  end
20
20
 
21
- def self.parse(input, syntax = "ruby/#{RUBY_VERSION}", &block)
22
- new.parse(input, syntax, &block)
21
+ def self.parse(input, syntax = "ruby/#{RUBY_VERSION}", options: nil, &block)
22
+ new.parse(input, syntax, options: options, &block)
23
23
  end
24
24
 
25
- def parse(input, syntax = "ruby/#{RUBY_VERSION}", &block)
26
- root = Root.build(options_from_input(input))
25
+ def parse(input, syntax = "ruby/#{RUBY_VERSION}", options: nil, &block)
26
+ root = Root.build(extract_options(input, options))
27
27
 
28
28
  self.root = root
29
29
  self.node = root
@@ -35,7 +35,7 @@ class Regexp::Parser
35
35
 
36
36
  self.captured_group_counts = Hash.new(0)
37
37
 
38
- Regexp::Lexer.scan(input, syntax) do |token|
38
+ Regexp::Lexer.scan(input, syntax, options: options) do |token|
39
39
  parse_token(token)
40
40
  end
41
41
 
@@ -54,14 +54,20 @@ class Regexp::Parser
54
54
  :options_stack, :switching_options, :conditional_nesting,
55
55
  :captured_group_counts
56
56
 
57
- def options_from_input(input)
58
- return {} unless input.is_a?(::Regexp)
57
+ def extract_options(input, options)
58
+ if options && !input.is_a?(String)
59
+ raise ArgumentError, 'options cannot be supplied unless parsing a String'
60
+ end
61
+
62
+ options = input.options if input.is_a?(::Regexp)
59
63
 
60
- options = {}
61
- options[:i] = true if input.options & ::Regexp::IGNORECASE != 0
62
- options[:m] = true if input.options & ::Regexp::MULTILINE != 0
63
- options[:x] = true if input.options & ::Regexp::EXTENDED != 0
64
- options
64
+ return {} unless options
65
+
66
+ enabled_options = {}
67
+ enabled_options[:i] = true if options & ::Regexp::IGNORECASE != 0
68
+ enabled_options[:m] = true if options & ::Regexp::MULTILINE != 0
69
+ enabled_options[:x] = true if options & ::Regexp::EXTENDED != 0
70
+ enabled_options
65
71
  end
66
72
 
67
73
  def nest(exp)
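
Reviewer note on `extract_options` above: the integer `options` value is a bitmask of the standard `Regexp` constants, and each flag is tested with a bitwise AND. The sketch below reproduces only that decoding step in plain Ruby. `enabled_flags` is a hypothetical name used for this note, and the `ArgumentError` check from the hunk is left out.

```ruby
# Hypothetical helper mirroring the bit checks in the hunk above.
def enabled_flags(options)
  return {} unless options

  flags = {}
  flags[:i] = true if (options & Regexp::IGNORECASE) != 0
  flags[:m] = true if (options & Regexp::MULTILINE) != 0
  flags[:x] = true if (options & Regexp::EXTENDED) != 0
  flags
end

p enabled_flags(Regexp::EXTENDED | Regexp::IGNORECASE)
p enabled_flags(/a/m.options)
p enabled_flags(nil)
```

The same integer can come from either source shown in the diff: `Regexp#options` when a `Regexp` is parsed, or the `options:` keyword when a `String` is parsed.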
@@ -432,6 +438,28 @@ class Regexp::Parser
432
438
  target_node || raise(ArgumentError, 'No valid target found for '\
433
439
  "'#{token.text}' ")
434
440
 
441
+ # in case of chained quantifiers, wrap target in an implicit passive group
442
+ # description of the problem: https://github.com/ammar/regexp_parser/issues/3
443
+ # rationale for this solution: https://github.com/ammar/regexp_parser/pull/69
444
+ if target_node.quantified?
445
+ new_token = Regexp::Token.new(
446
+ :group,
447
+ :passive,
448
+ '', # text
449
+ target_node.ts,
450
+ nil, # te (unused)
451
+ target_node.level,
452
+ target_node.set_level,
453
+ target_node.conditional_level
454
+ )
455
+ new_group = Group::Passive.new(new_token, active_opts)
456
+ new_group.implicit = true
457
+ new_group << target_node
458
+ increase_level(target_node)
459
+ node.expressions[offset] = new_group
460
+ target_node = new_group
461
+ end
462
+
435
463
  case token.token
436
464
  when :zero_or_one
437
465
  target_node.quantify(:zero_or_one, token.text, 0, 1, :greedy)
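
Reviewer note on the chained-quantifier hunk above: the parser now treats a second quantifier as applying to an implicit passive group around the already-quantified target. The plain-Ruby sketch below checks what the explicit form from the changelog, `(?:a{10}){4,6}`, matches. It uses only Ruby's built-in `Regexp` and does not call regexp_parser, and the variable names are illustrative.

```ruby
# Explicit equivalent of the chained form, as given in the changelog entry.
explicit = /\A(?:a{10}){4,6}\z/

# Collect every length between 30 and 70 whose run of "a"s matches.
matching_lengths = (30..70).select { |n| explicit.match?('a' * n) }

p matching_lengths
```

Only whole multiples of ten between 40 and 60 match, which is the behavior the changelog describes for `/^a{10}{4,6}$/`.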
@@ -462,6 +490,11 @@ class Regexp::Parser
462
490
  end
463
491
  end
464
492
 
493
+ def increase_level(exp)
494
+ exp.level += 1
495
+ exp.respond_to?(:each) && exp.each { |subexp| increase_level(subexp) }
496
+ end
497
+
465
498
  def interval(target_node, token)
466
499
  text = token.text
467
500
  mchr = text[text.length-1].chr =~ /[?+]/ ? text[text.length-1].chr : nil