regexp_parser 1.7.1 → 2.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: dd872b22bf04a288790ef0f73df9041f14fb88a08c2a03852d9dbbc238b452d6
- data.tar.gz: 4641097a24b5fa0f7b0c8e5aacc152587fe8b15d30f3f78bbec8157887b8b897
+ metadata.gz: 4d4ee1ebabfe19761461dc33344c1d5928be3d1f47b3064b5bf37206984ec43e
+ data.tar.gz: d4d0fae95d08fecedfe67d60849564fbe8fb971dafe1a8039e8b646eab23d765
  SHA512:
- metadata.gz: 858570df4a7047a2d8b09555b56de28a66ca4f8022e596c249900f5312f8e7fb9376384ca816bc3c08f3e324930702ad410a28b5be680adea6867e1f8075441e
- data.tar.gz: 0d70e7b4f18739826bb334fb305e335e44a354ae302214ca3c1884f66ace8680e48a9e4c64b890b220b82056da761084413c8b9b8c5e363382f5cf165b3d3448
+ metadata.gz: a78da1d206611573a47328e7904b0aba69203e00b9d33afb65a0fec1d22498cf1d16c761dbda6cc3af930c3fdb4fcc35932126e0fc048a8c6047c17485ce62ec
+ data.tar.gz: 3bc8081a187746c76fe5cb7d69519638e03f690533fe221c8b8a9285d537c95afcecb1aebc861ceea1252e6af55a117004f063dd319b0a402c503ae95fb5e0c7
@@ -1,5 +1,88 @@
  ## [Unreleased]
 
+ ## [2.0.1] - 2020-12-20 - [Janosch Müller](mailto:janosch84@gmail.com)
+
+ ### Fixed
+
+ - fixed error when scanning some group names
+ * this affected names containing hyphens, digits or multibyte chars, e.g. `/(?<a1>a)/`
+ * thanks to [Daniel Gollahon](https://github.com/dgollahon) for the report
+ - fixed error when scanning hex escapes with just one hex digit
+ * e.g. `/\x0A/` was scanned correctly, but the equivalent `/\xA/` was not
+ * thanks to [Daniel Gollahon](https://github.com/dgollahon) for the report
+
+ ## [2.0.0] - 2020-11-25 - [Janosch Müller](mailto:janosch84@gmail.com)
+
+ ### Changed
+
+ - some methods that used to return byte-based indices now return char-based indices
+ * the returned values have only changed for Regexps that contain multibyte chars
+ * this is only a breaking change if you used such methods directly AND relied on them pointing to bytes
+ * affected methods:
+ * `Regexp::Token` `#length`, `#offset`, `#te`, `#ts`
+ * `Regexp::Expression::Base` `#full_length`, `#offset`, `#starts_at`, `#te`, `#ts`
+ * thanks to [Akinori MUSHA](https://github.com/knu) for the report
+ - removed some deprecated methods/signatures
+ * these are rarely used and have been showing deprecation warnings for a long time
+ * `Regexp::Expression::Subexpression.new` with 3 arguments
+ * `Regexp::Expression::Root.new` without a token argument
+ * `Regexp::Expression.parsed`
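The switch from byte-based to char-based indices only shows up when a pattern contains multibyte characters. As a minimal sketch in core Ruby (the helper `offsets` is hypothetical and not part of regexp_parser), this is how the two kinds of offset diverge once a 4-byte emoji precedes a token:

```ruby
# Hypothetical helper (not part of regexp_parser): report where `fragment`
# starts and ends in `source`, once in characters and once in bytes.
def offsets(source, fragment)
  char_ts = source.index(fragment) # String#index counts characters
  return nil unless char_ts

  byte_ts = source[0, char_ts].bytesize # bytes preceding the fragment
  {
    char: [char_ts, char_ts + fragment.length],
    byte: [byte_ts, byte_ts + fragment.bytesize]
  }
end

p offsets('ab+', 'b+')                # ASCII only: char and byte offsets agree
p offsets('😋(?<name>a)', '(?<name>') # char offsets [1, 9], byte offsets [4, 12]
```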
+
+ ### Added
+
+ - `Regexp::Expression::Base#base_length`
+ * returns the character count of an expression body, ignoring any quantifier
+ - pragmatic, experimental support for chained quantifiers
+ * e.g.: `/^a{10}{4,6}$/` matches exactly 40, 50 or 60 `a`s
+ * successive quantifiers used to be silently dropped by the parser
+ * they are now wrapped with passive groups as if they were written `(?:a{10}){4,6}`
+ * thanks to [calfeld](https://github.com/calfeld) for reporting this a while back
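The explicit form named in the chained-quantifier entry can be checked against Ruby's own regexp engine. This sketch exercises only core `Regexp`, not the gem, and uses `\A`/`\z` instead of `^`/`$` so that each candidate string is matched as a whole:

```ruby
# Explicit passive-group form of the chained quantifier a{10}{4,6}.
wrapped = /\A(?:a{10}){4,6}\z/

# Try every run length from 0 to 70 and keep the ones that match.
counts = (0..70).select { |n| wrapped.match?('a' * n) }
p counts # => [40, 50, 60]
```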
+
+ ### Fixed
+
+ - incorrect encoding output for non-ascii comments
+ * this led to a crash when calling `#to_s` on parse results containing such comments
+ * thanks to [Michael Glass](https://github.com/michaelglass) for the report
+ - some crashes when scanning contrived patterns such as `'\😋'`
+
+ ### [1.8.2] - 2020-10-11 - [Janosch Müller](mailto:janosch84@gmail.com)
+
+ ### Fixed
+
+ - fix `FrozenError` in `Expression::Base#repetitions` on Ruby 3.0
+ * thanks to [Thomas Walpole](https://github.com/twalpole)
+ - removed "unknown future version" warning on Ruby 3.0
+
+ ### [1.8.1] - 2020-09-28 - [Janosch Müller](mailto:janosch84@gmail.com)
+
+ ### Fixed
+
+ - fixed scanning of comment-like text in normal mode
+ * this was an old bug, but had become more prevalent in v1.8.0
+ * thanks to [Tietew](https://github.com/Tietew) for the report
+ - specified correct minimum Ruby version in gemspec
+ * it said 1.9 but really required 2.0 as of v1.8.0
+
+ ### [1.8.0] - 2020-09-20 - [Janosch Müller](mailto:janosch84@gmail.com)
+
+ ### Changed
+
+ - dropped support for running on Ruby 1.9.x
+
+ ### Added
+
+ - regexp flags can now be passed when parsing a `String` as regexp body
+ * see the [README](/README.md#usage) for details
+ * thanks to [Owen Stephens](https://github.com/owst)
+ - bare occurrences of `\g` and `\k` are now allowed and scanned as literal escapes
+ * matches Onigmo behavior
+ * thanks for the report to [Marc-André Lafortune](https://github.com/marcandre)
+
+ ### Fixed
+
+ - fixed parsing comments without preceding space or trailing newline in x-mode
+ * thanks to [Owen Stephens](https://github.com/owst)
+
  ### [1.7.1] - 2020-06-07 - [Ammar Ali](mailto:ammarabuali@gmail.com)
 
  ### Fixed
data/README.md CHANGED
@@ -1,6 +1,6 @@
  # Regexp::Parser
 
- [![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://secure.travis-ci.org/ammar/regexp_parser.svg?branch=master)](http://travis-ci.org/ammar/regexp_parser) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
+ [![Gem Version](https://badge.fury.io/rb/regexp_parser.svg)](http://badge.fury.io/rb/regexp_parser) [![Build Status](https://github.com/ammar/regexp_parser/workflows/tests/badge.svg)](https://github.com/ammar/regexp_parser/actions) [![Code Climate](https://codeclimate.com/github/ammar/regexp_parser.svg)](https://codeclimate.com/github/ammar/regexp_parser/badges)
 
  A Ruby gem for tokenizing, parsing, and transforming regular expressions.
 
@@ -8,8 +8,8 @@ A Ruby gem for tokenizing, parsing, and transforming regular expressions.
  * A scanner/tokenizer based on [Ragel](http://www.colm.net/open-source/ragel/)
  * A lexer that produces a "stream" of token objects.
  * A parser that produces a "tree" of Expression objects (OO API)
- * Runs on Ruby 1.9, 2.x, and JRuby (1.9 mode) runtimes.
- * Recognizes Ruby 1.8, 1.9, and 2.x regular expressions [See Supported Syntax](#supported-syntax)
+ * Runs on Ruby 2.x, 3.x and JRuby runtimes
+ * Recognizes Ruby 1.8, 1.9, 2.x and 3.x regular expressions [See Supported Syntax](#supported-syntax)
 
 
  _For examples of regexp_parser in use, see [Example Projects](#example-projects)._
@@ -18,13 +18,10 @@ _For examples of regexp_parser in use, see [Example Projects](#example-projects)
  ---
  ## Requirements
 
- * Ruby >= 1.9
+ * Ruby >= 2.0
  * Ragel >= 6.0, but only if you want to build the gem or work on the scanner.
 
 
- _Note: See the .travis.yml file for covered versions._
-
-
  ---
  ## Install
 
@@ -72,6 +69,17 @@ called with the results as follows:
  * **Parser**: after completion, the block gets passed the root expression.
  _The result of the block is returned._
 
+ All three methods accept either a `Regexp` or `String` (containing the pattern)
+ - if a String is passed, `options` can be supplied:
+
+ ```ruby
+ require 'regexp_parser'
+
+ Regexp::Parser.parse(
+ "a+ # Recognises a and A...",
+ options: ::Regexp::EXTENDED | ::Regexp::IGNORECASE
+ )
+ ```
 
  ---
  ## Components
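The `options:` value in the README snippet above is an ordinary `Regexp` flag bitmask. As a minimal core-Ruby sketch (no gem required), `Regexp.new` can stand in for the parser to show what `EXTENDED | IGNORECASE` means for that same pattern text:

```ruby
# The options: argument is an Integer bitmask of Regexp flag constants.
flags = Regexp::EXTENDED | Regexp::IGNORECASE
p flags # => 3

# Stand-in for the parser: compile the same String with these flags.
# x-mode skips the whitespace and the "# ..." comment; i-mode ignores case.
re = Regexp.new('a+ # Recognises a and A...', flags)

p re.match?('AaA')                    # => true
p re.options & Regexp::EXTENDED != 0  # => true
p re.options & Regexp::MULTILINE != 0 # => false
```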
@@ -306,7 +314,7 @@ Expression class. See the next section for details._
 
  ## Supported Syntax
  The three modules support all the regular expression syntax features of Ruby 1.8,
- 1.9, and 2.x:
+ 1.9, 2.x and 3.x:
 
  _Note that not all of these are available in all versions of Ruby_
 
@@ -429,13 +437,17 @@ rake install
  ## Example Projects
  Projects using regexp_parser.
 
+ - [capybara](https://github.com/teamcapybara/capybara) is an integration testing tool that uses regexp_parser to convert Regexps to css/xpath selectors.
+
+ - [js_regex](https://github.com/janosch-x/js_regex) converts Ruby regular expressions to JavaScript-compatible regular expressions.
+
  - [meta_re](https://github.com/ammar/meta_re) is a regular expression preprocessor with alias support.
 
  - [mutant](https://github.com/mbj/mutant) (before v0.9.0) manipulates your regular expressions (amongst others) to see if your tests cover their behavior.
 
- - [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) uses regexp_parser to generate examples of postal codes.
+ - [rubocop](https://github.com/rubocop-hq/rubocop) is a linter for Ruby that uses regexp_parser to lint Regexps.
 
- - [js_regex](https://github.com/janosch-x/js_regex) converts Ruby regular expressions to JavaScript-compatible regular expressions.
+ - [twitter-cldr-rb](https://github.com/twitter/twitter-cldr-rb) is a localization helper that uses regexp_parser to generate examples of postal codes.
 
 
  ## References
@@ -464,4 +476,4 @@ Documentation and books used while working on this project.
 
  ---
  ##### Copyright
- _Copyright (c) 2010-2019 Ammar Ali. See LICENSE file for details._
+ _Copyright (c) 2010-2020 Ammar Ali. See LICENSE file for details._
@@ -34,6 +34,10 @@ module Regexp::Expression
 
  alias :starts_at :ts
 
+ def base_length
+ to_s(:base).length
+ end
+
  def full_length
  to_s.length
  end
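`base_length` and `full_length` differ only by the quantifier text, and both count characters rather than bytes. The sketch below mirrors the two methods on a hypothetical `ExprNode` struct (not the gem's expression class), assuming a `:base` format that renders the body without its quantifier:

```ruby
# Hypothetical stand-in for an expression node (not the gem's class):
# the body text and the quantifier text are stored separately.
ExprNode = Struct.new(:body, :quantifier_text) do
  # :base renders only the body, :full appends the quantifier
  def to_s(format = :full)
    format == :base ? body : "#{body}#{quantifier_text}"
  end

  def base_length
    to_s(:base).length
  end

  def full_length
    to_s.length
  end
end

node = ExprNode.new('😋', '{2,3}')
p node.base_length # => 1
p node.full_length # => 6
```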
@@ -80,8 +84,12 @@ module Regexp::Expression
  return 1..1 unless quantified?
  min = quantifier.min
  max = quantifier.max < 0 ? Float::INFINITY : quantifier.max
- # fix Range#minmax - https://bugs.ruby-lang.org/issues/15807
- (min..max).tap { |r| r.define_singleton_method(:minmax) { [min, max] } }
+ range = min..max
+ # fix Range#minmax on old Rubies - https://bugs.ruby-lang.org/issues/15807
+ if RUBY_VERSION.to_f < 2.7
+ range.define_singleton_method(:minmax) { [min, max] }
+ end
+ range
  end
 
  def greedy?
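The new guard patches `minmax` only on Rubies older than 2.7. The 1.8.2 CHANGELOG entry mentions a `FrozenError` in this method on Ruby 3.0, where Range objects are frozen, so the old unconditional `define_singleton_method` call no longer works there. A standalone sketch of the patched logic (the method name `repetitions_range` and its arguments are hypothetical):

```ruby
# Hypothetical standalone version of the patched logic; min/max stand in for
# quantifier.min/quantifier.max, where a negative max means "no upper bound".
def repetitions_range(min, max)
  max = Float::INFINITY if max < 0
  range = min..max
  # Patch only old Rubies (see https://bugs.ruby-lang.org/issues/15807).
  # Since Ruby 3.0 every Range is frozen, so this call would raise FrozenError.
  if RUBY_VERSION.to_f < 2.7
    range.define_singleton_method(:minmax) { [min, max] }
  end
  range
end

p repetitions_range(2, 5).minmax # => [2, 5]
p repetitions_range(1, -1).end   # => Infinity
```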
@@ -114,23 +122,6 @@ module Regexp::Expression
  alias :to_h :attributes
  end
 
- def self.parsed(exp)
- warn('WARNING: Regexp::Expression::Base.parsed is buggy and '\
- 'will be removed in 2.0.0. Use Regexp::Parser.parse instead.')
- case exp
- when String
- Regexp::Parser.parse(exp)
- when Regexp
- Regexp::Parser.parse(exp.source) # <- causes loss of root options
- when Regexp::Expression # <- never triggers
- exp
- else
- raise ArgumentError, 'Expression.parsed accepts a String, Regexp, or '\
- 'a Regexp::Expression as a value for exp, but it '\
- "was given #{exp.class.name}."
- end
- end
-
  end # module Regexp::Expression
 
  require 'regexp_parser/expression/quantifier'
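The removed `parsed` helper noted that re-parsing `exp.source` loses the root options. Core Ruby shows why: `Regexp#source` returns only the pattern text, while the flags live in `Regexp#options`. A minimal sketch:

```ruby
# Regexp#source returns only the pattern text, so rebuilding from it drops flags.
re = /a b # comment/xi

p re.source                                  # => "a b # comment"
p re.options & Regexp::IGNORECASE != 0       # => true
p re.match?('AB')                            # => true, x and i are active
p Regexp.new(re.source).match?('AB')         # => false, the flags are gone
```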
@@ -10,9 +10,24 @@ module Regexp::Expression
  def comment?; false end
  end
 
- class Atomic < Group::Base; end
- class Passive < Group::Base; end
+ class Passive < Group::Base
+ attr_writer :implicit
+
+ def to_s(format = :full)
+ if implicit?
+ "#{expressions.join}#{quantifier_affix(format)}"
+ else
+ super
+ end
+ end
+
+ def implicit?
+ @implicit ||= false
+ end
+ end
+
  class Absence < Group::Base; end
+ class Atomic < Group::Base; end
  class Options < Group::Base
  attr_accessor :option_changes
  end
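An implicit passive group prints its children and quantifier without the surrounding `(?:` and `)`, so `#to_s` reproduces the original source text of a chained quantifier. The simplified `Literal` and `PassiveGroup` classes below are hypothetical stand-ins, not the gem's API:

```ruby
# Hypothetical, simplified stand-ins (not the gem's classes).
class Literal
  def initialize(text, quantifier = nil)
    @text = text
    @quantifier = quantifier
  end

  def to_s
    "#{@text}#{@quantifier}"
  end
end

class PassiveGroup
  attr_writer :implicit

  def initialize(expressions, quantifier = nil)
    @expressions = expressions
    @quantifier = quantifier
  end

  def implicit?
    @implicit ||= false
  end

  # implicit groups omit the "(?:" ... ")" wrapper but keep the quantifier
  def to_s
    body = @expressions.join
    implicit? ? "#{body}#{@quantifier}" : "(?:#{body})#{@quantifier}"
  end
end

inner = Literal.new('a', '{10}')
explicit = PassiveGroup.new([inner], '{4,6}')
implicit = PassiveGroup.new([inner], '{4,6}')
implicit.implicit = true

p explicit.to_s # => "(?:a{10}){4,6}"
p implicit.to_s # => "a{10}{4,6}"
```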
@@ -1,24 +1,12 @@
  module Regexp::Expression
 
  class Root < Regexp::Expression::Subexpression
- # TODO: this override is here for backwards compatibility, remove in 2.0.0
- def initialize(*args)
- unless args.first.is_a?(Regexp::Token)
- warn('WARNING: Root.new without a Token argument is deprecated and '\
- 'will be removed in 2.0.0. Use Root.build for the old behavior.')
- return super(self.class.build_token, *args)
- end
- super
+ def self.build(options = {})
+ new(build_token, options)
  end
 
- class << self
- def build(options = {})
- new(build_token, options)
- end
-
- def build_token
- Regexp::Token.new(:expression, :root, '', 0)
- end
+ def self.build_token
+ Regexp::Token.new(:expression, :root, '', 0)
  end
  end
  end
@@ -40,5 +40,14 @@ module Regexp::Expression
  RUBY
  end
  alias :lazy? :reluctant?
+
+ def ==(other)
+ other.class == self.class &&
+ other.token == token &&
+ other.mode == mode &&
+ other.min == min &&
+ other.max == max
+ end
+ alias :eq :==
  end
  end
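With the new `==`, two quantifiers compare by class, token, mode, min and max instead of by object identity, and the quantifier text is not part of the comparison. A sketch with a hypothetical stand-alone `Quantifier` class that copies that comparison (not the gem's class):

```ruby
# Hypothetical minimal quantifier (not the gem's class) using the same
# field-by-field equality as the diff above.
class Quantifier
  attr_reader :token, :text, :min, :max, :mode

  def initialize(token, text, min, max, mode)
    @token, @text, @min, @max, @mode = token, text, min, max, mode
  end

  def ==(other)
    other.class == self.class &&
      other.token == token &&
      other.mode == mode &&
      other.min == min &&
      other.max == max
  end
  alias eq ==
end

a = Quantifier.new(:interval, '{2,3}', 2, 3, :greedy)
b = Quantifier.new(:interval, '{2,3}', 2, 3, :greedy)
c = Quantifier.new(:interval, '{2,3}?', 2, 3, :reluctant)
d = Quantifier.new(:interval, '{2}', 2, 2, :greedy)
e = Quantifier.new(:interval, '{2,2}', 2, 2, :greedy)

p a == b      # => true, default Object#== would compare identity instead
p a.eq(c)     # => false, the mode differs
p d == e      # => true, text is not compared
p a.equal?(b) # => false, still two distinct objects
```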
@@ -7,16 +7,6 @@ module Regexp::Expression
  # Used as the base class for the Alternation alternatives, Conditional
  # branches, and CharacterSet::Intersection intersected sequences.
  class Sequence < Regexp::Expression::Subexpression
- # TODO: this override is here for backwards compatibility, remove in 2.0.0
- def initialize(*args)
- if args.count == 3
- warn('WARNING: Sequence.new without a Regexp::Token argument is '\
- 'deprecated and will be removed in 2.0.0.')
- return self.class.at_levels(*args)
- end
- super
- end
-
  class << self
  def add_to(subexpression, params = {}, active_opts = {})
  sequence = at_levels(
@@ -11,11 +11,11 @@ class Regexp::Lexer
 
  CLOSING_TOKENS = [:close].freeze
 
- def self.lex(input, syntax = "ruby/#{RUBY_VERSION}", &block)
- new.lex(input, syntax, &block)
+ def self.lex(input, syntax = "ruby/#{RUBY_VERSION}", options: nil, &block)
+ new.lex(input, syntax, options: options, &block)
  end
 
- def lex(input, syntax = "ruby/#{RUBY_VERSION}", &block)
+ def lex(input, syntax = "ruby/#{RUBY_VERSION}", options: nil, &block)
  syntax = Regexp::Syntax.new(syntax)
 
  self.tokens = []
@@ -25,7 +25,7 @@ class Regexp::Lexer
  self.shift = 0
 
  last = nil
- Regexp::Scanner.scan(input) do |type, token, text, ts, te|
+ Regexp::Scanner.scan(input, options: options) do |type, token, text, ts, te|
  type, token = *syntax.normalize(type, token)
  syntax.check! type, token
 
@@ -96,10 +96,10 @@ class Regexp::Lexer
 
  tokens.pop
  tokens << Regexp::Token.new(:literal, :literal, lead,
- token.ts, (token.te - last.bytesize),
+ token.ts, (token.te - last.length),
  nesting, set_nesting, conditional_nesting)
  tokens << Regexp::Token.new(:literal, :literal, last,
- (token.ts + lead.bytesize), token.te,
+ (token.ts + lead.length), token.te,
  nesting, set_nesting, conditional_nesting)
  end
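These two `Regexp::Token.new` calls split one literal token into a leading part and its last character, and the boundary between them is now computed with `String#length` instead of `#bytesize`. The hypothetical helper `split_literal` sketches that arithmetic on its own, assuming `ts`/`te` are char-based as described in the 2.0.0 CHANGELOG entry:

```ruby
# Hypothetical helper (not the gem's API): split a literal run into everything
# but the last character and the last character, with char-based offsets.
def split_literal(text, ts)
  te = ts + text.length
  lead = text[0...-1]
  last = text[-1]
  [
    [lead, ts, te - last.length],   # mirrors token.te - last.length
    [last, ts + lead.length, te]    # mirrors token.ts + lead.length
  ]
end

p split_literal('añ😋', 0)
# => [["añ", 0, 2], ["😋", 2, 3]]
```

With `#bytesize` the boundary would have been 3 here (`"añ"` is 3 bytes in UTF-8), which is why the old code only agreed with char-based offsets for ASCII literals.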
 
@@ -18,12 +18,12 @@ class Regexp::Parser
  end
  end
 
- def self.parse(input, syntax = "ruby/#{RUBY_VERSION}", &block)
- new.parse(input, syntax, &block)
+ def self.parse(input, syntax = "ruby/#{RUBY_VERSION}", options: nil, &block)
+ new.parse(input, syntax, options: options, &block)
  end
 
- def parse(input, syntax = "ruby/#{RUBY_VERSION}", &block)
- root = Root.build(options_from_input(input))
+ def parse(input, syntax = "ruby/#{RUBY_VERSION}", options: nil, &block)
+ root = Root.build(extract_options(input, options))
 
  self.root = root
  self.node = root
@@ -35,7 +35,7 @@ class Regexp::Parser
 
  self.captured_group_counts = Hash.new(0)
 
- Regexp::Lexer.scan(input, syntax) do |token|
+ Regexp::Lexer.scan(input, syntax, options: options) do |token|
  parse_token(token)
  end
 
@@ -54,14 +54,20 @@ class Regexp::Parser
  :options_stack, :switching_options, :conditional_nesting,
  :captured_group_counts
 
- def options_from_input(input)
- return {} unless input.is_a?(::Regexp)
+ def extract_options(input, options)
+ if options && !input.is_a?(String)
+ raise ArgumentError, 'options cannot be supplied unless parsing a String'
+ end
+
+ options = input.options if input.is_a?(::Regexp)
 
- options = {}
- options[:i] = true if input.options & ::Regexp::IGNORECASE != 0
- options[:m] = true if input.options & ::Regexp::MULTILINE != 0
- options[:x] = true if input.options & ::Regexp::EXTENDED != 0
- options
+ return {} unless options
+
+ enabled_options = {}
+ enabled_options[:i] = true if options & ::Regexp::IGNORECASE != 0
+ enabled_options[:m] = true if options & ::Regexp::MULTILINE != 0
+ enabled_options[:x] = true if options & ::Regexp::EXTENDED != 0
+ enabled_options
  end
 
  def nest(exp)
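To see how `extract_options` behaves for each kind of input, the method from this hunk can be copied out of `Regexp::Parser` as a top-level method so that it runs without the gem. The body is unchanged; only the comments and the calls below it are added for illustration:

```ruby
# Copy of extract_options from the hunk above, as a top-level method.
def extract_options(input, options)
  # explicit options are only accepted together with a String pattern
  if options && !input.is_a?(String)
    raise ArgumentError, 'options cannot be supplied unless parsing a String'
  end

  # for a Regexp, the flags come from the Regexp itself
  options = input.options if input.is_a?(::Regexp)

  return {} unless options

  enabled_options = {}
  enabled_options[:i] = true if options & ::Regexp::IGNORECASE != 0
  enabled_options[:m] = true if options & ::Regexp::MULTILINE != 0
  enabled_options[:x] = true if options & ::Regexp::EXTENDED != 0
  enabled_options
end

p extract_options(/a/mi, nil) == { i: true, m: true }            # => true
p extract_options('a', ::Regexp::EXTENDED) == { x: true }        # => true
p extract_options('a', nil).empty?                               # => true

begin
  extract_options(/a/, ::Regexp::IGNORECASE)
rescue ArgumentError => e
  puts e.message # options cannot be supplied unless parsing a String
end
```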
@@ -432,6 +438,28 @@ class Regexp::Parser
  target_node || raise(ArgumentError, 'No valid target found for '\
  "'#{token.text}' ")
 
+ # in case of chained quantifiers, wrap target in an implicit passive group
+ # description of the problem: https://github.com/ammar/regexp_parser/issues/3
+ # rationale for this solution: https://github.com/ammar/regexp_parser/pull/69
+ if target_node.quantified?
+ new_token = Regexp::Token.new(
+ :group,
+ :passive,
+ '', # text
+ target_node.ts,
+ nil, # te (unused)
+ target_node.level,
+ target_node.set_level,
+ target_node.conditional_level
+ )
+ new_group = Group::Passive.new(new_token, active_opts)
+ new_group.implicit = true
+ new_group << target_node
+ increase_level(target_node)
+ node.expressions[offset] = new_group
+ target_node = new_group
+ end
+
  case token.token
  when :zero_or_one
  target_node.quantify(:zero_or_one, token.text, 0, 1, :greedy)
@@ -462,6 +490,11 @@ class Regexp::Parser
  end
  end
 
+ def increase_level(exp)
+ exp.level += 1
+ exp.respond_to?(:each) && exp.each { |subexp| increase_level(subexp) }
+ end
+
  def interval(target_node, token)
  text = token.text
  mchr = text[text.length-1].chr =~ /[?+]/ ? text[text.length-1].chr : nil
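In this release, the wrapping code in the quantifier handler and `increase_level` work together to move an already-quantified node one nesting level deeper, inside an implicit group that then receives the new quantifier. The sketch below reproduces that tree manipulation with hypothetical `TreeNode` objects and a hypothetical `quantify_at` helper; neither is part of regexp_parser, and tokens, sets and conditionals are left out:

```ruby
# Hypothetical tree nodes (not the gem's classes).
class TreeNode
  attr_accessor :level, :quantifier, :children

  def initialize(level, children = [])
    @level = level
    @children = children
    @quantifier = nil
  end

  def quantified?
    !@quantifier.nil?
  end

  def each(&block)
    @children.each(&block)
  end
end

# Same shape as increase_level in the diff: recurse into anything enumerable.
def increase_level(exp)
  exp.level += 1
  exp.respond_to?(:each) && exp.each { |subexp| increase_level(subexp) }
end

# Hypothetical helper: quantify parent.children[offset], wrapping it first
# if it already carries a quantifier.
def quantify_at(parent, offset, quantifier)
  target = parent.children[offset]
  if target.quantified?
    group = TreeNode.new(target.level, [target]) # group takes target's level
    increase_level(target)                       # target moves one level down
    parent.children[offset] = group
    target = group
  end
  target.quantifier = quantifier
  target
end

root = TreeNode.new(0, [TreeNode.new(1)])
quantify_at(root, 0, '{10}')
wrapped = quantify_at(root, 0, '{4,6}')

p wrapped.level                     # => 1
p wrapped.children.first.level      # => 2
p wrapped.children.first.quantifier # => "{10}"
```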