mustermann 4.0.0.alpha → 4.0.0.alpha3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 295f1b612e2e5a51c3edb1d1fa1c1ed66f56792d3782dda3adb1f48082d702ac
4
- data.tar.gz: db02262f6b9853a63b8718e87e984526655b90e0e0648c1250ea4af23cb497da
3
+ metadata.gz: c6a0beea1e6d365356c6444910ec10200dcaf98f1968e88ac2fe91e0a7b23b93
4
+ data.tar.gz: 1be43d10761af60bbd4cdd5d087214073984825b05f397de8d1475a7f35686d6
5
5
  SHA512:
6
- metadata.gz: ae9f2778ec8d47418feec32de4ee1ca837b2ad22076ccd59c438872f075bbcd7deef2be648a7710570e45cb49e5c82085b5da1435641c5143131fee69734b8ec
7
- data.tar.gz: d43838b66491e62f6fc7cf066fcce44ae2a7ec9a28e47abbad100519d3db13d1b0c8eed789bdff64b5f3f18e6548cfbe2e6312288ebe96c6bf236aac80eb0807
6
+ metadata.gz: 710a7afa55bdbd0f6fb17845458572ea9adaf8149674c9b1e54ddba333382be9e05136d37d7680595bead97b21bee76334d129cbf96c320b0f975688b20ad29c
7
+ data.tar.gz: 64e15b87da6f731fce7d0e50eb4ee13511c175b979765d9a233a066c026026c0c82300e22b8492db8220d2f89995cadb471c78ac00da60ea8be360a22b8290be
data/README.md CHANGED
@@ -420,19 +420,6 @@ set.add('/ping')
420
420
  set.match('/ping').value # => nil
421
421
  ```
422
422
 
423
- ### Conflict Resolution
424
-
425
- The set follows insertion order: when two patterns both match a string, the one added first wins. Use `match_all` to retrieve every match:
426
-
427
- ``` ruby
428
- set = Mustermann::Set.new
429
- set.add('/foo', :static)
430
- set.add('/:var', :dynamic)
431
-
432
- set.match('/foo').value # => :static
433
- set.match_all('/foo').map(&:value) # => [:static, :dynamic]
434
- ```
435
-
436
423
  ### Peeking
437
424
 
438
425
  `peek_match` matches a prefix of the input rather than the full string. The unmatched remainder is available via `post_match`:
@@ -489,6 +476,66 @@ object = MyObject.new
489
476
  Mustermann.new(object, type: :rails) # => #<Mustermann::Rails:"/foo">
490
477
  ```
491
478
 
479
+ ### Match order
480
+
481
+ A set can match patterns and values in loose or strict insertion order.
482
+
483
+ You have the following guarantees without strict ordering:
484
+
485
+ * Patterns with dynamic segments in the same position and equal static parts will always match in the order they were added.
486
+ * Multiple values for the same pattern will retain their insertion order with regard to that pattern.
487
+
488
+ Trade-offs without strict ordering:
489
+
490
+ * Static segments may be favored over dynamic segments. If you want to guarantee this behavior, enable trie-mode proactively.
491
+ * When a pattern has multiple values, these will follow each other directly when using `match_all` or `peek_match_all`.
492
+
493
+ Strict ordering comes with both a performance overhead and marginally increased memory usage.
494
+ How big the performance overhead is depends on the number of patterns that overlap in the strings they successfully match against.
495
+ It uses Ruby's built-in sorting, which on MRI is based on quicksort. The memory overhead grows linearly with the number
496
+ of pattern and value combinations, but is generally small compared to the memory used by the patterns and values themselves.
497
+
498
+ With strict ordering enabled, patterns and values are guaranteed to occur in insertion order.
499
+
500
+ Without strict ordering, not using a trie:
501
+
502
+ ```ruby
503
+ set = Mustermann::Set.new(use_trie: false)
504
+
505
+ set.add("/:path", :first)
506
+ set.add("/static", :second)
507
+ set.add("/:path", :third)
508
+
509
+ set.match("/static").value # => :first
510
+ set.match_all("/static").map(&:value) # => [:first, :third, :second]
511
+ ```
512
+
513
+ Without strict ordering, using a trie:
514
+
515
+ ```ruby
516
+ set = Mustermann::Set.new(use_trie: true)
517
+
518
+ set.add("/:path", :first)
519
+ set.add("/static", :second)
520
+ set.add("/:path", :third)
521
+
522
+ set.match("/static").value # => :second
523
+ set.match_all("/static").map(&:value) # => [:second, :first, :third]
524
+ ```
525
+
526
+ With strict ordering enabled, regardless of whether a trie is used or not:
527
+
528
+ ```ruby
529
+ set = Mustermann::Set.new(strict_order: true)
530
+
531
+ set.add("/:path", :first)
532
+ set.add("/static", :second)
533
+ set.add("/:path", :third)
534
+
535
+ set.match("/static").value # => :first
536
+ set.match_all("/static").map(&:value) # => [:first, :second, :third]
537
+ ```
538
+
492
539
  <a name="-duck-typing-respond-to"></a>
493
540
  ### `respond_to?`
494
541
 
@@ -0,0 +1,117 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Mustermann
4
+ module AST
5
+ # Mixin for AST::Pattern subclasses that accelerates compilation and AST
6
+ # construction for "simple" patterns: only static path segments and
7
+ # unconstrained full-segment captures (e.g. /foo/:bar/baz/:id).
8
+ # Patterns with optional groups, constraints, or non-default options fall
9
+ # through to the full AST pipeline.
10
+ module FastPattern
11
+ # Matches patterns that consist only of slashes, static segments, and
12
+ # simple :name captures — no optional groups, no constraints.
13
+ SIMPLE = /\A(?:\/(?:[a-zA-Z0-9\-_.~]+|:[a-zA-Z_]\w*))+\z/
14
+
15
+ # Regexp fragment for each 7-bit ASCII char (bytes 0..127), matching the same output
16
+ # as Compiler#encoded with uri_decode: true.
17
+ ENCODED = (0..127).each_with_object({}) do |byte, h|
18
+ c = byte.chr
19
+ pct = '%%%02X' % byte
20
+ reps = [c, pct, pct.downcase].uniq
21
+ h[c] = reps.size == 1 ? Regexp.escape(reps.first) :
22
+ '(?:%s)' % reps.map { |r| Regexp.escape(r) }.join('|')
23
+ end.freeze
24
+
25
+ SEGMENT_SCAN = %r{(/)|(:[a-zA-Z_]\w*)|([^/:]+)}
26
+
27
+ private_constant :SIMPLE, :ENCODED, :SEGMENT_SCAN
28
+
29
+ # Bypasses the generic build_match overhead for simple patterns: uses
30
+ # MatchData#named_captures directly and avoids match.to_s / post_match /
31
+ # pre_match calls (all no-ops for \A…\Z anchored regexps).
32
+ def match(string)
33
+ return super unless @fast_match
34
+ return unless match = @regexp.match(string)
35
+ params = match.named_captures
36
+ params.transform_values! { |v| unescape(v) } if string.include?('%')
37
+ Match.new(self, string, params)
38
+ end
39
+
40
+ # Public override: fast path for simple patterns, falls through to super otherwise.
41
+ # Must remain public to match AST::Pattern#to_ast visibility.
42
+ def to_ast
43
+ return super unless simple_pattern?
44
+ ast = self.class.ast_cache.fetch(@string) { build_fast_ast }
45
+ @param_converters ||= {}
46
+ ast
47
+ end
48
+
49
+ private
50
+
51
+ def simple_pattern?
52
+ options[:capture].nil? &&
53
+ options[:except].nil? &&
54
+ options.fetch(:greedy, true) != false &&
55
+ uri_decode &&
56
+ @string.match?(SIMPLE)
57
+ end
58
+
59
+ def compile(**options)
60
+ return super unless simple_pattern?
61
+ result = fast_compile
62
+ @fast_match = true
63
+ result
64
+ end
65
+
66
+ def fast_compile
67
+ tokens = @string.scan(SEGMENT_SCAN)
68
+ src = String.new
69
+ tokens.each_with_index do |(sep, cap, chars), i|
70
+ if sep
71
+ src << '\\/'
72
+ elsif cap
73
+ # Mirror the compiler: wrap in atomic group when the next token is a separator.
74
+ if tokens[i + 1]&.first
75
+ src << "(?<#{cap[1..]}>(?>[^/\\?#]+))"
76
+ else
77
+ src << "(?<#{cap[1..]}>[^/\\?#]+)"
78
+ end
79
+ else
80
+ chars.each_char { |c| src << ENCODED[c] }
81
+ end
82
+ end
83
+ Regexp.new(src)
84
+ end
85
+
86
+ def build_fast_ast
87
+ nodes = []
88
+ pos = 0
89
+ @string.scan(SEGMENT_SCAN) do |sep, cap, chars|
90
+ if sep
91
+ node = Node::Separator.new('/')
92
+ node.start, node.stop = pos, pos + 1
93
+ nodes << node
94
+ pos += 1
95
+ elsif cap
96
+ node = Node::Capture.new(cap[1..])
97
+ node.start, node.stop = pos, pos + cap.length
98
+ nodes << node
99
+ pos += cap.length
100
+ else
101
+ chars.each_char do |c|
102
+ node = Node::Char.new(c)
103
+ node.start, node.stop = pos, pos + 1
104
+ nodes << node
105
+ pos += 1
106
+ end
107
+ end
108
+ end
109
+ root = Node::Root.new
110
+ root.payload = nodes
111
+ root.pattern = @string
112
+ root.start, root.stop = 0, @string.length
113
+ root
114
+ end
115
+ end
116
+ end
117
+ end
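Reviewer note: the fast path above builds the regexp source directly from the pattern string instead of going through the AST compiler. The following standalone sketch reproduces that idea outside the gem, assuming only what the hunk shows. `fast_source` is a hypothetical helper name; `ENCODED` and `SEGMENT_SCAN` are local copies, not the gem's private constants, and nothing here loads Mustermann.

```ruby
# Standalone sketch; local copies of the tables shown in the hunk above.
ENCODED = (0..127).each_with_object({}) do |byte, h|
  c    = byte.chr
  pct  = format('%%%02X', byte)
  reps = [c, pct, pct.downcase].uniq
  # Each static char may appear literally or percent-encoded (upper or lower hex).
  h[c] = "(?:#{reps.map { |r| Regexp.escape(r) }.join('|')})"
end.freeze

SEGMENT_SCAN = %r{(/)|(:[a-zA-Z_]\w*)|([^/:]+)}

# Hypothetical helper: turns a simple pattern such as "/foo/:bar" into regexp source.
def fast_source(pattern)
  tokens = pattern.scan(SEGMENT_SCAN)
  src = +''
  tokens.each_with_index do |(sep, cap, chars), i|
    if sep
      src << '\\/'
    elsif cap
      # Atomic group only when another separator follows, as in the hunk.
      body = tokens[i + 1]&.first ? '(?>[^/\\?#]+)' : '[^/\\?#]+'
      src << "(?<#{cap[1..]}>#{body})"
    else
      chars.each_char { |c| src << ENCODED[c] }
    end
  end
  src
end

regexp = Regexp.new("\\A#{fast_source('/foo/:bar')}\\z")
regexp.match('/foo/baz')[:bar] # => "baz"
```

The sketch shows why a literal segment still matches its percent-encoded spelling (`/f%6Fo/baz`) while the capture stops at the next slash.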
@@ -23,6 +23,11 @@ module Mustermann
23
23
  instance_delegate %i[parser compiler transformer validation template_generator param_scanner boundaries] => 'self.class'
24
24
  instance_delegate parse: :parser, transform: :transformer, validate: :validation,
25
25
  generate_templates: :template_generator, scan_params: :param_scanner, set_boundaries: :boundaries
26
+
27
+ # @api private
28
+ def self.ast_cache
29
+ @ast_cache ||= EqualityMap.new
30
+ end
26
31
 
27
32
  # @api private
28
33
  # @return [#parse] parser object for pattern
@@ -96,13 +101,14 @@ module Mustermann
96
101
  # Internal AST representation of pattern.
97
102
  # @!visibility private
98
103
  def to_ast
99
- @ast_cache ||= EqualityMap.new
100
- @ast_cache.fetch(@string) do
104
+ ast = self.class.ast_cache.fetch(@string) do
101
105
  ast = parse(@string, pattern: self)
102
106
  ast &&= transform(ast)
103
107
  ast &&= set_boundaries(ast, string: @string)
104
108
  validate(ast)
105
109
  end
110
+ @param_converters ||= scan_params(ast) if ast
111
+ ast
106
112
  end
107
113
 
108
114
  # All AST-based pattern implementations support expanding.
@@ -140,6 +146,11 @@ module Mustermann
140
146
  @param_converters ||= scan_params(to_ast)
141
147
  end
142
148
 
149
+ # @api private
150
+ def identity_params?(params)
151
+ param_converters.empty? && super
152
+ end
153
+
143
154
  private :compile, :parse, :transform, :validate, :generate_templates, :param_converters, :scan_params, :set_boundaries
144
155
  end
145
156
  end
@@ -358,6 +358,7 @@ module Mustermann
358
358
  # @!visibility private
359
359
  def unescape(string, decode = uri_decode)
360
360
  return string unless decode and string
361
+ return string unless string.include?('%')
361
362
  @@uri.unescape(string)
362
363
  end
363
364
 
@@ -369,6 +370,13 @@ module Mustermann
369
370
  ALWAYS_ARRAY.include? key
370
371
  end
371
372
 
373
+ # @api private
374
+ # Returns true if params can be used as-is without calling map_param.
375
+ # Used by Set::Trie to skip building a redundant copy of the params hash.
376
+ def identity_params?(params)
377
+ !params.any? { |k, v| v.is_a?(Array) || always_array?(k) || (v.respond_to?(:include?) && v.include?('%')) }
378
+ end
379
+
372
380
  private :unescape, :map_param, :respond_to_special?
373
381
  private_constant :ALWAYS_ARRAY
374
382
  end
@@ -1,6 +1,7 @@
1
1
  # frozen_string_literal: true
2
2
  require 'mustermann'
3
3
  require 'mustermann/ast/pattern'
4
+ require 'mustermann/ast/fast_pattern'
4
5
  require 'mustermann/versions'
5
6
 
6
7
  module Mustermann
@@ -12,6 +13,7 @@ module Mustermann
12
13
  # @see Mustermann::Pattern
13
14
  # @see file:README.md#rails Syntax description in the README
14
15
  class Rails < AST::Pattern
16
+ include AST::FastPattern
15
17
  extend Versions
16
18
  register :rails
17
19
 
@@ -19,6 +19,7 @@ module Mustermann
19
19
  regexp = compile(**options)
20
20
  @peek_regexp = /\A#{regexp}/
21
21
  @regexp = /\A#{regexp}\Z/
22
+ @simple_captures = @regexp.named_captures.none? { |name, positions| positions.size > 1 || always_array?(name) }
22
23
  end
23
24
 
24
25
  # @param (see Mustermann::Pattern#peek_size)
@@ -34,14 +35,10 @@ module Mustermann
34
35
  # @see (see Mustermann::Pattern#peek_match)
35
36
  def peek_match(string) = build_match(@peek_regexp.match(string))
36
37
 
37
- def match(string) = build_match(@regexp.match(string))
38
-
39
- # private
40
-
41
- # def build_match(match)
42
- # return unless match
43
- # Match.new(self, match.string, match.named_captures, post_match: match.post_match, pre_match: match.pre_match)
44
- # end
38
+ def match(string)
39
+ return unless match = @regexp.match(string)
40
+ Match.new(self, string, build_params(match))
41
+ end
45
42
 
46
43
  extend Forwardable
47
44
  def_delegators :regexp, :===, :=~, :names
@@ -50,12 +47,21 @@ module Mustermann
50
47
 
51
48
  def build_match(match)
52
49
  return unless match
53
- params = match.regexp.named_captures.to_h do |name, positions|
54
- value = positions.size < 2 && !always_array?(name) ? map_param(name, match[name]) :
55
- positions.flat_map { |pos| map_param(name, match[pos]) }
56
- [name, value]
50
+ Match.new(self, match.to_s, build_params(match), post_match: match.post_match, pre_match: match.pre_match)
51
+ end
52
+
53
+ def build_params(match)
54
+ if @simple_captures
55
+ params = match.named_captures
56
+ return params if params.empty? || identity_params?(params)
57
+ params.each_with_object({}) { |(k, v), h| h[k] = map_param(k, v) }
58
+ else
59
+ match.regexp.named_captures.to_h do |name, positions|
60
+ value = positions.size < 2 && !always_array?(name) ? map_param(name, match[name]) :
61
+ positions.flat_map { |pos| map_param(name, match[pos]) }
62
+ [name, value]
63
+ end
57
64
  end
58
- Match.new(self, match.to_s, params, post_match: match.post_match, pre_match: match.pre_match)
59
65
  end
60
66
 
61
67
  def compile(**options) = raise NotImplementedError, 'subclass responsibility'
@@ -11,11 +11,12 @@ module Mustermann
11
11
  # router = Mustermann::Router.new do
12
12
  # get "/hello/:name" do |env|
13
13
  # name = env["mustermann.match"][:name]
14
- # [200, { "Content-Type" => "text/plain" }, ["Hello, #{name}!"]]
14
+ # [200, { "content-type" => "text/plain" }, ["Hello, #{name}!"]]
15
15
  # end
16
16
  # end
17
17
  #
18
18
  # # in config.ru
19
+ # use Rack::Head
19
20
  # run router
20
21
  #
21
22
  # @example Routing to other applications
@@ -30,7 +31,7 @@ module Mustermann
30
31
  #
31
32
  # @example As middleware
32
33
  # use Mustermann::Router do
33
- # get("/up") { [200, { "Content-Type" => "text/plain" }, ["Up!"]] }
34
+ # get("/up") { [200, { "content-type" => "text/plain" }, ["Up!"]] }
34
35
  # end
35
36
  #
36
37
  # run MyApp
@@ -38,8 +39,8 @@ module Mustermann
38
39
  # @see Mustermann::Set
39
40
  # @see https://rack.github.io/rack/
40
41
  class Router
41
- NOT_FOUND = [404, { "Content-Type" => "text/plain", "X-Cascade" => "pass" }, ["Not found"]].freeze
42
- VERBS = %w[GET HEAD POST PUT PATCH DELETE OPTIONS LINK UNLINK].freeze
42
+ NOT_FOUND = [404, { "content-type" => "text/plain", "x-cascade" => "pass" }, ["Not found"]].freeze
43
+ VERBS = %w[GET POST PUT PATCH DELETE OPTIONS LINK UNLINK].freeze
43
44
  private_constant :VERBS, :NOT_FOUND
44
45
 
45
46
  # Initializes a new router.
@@ -47,17 +48,22 @@ module Mustermann
47
48
  # @param options [Hash] Options to be passed to the Mustermann patterns.
48
49
  def initialize(fallback = nil, key: "mustermann.match", **options, &block)
49
50
  @key = key
50
- @sets = VERBS.to_h { |verb| [verb, Set.new] }
51
- @options = options
52
- @fallback = fallback || ->(env) { NOT_FOUND }
53
- instance_exec(&block) if block_given?
51
+ @sets = VERBS.to_h { |verb| [verb, Set.new(**options)] }
52
+ @fallback = fallback || ->(env) { NOT_FOUND.dup }
53
+
54
+ if block_given?
55
+ instance_exec(&block)
56
+ @sets.each_value(&:optimize!)
57
+ end
54
58
  end
55
59
 
56
60
  # @param env [Hash] The Rack environment hash for the request.
57
61
  # @return [Array] The Rack response array (status, headers, body).
58
62
  def call(env)
59
- if routes = @sets[env["REQUEST_METHOD"]] and match = routes.match(env["PATH_INFO"] || "/")
60
- env = env.merge(@key => match)
63
+ request_method = env["REQUEST_METHOD"] || "GET"
64
+ request_method = "GET" if request_method == "HEAD"
65
+ if routes = @sets[request_method] and match = routes.match(env["PATH_INFO"] || "/")
66
+ env[@key] = match
61
67
  return match.value.call(env)
62
68
  end
63
69
  @fallback.call(env)
@@ -78,7 +84,6 @@ module Mustermann
78
84
  def route(verb, pattern, target = nil, **options, &block)
79
85
  raise ArgumentError, "need to provide target, :to or a block" unless target || block
80
86
  raise ArgumentError, "unknown verb: #{verb}" unless VERBS.include?(verb)
81
- pattern = Mustermann.new(pattern, **@options, **options)
82
87
  @sets[verb].add(pattern, target || block)
83
88
  end
84
89
 
@@ -5,24 +5,42 @@ module Mustermann
5
5
  class Set
6
6
  class Cache
7
7
  PLACEHOLDER = Object.new.freeze
8
+ EMPTY_ARRAY = [].freeze
8
9
 
9
10
  def self.new(matcher) = defined?(ObjectSpace::WeakKeyMap) ? super : matcher
10
11
 
11
12
  def initialize(matcher)
12
13
  @matcher = matcher
13
- @caches = {}
14
+ reset_cache
14
15
  end
15
16
 
16
17
  def add(pattern)
17
18
  @matcher.add(pattern)
18
- @caches.clear
19
+ reset_cache
19
20
  end
20
21
 
21
- def match(string, **options)
22
- cache = @caches[options] ||= ObjectSpace::WeakKeyMap.new
23
- result = cache[string] ||= @matcher.match(string, **options) || PLACEHOLDER
24
- result unless result.equal? PLACEHOLDER
22
+ def match(string, all: false, peek: false)
23
+ cache = @match_cache[all][peek]
24
+ result = cache[string] ||= @matcher.match(string, all: all, peek: peek) || PLACEHOLDER
25
+ return result unless result.equal? PLACEHOLDER
26
+ all ? EMPTY_ARRAY : nil
25
27
  end
28
+
29
+ def reset_cache
30
+ @match_cache = {
31
+ true => {
32
+ true => ObjectSpace::WeakKeyMap.new,
33
+ false => ObjectSpace::WeakKeyMap.new
34
+ },
35
+ false => {
36
+ true => ObjectSpace::WeakKeyMap.new,
37
+ false => ObjectSpace::WeakKeyMap.new
38
+ }
39
+ }
40
+ end
41
+
42
+ def optimize! = @matcher.optimize!
43
+ def track(...) = @matcher.track(...)
26
44
  end
27
45
 
28
46
  private_constant :Cache
@@ -23,6 +23,8 @@ module Mustermann
23
23
  end
24
24
  result
25
25
  end
26
+
27
+ def optimize! = nil
26
28
  end
27
29
 
28
30
  private_constant :Linear
@@ -1,15 +1,22 @@
1
1
  # frozen_string_literal: true
2
2
  require 'mustermann/match'
3
- require 'delegate'
4
3
 
5
4
  module Mustermann
6
5
  class Set
7
- class Match < DelegateClass(Mustermann::Match)
6
+ class Match < Mustermann::Match
8
7
  attr_reader :value
9
8
 
10
- def initialize(*args, value: nil, match: nil, **options)
9
+ def initialize(pattern = nil, string = nil, params = {}, value: nil, match: nil, post_match: '', pre_match: '')
11
10
  @value = value
12
- super(match || Mustermann::Match.new(*args, **options))
11
+ if match
12
+ @pattern = match.pattern
13
+ @string = match.string
14
+ @params = match.params
15
+ @post_match = match.post_match
16
+ @pre_match = match.pre_match
17
+ else
18
+ super(pattern, string, params, post_match:, pre_match:)
19
+ end
13
20
  end
14
21
  end
15
22
  end
@@ -0,0 +1,29 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Mustermann
4
+ class Set
5
+ class StrictOrder
6
+ def initialize(matcher)
7
+ @matcher = matcher
8
+ @order = {}
9
+ @count = 0
10
+ end
11
+
12
+ def add(...) = @matcher.add(...)
13
+ def optimize! = @matcher.optimize!
14
+
15
+ def match(string, all: false, peek: false)
16
+ possible = @matcher.match(string, all: true, peek: peek)
17
+ possible.sort_by! { |m| @order.dig(m.pattern, m.value) }
18
+ all ? possible : possible.first
19
+ end
20
+
21
+ def track(pattern, value)
22
+ @order[pattern] ||= {}
23
+ @order[pattern][value] = @count += 1
24
+ end
25
+ end
26
+
27
+ private_constant :StrictOrder
28
+ end
29
+ end
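Reviewer note: `StrictOrder` restores insertion order by giving every tracked pattern/value pair a monotonically increasing rank and sorting the underlying matcher's results by it. A minimal standalone sketch of that technique follows. `Hit` and `InsertionOrder` are hypothetical names used only for illustration and are not part of the gem.

```ruby
# Hypothetical stand-in for a match result; not Mustermann::Set::Match.
Hit = Struct.new(:pattern, :value)

# Assigns each (pattern, value) pair a rank in the order it was tracked.
class InsertionOrder
  def initialize
    @order = {}
    @count = 0
  end

  def track(pattern, value)
    (@order[pattern] ||= {})[value] = (@count += 1)
  end

  # Re-sorts hits produced in any order back into insertion order.
  def sort(hits)
    hits.sort_by { |hit| @order.dig(hit.pattern, hit.value) }
  end
end

order = InsertionOrder.new
order.track('/:path', :first)
order.track('/static', :second)
order.track('/:path', :third)

# Order a trie-like matcher might return: static edge first.
trie_order = [Hit.new('/static', :second), Hit.new('/:path', :first), Hit.new('/:path', :third)]
sorted = order.sort(trie_order).map(&:value) # => [:first, :second, :third]
```

Because every rank is unique, the result does not depend on whether the sort is stable. As in the hunk, even a single-match lookup has to collect all candidates before the first one can be picked, which is where the documented overhead comes from.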
@@ -20,11 +20,8 @@ module Mustermann
20
20
  end
21
21
 
22
22
  translate(:char) do |trie, **options|
23
- strings = t.possible_strings(payload)
24
- return trie if strings.empty?
25
- primary_node = trie[strings.first]
26
- strings[1..-1].each { |s| trie.wire(s, primary_node) }
27
- primary_node
23
+ return trie if payload.empty?
24
+ trie[payload]
28
25
  end
29
26
 
30
27
  translate(:optional) do |trie, **options|
@@ -63,21 +60,21 @@ module Mustermann
63
60
  super()
64
61
  end
65
62
 
66
- def compile(node, **options) = /\A#{@compiler.translate(node, **@options, **options)}/
67
-
68
- def possible_strings(char)
69
- return [] if char.empty?
70
- @compiler.class.char_representations(char, **@options.slice(:uri_decode, :space_matches_plus))
71
- end
63
+ # \G anchors to a position passed to String#match, avoiding substring allocation.
64
+ def compile(node, **options) = /\G#{@compiler.translate(node, **@options, **options)}/
72
65
  end
73
66
 
74
67
  attr_reader :patterns, :set, :static, :dynamic
75
68
 
76
69
  def initialize(set, patterns = [])
77
- @set = set
78
- @patterns = []
79
- @dynamic = {}
80
- @static = {}
70
+ @set = set
71
+ @patterns = []
72
+ @dynamic = {}
73
+ @static = {}
74
+ @stride = nil
75
+ @fast_static = nil
76
+ @byte_lookup = nil
77
+ @dynamic_entries = nil
81
78
  patterns.each { |pattern| add(pattern) }
82
79
  end
83
80
 
@@ -88,51 +85,70 @@ module Mustermann
88
85
  end
89
86
  end
90
87
 
91
- def wire(string, target)
92
- return if string.empty?
93
- if string.size == 1
94
- @static[string] ||= target
95
- else
96
- (@static[string[0]] ||= Trie.new(@set)).wire(string[1..-1], target)
97
- end
98
- end
99
-
100
88
  def match(string, all: false, peek: false, position: 0, params: {})
89
+ optimize! if @stride.nil?
101
90
  return build_matches(string, params, all:) if position >= string.size
102
91
  result = [] if all
103
92
 
104
- if node = @static[string[position]]
105
- if nested_result = node.match(string, all:, peek:, position: position + 1, params:)
106
- return nested_result unless all
107
- result.concat(nested_result)
93
+ if @fast_static
94
+ stride = @stride
95
+ if node = @fast_static[string[position, stride]]
96
+ if nested_result = node.match(string, all:, peek:, position: position + stride, params:)
97
+ return nested_result unless all
98
+ result.concat(nested_result)
99
+ end
100
+ end
101
+ elsif @byte_lookup
102
+ if node = @byte_lookup[string.getbyte(position)]
103
+ if nested_result = node.match(string, all:, peek:, position: position + 1, params:)
104
+ return nested_result unless all
105
+ result.concat(nested_result)
106
+ end
108
107
  end
109
108
  end
110
109
 
111
- anchored = {}
112
- @dynamic.each do |matcher, node|
113
- remaining = string[position..-1]
114
- regexp_match = matcher.match(remaining)
115
- # Non-greedy patterns (e.g. splat .*?) can match 0 chars on non-empty input, making
116
- # no progress. Retry with an end-of-string anchor so they consume the full remainder.
117
- if regexp_match&.to_s&.empty? && !remaining.empty?
118
- anchored_matcher = anchored[matcher] ||= Regexp.new(matcher.source + '\z')
119
- regexp_match = anchored_matcher.match(remaining)
120
- end
121
- next unless regexp_match
110
+ unless @dynamic_entries.empty?
111
+ anchored = nil
112
+ base_params = all ? params : nil
113
+ @dynamic_entries.each do |matcher, node, capture_names, fast_name|
114
+ if fast_name
115
+ # Fast path: unconstrained single-segment capture, so no regex and no MatchData.
116
+ end_pos = string.index('/', position) || string.size
117
+ next if end_pos == position
118
+ edge_params = all ? base_params.dup : params
119
+ edge_params[fast_name] = string.byteslice(position, end_pos - position)
120
+ nested_result = node.match(string, all:, params: edge_params, peek:, position: end_pos)
121
+ return nested_result unless all
122
+ result.concat(nested_result)
123
+ next
124
+ end
122
125
 
123
- regexp_match.named_captures.each do |name, value|
124
- params = params.dup
125
- params[name] = params[name]&.dup || []
126
- params[name] << value
127
- end
126
+ regexp_match = matcher.match(string, position)
127
+ # Non-greedy patterns (e.g. splat .*?) can match 0 chars on non-empty input, making
128
+ # no progress. Retry with an end-of-string anchor so they consume the full remainder.
129
+ if regexp_match && regexp_match.end(0) == position
130
+ anchored ||= {}
131
+ anchored_matcher = anchored[matcher] ||= Regexp.new(matcher.source + '\z')
132
+ regexp_match = anchored_matcher.match(string, position)
133
+ end
134
+ next unless regexp_match
135
+
136
+ edge_params = all ? base_params.dup : params
137
+ capture_names.each do |name|
138
+ value = regexp_match[name]
139
+ next unless value
140
+ existing = edge_params[name]
141
+ edge_params[name] = existing ? (existing.is_a?(Array) ? existing << value : [existing, value]) : value
142
+ end
128
143
 
129
- nested_result = node.match(string, all:, params:, peek:, position: position + regexp_match.to_s.size)
130
- return nested_result unless all
131
- result.concat(nested_result)
144
+ nested_result = node.match(string, all:, params: edge_params, peek:, position: regexp_match.end(0))
145
+ return nested_result unless all
146
+ result.concat(nested_result)
147
+ end
132
148
  end
133
149
 
134
150
  if peek
135
- matches = build_matches(string[0, position], params, all:, post_match: string[position..])
151
+ matches = build_matches(string[0, position], params, all:, post_match: string[position..], pre_match: '')
136
152
  return matches unless all
137
153
  result.concat(matches)
138
154
  end
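Reviewer note: the switch from `\A` to `\G` in the trie's edge regexps goes together with `matcher.match(string, position)` in the hunk above. A short standalone sketch of the difference, using only core `Regexp#match`; the regexps and the path are illustrative and not taken from the gem:

```ruby
path = '/users/42'

# \G anchors at the offset passed as the second argument to Regexp#match.
at_offset = /\G(?<id>[^\/]+)/
m = at_offset.match(path, 7)
m[:id]   # => "42"
m.end(0) # => 9, the position to continue from

# \G does not scan forward: offset 6 is the slash, so there is no match.
at_offset.match(path, 6) # => nil

# \A only matches at the very start of the string, whatever the offset is,
# which is why the old code had to slice the remainder first.
at_start = /\A(?<id>[^\/]+)/
at_start.match(path, 7)                # => nil
at_start.match(path[7..])[:id]         # => "42", at the cost of a substring
```

Using `end(0)` rather than `to_s.size` keeps the next position absolute, so the recursion never needs the sliced remainder.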
@@ -140,21 +156,19 @@ module Mustermann
140
156
  result
141
157
  end
142
158
 
143
- def build_matches(string, params, all: false, **options)
159
+ NIL_VALUES = [nil].freeze
160
+
161
+ def build_matches(string, params, all: false, post_match: '', pre_match: '')
144
162
  result = [] if all
145
163
 
146
164
  @patterns.each do |pattern|
147
165
  next if pattern.except_regexp&.match?(string)
148
166
 
149
- pattern_params = params.to_h do |key, value|
150
- value = value.flat_map { |v| pattern.map_param(key, v) }
151
- value = value.first if value.size < 2 and not pattern.always_array?(key)
152
- [key, value]
153
- end
167
+ pattern_params = build_pattern_params(pattern, params)
154
168
 
155
- values = @set.values_for_pattern(pattern) || [nil]
169
+ values = @set.values_for_pattern(pattern) || NIL_VALUES
156
170
  values.each do |value|
157
- match = Set::Match.new(pattern, string, pattern_params, value:, **options)
171
+ match = Set::Match.new(pattern, string, pattern_params, value:, post_match:, pre_match:)
158
172
  return match unless all
159
173
  result << match
160
174
  end
@@ -163,9 +177,91 @@ module Mustermann
163
177
  result
164
178
  end
165
179
 
180
+ def build_pattern_params(pattern, params)
181
+ return params if pattern.identity_params?(params)
182
+
183
+ result = {}
184
+ params.each do |key, raw|
185
+ if raw.is_a?(Array)
186
+ val = raw.flat_map { |v| pattern.map_param(key, v) }
187
+ val = val.first if val.size < 2 && !pattern.always_array?(key)
188
+ else
189
+ val = pattern.map_param(key, raw)
190
+ val = [val] if pattern.always_array?(key)
191
+ end
192
+ result[key] = val
193
+ end
194
+ result
195
+ end
196
+
166
197
  def add(pattern)
198
+ @stride = nil
199
+ @fast_static = nil
200
+ @byte_lookup = nil
201
+ @dynamic_entries = nil
167
202
  Translator.new(pattern).translate(pattern.to_ast, self)
168
203
  end
204
+
205
+ # Compacts the trie by replacing sequential single-char static lookups with a
206
+ # single stride-length hash lookup. The stride is the minimum number of static
207
+ # steps all paths from this node share before hitting a dynamic edge or branch.
208
+ def optimize!
209
+ depth = min_static_depth
210
+ if depth > 1
211
+ @fast_static = build_stride_hash(depth)
212
+ @byte_lookup = nil
213
+ @stride = depth
214
+ @fast_static.each_value(&:optimize!)
215
+ elsif @static.empty?
216
+ @fast_static = nil
217
+ @byte_lookup = nil
218
+ @stride = 1
219
+ # no children to recurse into
220
+ else
221
+ @fast_static = nil
222
+ @byte_lookup = Array.new(256)
223
+ @static.each { |k, v| @byte_lookup[k.getbyte(0)] = v }
224
+ @stride = 1
225
+ @static.each_value(&:optimize!)
226
+ end
227
+ @dynamic.each_value(&:optimize!)
228
+ @dynamic_entries = @dynamic.map do |matcher, node|
229
+ names = matcher.names.each(&:freeze)
230
+ # Detect unconstrained single-segment captures: can use fast string.index instead of regex.
231
+ # Two conditions: (1) the edge is a bare capture (source starts with \G(?<name>), no leading
232
+ # static chars) and (2) the capture's character class excludes '/' (PATH_INFO never has '?' or '#').
233
+ fast = if names.size == 1
234
+ name = names.first
235
+ matcher.source.start_with?("\\G(?<#{name}>") && !matcher.match?('/') ? name : nil
236
+ end
237
+ [matcher, node, names, fast]
238
+ end
239
+ end
240
+
241
+ protected
242
+
243
+ # Returns the minimum number of guaranteed static steps from this node across
244
+ # all possible paths, before encountering a dynamic edge, a terminal pattern,
245
+ # or an empty node. Branching is allowed; only the minimum depth matters.
246
+ def min_static_depth
247
+ return 0 if @dynamic.any?
248
+ return 0 if @patterns.any?
249
+ return 0 if @static.empty?
250
+ 1 + @static.values.map { |node| node.min_static_depth }.min
251
+ end
252
+
253
+ private
254
+
255
+ # Builds a hash whose keys are +stride+-character strings and whose values are
256
+ # the trie nodes reached after consuming exactly those characters.
257
+ def build_stride_hash(stride)
258
+ stride.times.reduce({ "" => self }) do |frontier, _|
259
+ frontier.each_with_object({}) do |(prefix, node), nxt|
260
+ node.static.each { |char, child| nxt[prefix + char] = child }
261
+ end
262
+ end
263
+ end
264
+
169
265
  end
170
266
 
171
267
  private_constant :Trie
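Reviewer note: `optimize!` replaces a run of single-character static lookups with one hash lookup keyed by a fixed-length prefix (the "stride"). The standalone sketch below reproduces `min_static_depth` and `build_stride_hash` on a deliberately tiny trie. `StrideNode` and its `terminal` flag are hypothetical simplifications: the real trie also has dynamic edges and stores patterns rather than a boolean.

```ruby
# Hypothetical minimal trie node with static edges only.
class StrideNode
  attr_reader :static
  attr_accessor :terminal

  def initialize
    @static = {}
    @terminal = false
  end

  def insert(string)
    node = string.each_char.reduce(self) { |n, ch| n.static[ch] ||= StrideNode.new }
    node.terminal = true
    node
  end

  # Minimum number of static steps every path shares before one of them ends.
  def min_static_depth
    return 0 if terminal || static.empty?
    1 + static.values.map(&:min_static_depth).min
  end

  # Maps every stride-length prefix to the node reached after consuming it.
  def stride_hash(stride)
    stride.times.reduce({ '' => self }) do |frontier, _|
      frontier.each_with_object({}) do |(prefix, node), nxt|
        node.static.each { |char, child| nxt[prefix + char] = child }
      end
    end
  end
end

root = StrideNode.new
%w[/users /posts /up].each { |route| root.insert(route) }

depth = root.min_static_depth   # => 3, limited by "/up"
table = root.stride_hash(depth)
table.keys.sort                 # => ["/po", "/up", "/us"]
```

The shortest route bounds the stride: with `/up` present, only three characters can be consumed in one lookup, and the deeper nodes are optimized again from there.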
@@ -3,6 +3,7 @@ require 'mustermann'
3
3
  require 'mustermann/expander'
4
4
  require 'mustermann/set/cache'
5
5
  require 'mustermann/set/linear'
6
+ require 'mustermann/set/strict_order'
6
7
  require 'mustermann/set/trie'
7
8
 
8
9
  module Mustermann
@@ -61,11 +62,27 @@ module Mustermann
61
62
  # Mustermann::Set.new { { '/users/:id' => :users } }
62
63
  #
63
64
  # @param mapping [Array] initial patterns or mappings to add
64
- # @param additional_values [:raise, :ignore, :append] behavior when extra keys are passed to {#expand};
65
- # defaults to +:raise+
66
- # @param options [Hash] pattern options forwarded to {Mustermann.new} (e.g. +type: :rails+)
65
+ #
66
+ # @param additional_values [:raise, :ignore, :append] behavior when extra keys are passed to {#expand}.
67
+ # Defaults to +:raise+
68
+ #
69
+ # @param use_trie [Boolean, Integer]
70
+ # whether to use a trie for matching
71
+ # If an Integer is given, it is the number of patterns at which to switch from linear to trie matching.
72
+ # Defaults to 50
73
+ #
74
+ # @param use_cache [Boolean]
75
+ # whether to cache matches not yet garbage collected. Defaults to +true+
76
+ #
77
+ # @param strict_order [Boolean]
78
+ # whether to match patterns in strict insertion order rather than trie order. Defaults to +false+.
79
+ # See {#use_strict_order?} for details
80
+ #
81
+ # @param options [Hash]
82
+ # pattern options forwarded to {Mustermann.new} (e.g. +type: :rails+)
83
+ #
67
84
  # @raise [ArgumentError] if +additional_values+ is not a recognized behavior symbol
68
- def initialize(*mapping, additional_values: :raise, use_trie: 50, use_cache: true, **options, &block)
85
+ def initialize(*mapping, additional_values: :raise, use_trie: 50, use_cache: true, strict_order: false, **options, &block)
69
86
  raise ArgumentError, "Illegal value %p for additional_values" % additional_values unless Expander::ADDITIONAL_VALUES.include? additional_values
70
87
  raise ArgumentError, "Illegal value %p for use_trie" % use_trie unless [true, false].include?(use_trie) or use_trie.is_a? Integer
71
88
 
@@ -77,6 +94,7 @@ module Mustermann
77
94
  @options = {}
78
95
  @expanders = {}
79
96
  @additional_values = additional_values
97
+ @strict_order = strict_order
80
98
 
81
99
  options.each do |key, value|
82
100
  if key.is_a? Symbol
@@ -89,8 +107,66 @@ module Mustermann
89
107
  update(mapping)
90
108
 
91
109
  block.arity == 0 ? update(yield) : yield(self) if block
110
+
111
+ optimize!
92
112
  end
93
113
 
114
+ # A set can match patterns and values in loose or strict insertion order.
115
+ #
116
+ # You have the following guarantees without strict ordering:
117
+ # - Patterns with dynamic segments in the same position and equal static parts will always match in the order they were added.
118
+ # - Multiple values for the same pattern will retain their insertion order with regard to that pattern.
119
+ #
120
+ # Trade-offs without strict ordering:
121
+ # - Static segments may be favored over dynamic segments. If you want to guarantee this behavior, enable trie-mode proactively.
122
+ # - When a pattern has multiple values, these will follow each other directly when using {#match_all} or {#peek_match_all}.
123
+ #
124
+ # Strict ordering comes with both a performance overhead and marginally increased memory usage.
125
+ # How big the performance overhead is depends on the number of patterns that overlap in the strings they successfully match against.
126
+ # It uses Ruby's built-in sorting, which on MRI is based on quicksort. The memory overhead grows linearly with the number
127
+ # of pattern and value combinations, but is generally small compared to the memory used by the patterns and values themselves.
128
+ #
129
+ # With strict ordering enabled, patterns and values are guaranteed to occur in insertion order.
130
+ #
131
+ # @example Without strict ordering, not using a trie
132
+ # set = Mustermann::Set.new(use_trie: false)
133
+ #
134
+ # set.add("/:path", :first)
135
+ # set.add("/static", :second)
136
+ # set.add("/:path", :third)
137
+ #
138
+ # set.match("/static").value # => :first
139
+ # set.match_all("/static").map(&:value) # => [:first, :third, :second]
140
+ #
141
+ # @example Without strict ordering, using a trie
142
+ # set = Mustermann::Set.new(use_trie: true)
143
+ #
144
+ # set.add("/:path", :first)
145
+ # set.add("/static", :second)
146
+ # set.add("/:path", :third)
147
+ #
148
+ # set.match("/static").value # => :second
149
+ # set.match_all("/static").map(&:value) # => [:second, :first, :third]
150
+ #
151
+ # @example With strict ordering
152
+ # set = Mustermann::Set.new(strict_order: true)
153
+ #
154
+ # set.add("/:path", :first)
155
+ # set.add("/static", :second)
156
+ # set.add("/:path", :third)
157
+ #
158
+ # set.match("/static").value # => :first
159
+ # set.match_all("/static").map(&:value) # => [:first, :second, :third]
160
+ #
161
+ # @return [Boolean] whether matching happens in strict pattern/value insertion order
162
+ def strict_order? = @strict_order
163
+
164
+ # @return [Boolean] whether caching is enabled
165
+ def use_cache? = @use_cache
166
+
167
+ # @return [Boolean] whether trie optimization is enabled
168
+ def use_trie? = @use_trie == true
169
+
94
170
  # Adds a pattern to the set, optionally associated with one or more values.
95
171
  #
96
172
  # If the pattern is given as a String it will be compiled via {Mustermann.new}
@@ -129,6 +205,7 @@ module Mustermann
129
205
  @reverse_mapping[value] ||= []
130
206
  @reverse_mapping[value] << pattern unless @reverse_mapping[value].include? pattern
131
207
  @expanders[value]&.add(pattern)
208
+ @matcher.track(pattern, value) if strict_order?
132
209
  end
133
210
 
134
211
  self
@@ -148,8 +225,8 @@ module Mustermann
148
225
  # set['/users/42'] # => :users_show (or nil)
149
226
  #
150
227
  # @example Pattern lookup
151
- # pat = Mustermann.new('/users/:id')
152
- # set[pat] # => :users_show (or nil)
228
+ # pattern = Mustermann.new('/users/:id')
229
+ # set[pattern] # => :users_show (or nil)
153
230
  #
154
231
  # @param pattern_or_string [String, Pattern]
155
232
  # @return [Object, nil] the associated value, or +nil+ if not found
@@ -300,6 +377,9 @@ module Mustermann
300
377
  # @!visibility private
301
378
  def values_for_pattern(pattern) = @mapping[pattern] # :nodoc:
302
379
 
380
+ # Runs trie optimizations proactively and explicitly rather than at match time.
381
+ def optimize! = @matcher&.optimize!
382
+
303
383
  protected
304
384
 
305
385
  attr_reader :mapping
@@ -307,21 +387,22 @@ module Mustermann
307
387
  private
308
388
 
309
389
  def add_pattern(pattern)
310
- case @use_trie
311
- when true
312
- @matcher ||= Trie.new(self, @mapping.keys)
313
- when Integer
314
- if @mapping.size >= @use_trie
315
- @matcher = Trie.new(self, @mapping.keys)
316
- @use_trie = true
317
- end
390
+ if @use_trie.is_a? Integer and @mapping.size >= @use_trie
391
+ @use_trie = true
392
+ @matcher = build_matcher
318
393
  end
319
394
 
320
- @matcher ||= Linear.new(self, @mapping.keys)
321
- @matcher = Cache.new(@matcher) if @use_cache and not @matcher.is_a? Cache
395
+ @matcher ||= build_matcher
322
396
  @matcher.add(pattern)
323
-
324
397
  @expanders[self]&.add(pattern)
325
398
  end
399
+
400
+ def build_matcher
401
+ factory = use_trie? ? Trie : Linear
402
+ matcher = factory.new(self, @mapping.keys)
403
+ matcher = StrictOrder.new(matcher) if strict_order?
404
+ matcher = Cache.new(matcher) if use_cache?
405
+ matcher
406
+ end
326
407
  end
327
408
  end
@@ -2,6 +2,7 @@
2
2
  require 'mustermann'
3
3
  require 'mustermann/identity'
4
4
  require 'mustermann/ast/pattern'
5
+ require 'mustermann/ast/fast_pattern'
5
6
  require 'mustermann/sinatra/parser'
6
7
  require 'mustermann/sinatra/safe_renderer'
7
8
  require 'mustermann/sinatra/try_convert'
@@ -15,6 +16,7 @@ module Mustermann
15
16
  # @see Mustermann::Pattern
16
17
  # @see file:README.md#sinatra Syntax description in the README
17
18
  class Sinatra < AST::Pattern
19
+ include AST::FastPattern
18
20
  include Concat::Native
19
21
  register :sinatra
20
22
 
@@ -1,4 +1,4 @@
1
1
  # frozen_string_literal: true
2
2
  module Mustermann
3
- VERSION ||= '4.0.0.alpha'
3
+ VERSION ||= '4.0.0.alpha3'
4
4
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: mustermann
3
3
  version: !ruby/object:Gem::Version
4
- version: 4.0.0.alpha
4
+ version: 4.0.0.alpha3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Konstantin Haase
@@ -30,6 +30,7 @@ files:
30
30
  - lib/mustermann/ast/boundaries.rb
31
31
  - lib/mustermann/ast/compiler.rb
32
32
  - lib/mustermann/ast/expander.rb
33
+ - lib/mustermann/ast/fast_pattern.rb
33
34
  - lib/mustermann/ast/node.rb
34
35
  - lib/mustermann/ast/param_scanner.rb
35
36
  - lib/mustermann/ast/parser.rb
@@ -57,6 +58,7 @@ files:
57
58
  - lib/mustermann/set/cache.rb
58
59
  - lib/mustermann/set/linear.rb
59
60
  - lib/mustermann/set/match.rb
61
+ - lib/mustermann/set/strict_order.rb
60
62
  - lib/mustermann/set/trie.rb
61
63
  - lib/mustermann/sinatra.rb
62
64
  - lib/mustermann/sinatra/parser.rb