accept_language 2.1.0 → 2.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 7273e9328183e3dee11fd68a6598d82c67efbb8bab156d6d9b3424d9ed45dcca
4
- data.tar.gz: ec31e8a4ac07501f1c481e65452f362be2d7669d93eea626977f25c3aca88dc2
3
+ metadata.gz: a993b9e4d4792701b09a650afb27011ff9a94ba104362a8c542d01ee389ca5e9
4
+ data.tar.gz: 129990017c1827e87e95847d8f8f42fb8c85b2d5b8146da5e6aeecb6ac7853ea
5
5
  SHA512:
6
- metadata.gz: 36120af5b03b49ea9dce1d50e5c7915c62bf9b0fa09c1130516090e4e3c882b924035f963ad2dc559fda0e818633b6a3e77aba3e46b25533bc972f5dc23ca729
7
- data.tar.gz: d11855f60c7a4a35c8f675ea5ffc4b320db5a4b40cc95e57143b3dad3c188580830ae0052ac7e934f5ae4c5f2be582db47b4e85b9513b78b4ec1fc69a97ee850
6
+ metadata.gz: 2cf1e95c98cf16c78b33f7db1e666ce834f62d436a1fc84ffee28df36dfbe41a1595710cc539e7cdbc63431c13d51b55065fdcf037a01114cc639451de33498d
7
+ data.tar.gz: 63c161793225af35b1c5f73364dee830ba1429a2cc02e431c3422da2cf9e60edb76b9f601535f9cdb3afb982ff9d28b5f3c148a2fc8bad8320deba65115b1d23
data/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # AcceptLanguage
2
2
 
3
- A lightweight, thread-safe Ruby library for parsing `Accept-Language` HTTP headers as defined in [RFC 2616](https://tools.ietf.org/html/rfc2616#section-14.4).
3
+ A lightweight, thread-safe Ruby library for parsing the `Accept-Language` HTTP header as defined in [RFC 2616](https://tools.ietf.org/html/rfc2616#section-14.4), with full support for [BCP 47](https://tools.ietf.org/html/bcp47) language tags.
4
4
 
5
5
  [![Version](https://img.shields.io/github/v/tag/cyril/accept_language.rb?label=Version&logo=github)](https://github.com/cyril/accept_language.rb/tags)
6
6
  [![Yard documentation](https://img.shields.io/badge/Yard-documentation-blue.svg?logo=github)](https://rubydoc.info/github/cyril/accept_language.rb/main)
@@ -8,14 +8,6 @@ A lightweight, thread-safe Ruby library for parsing `Accept-Language` HTTP heade
8
8
  ![RuboCop](https://github.com/cyril/accept_language.rb/actions/workflows/rubocop.yml/badge.svg?branch=main)
9
9
  [![License](https://img.shields.io/github/license/cyril/accept_language.rb?label=License&logo=github)](https://github.com/cyril/accept_language.rb/raw/main/LICENSE.md)
10
10
 
11
- ## Features
12
-
13
- - Thread-safe
14
- - No framework dependencies
15
- - Case-insensitive matching
16
- - BCP 47 language tag support
17
- - Wildcard and exclusion handling
18
-
19
11
  ## Installation
20
12
 
21
13
  ```ruby
@@ -25,51 +17,145 @@ gem "accept_language"
25
17
  ## Usage
26
18
 
27
19
  ```ruby
28
- AcceptLanguage.parse("en-GB, en;q=0.9").match(:en, :"en-GB")
29
- # => :"en-GB"
20
+ AcceptLanguage.parse("da, en-GB;q=0.8, en;q=0.7").match(:en, :da)
21
+ # => :da
30
22
  ```
31
23
 
24
+ ## Behavior
25
+
32
26
  ### Quality values
33
27
 
34
- Quality values (q-values) indicate preference order from 0 to 1:
28
+ Quality values (q-values) express relative preference, ranging from `0` (unacceptable) to `1` (most preferred). When omitted, the default is `1`.
35
29
 
36
30
  ```ruby
37
31
  parser = AcceptLanguage.parse("da, en-GB;q=0.8, en;q=0.7")
38
32
 
39
- parser.match(:en, :da) # => :da
40
- parser.match(:en, :"en-GB") # => :"en-GB"
41
- parser.match(:fr) # => nil
33
+ parser.match(:en, :da) # => :da (q=1 > q=0.8)
34
+ parser.match(:en, :"en-GB") # => :"en-GB" (q=0.8 > q=0.7)
35
+ parser.match(:ja) # => nil (no match)
36
+ ```
37
+
38
+ Per RFC 2616 Section 3.9, valid q-values have at most three decimal places: `0`, `0.7`, `0.85`, `1.000`. Invalid q-values are ignored.
39
+
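The q-value rule above can be sketched as a small validator. This is a minimal sketch under stated assumptions, not the gem's parser (which is not part of this diff). `QVALUE` and `parse_qvalue` are hypothetical names, and the pattern follows the `qvalue` grammar from RFC 2616 Section 3.9. The result is scaled to an integer between 0 and 1000, the range the `Matcher` documentation in this release describes for its input.

```ruby
# Hypothetical helper illustrating RFC 2616 Section 3.9 q-values:
# "0" with up to three decimals, or "1" with up to three zero decimals.
QVALUE = /\A(?:0(?:\.\d{0,3})?|1(?:\.0{0,3})?)\z/

# Returns the q-value scaled to 0..1000, or nil when the input is invalid.
def parse_qvalue(raw)
  return nil unless QVALUE.match?(raw)

  integer_part, fraction = raw.split(".", 2)
  # Pad the fraction to three digits instead of using floats,
  # which avoids floating-point rounding.
  (integer_part.to_i * 1000) + fraction.to_s.ljust(3, "0").to_i
end

parse_qvalue("0.85")   # => 850
parse_qvalue("1.000")  # => 1000
parse_qvalue("0.1234") # => nil (more than three decimal places)
```

Working on the string rather than converting to `Float` is a design choice of this sketch; it keeps values such as `0.7` from being distorted by binary rounding before scaling.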
40
+ ### Identical quality values
41
+
42
+ When multiple languages share the same q-value, the order of declaration in the header determines priority—the first declared language is preferred:
43
+
44
+ ```ruby
45
+ AcceptLanguage.parse("en;q=0.8, fr;q=0.8").match(:en, :fr)
46
+ # => :en (declared first)
47
+
48
+ AcceptLanguage.parse("fr;q=0.8, en;q=0.8").match(:en, :fr)
49
+ # => :fr (declared first)
50
+ ```
51
+
52
+ ### Prefix matching
53
+
54
+ Per RFC 2616 Section 14.4, a language-range matches any language-tag that exactly equals the range or begins with the range followed by `-`:
55
+
56
+ ```ruby
57
+ AcceptLanguage.parse("zh").match(:"zh-TW")
58
+ # => :"zh-TW" ("zh" matches "zh-TW")
59
+
60
+ AcceptLanguage.parse("zh-TW").match(:zh)
61
+ # => nil ("zh-TW" does not match "zh")
62
+ ```
63
+
64
+ Note that prefix matching follows hyphen boundaries—`zh` does not match `zhx`:
65
+
66
+ ```ruby
67
+ AcceptLanguage.parse("zh").match(:zhx)
68
+ # => nil ("zhx" is a different language code)
69
+ ```
70
+
71
+ ### Wildcards
72
+
73
+ The wildcard `*` matches any language not matched by another range:
74
+
75
+ ```ruby
76
+ AcceptLanguage.parse("de, *;q=0.5").match(:ja)
77
+ # => :ja (matched by wildcard)
78
+
79
+ AcceptLanguage.parse("de, *;q=0.5").match(:de, :ja)
80
+ # => :de (explicit match preferred over wildcard)
42
81
  ```
43
82
 
44
- ### Language variants
83
+ ### Exclusions
84
+
85
+ A q-value of `0` explicitly excludes a language:
86
+
87
+ ```ruby
88
+ AcceptLanguage.parse("*, en;q=0").match(:en)
89
+ # => nil (English excluded)
90
+
91
+ AcceptLanguage.parse("*, en;q=0").match(:ja)
92
+ # => :ja (matched by wildcard)
93
+ ```
45
94
 
46
- A generic language tag matches its regional variants, but not the reverse:
95
+ Exclusions apply to prefix matches:
47
96
 
48
97
  ```ruby
49
- AcceptLanguage.parse("fr").match(:"fr-CH") # => :"fr-CH"
50
- AcceptLanguage.parse("fr-CH").match(:fr) # => nil
98
+ AcceptLanguage.parse("*, en;q=0").match(:"en-GB")
99
+ # => nil (en-GB excluded via "en" prefix)
51
100
  ```
52
101
 
53
- ### Wildcards and exclusions
102
+ ### Case insensitivity
54
103
 
55
- The wildcard `*` matches any language. A q-value of 0 explicitly excludes a language:
104
+ Matching is case-insensitive per RFC 2616, but the original case of available language tags is preserved:
56
105
 
57
106
  ```ruby
58
- AcceptLanguage.parse("de-DE, *;q=0.5").match(:fr) # => :fr
59
- AcceptLanguage.parse("*, en;q=0").match(:en) # => nil
60
- AcceptLanguage.parse("*, en;q=0").match(:fr) # => :fr
107
+ AcceptLanguage.parse("EN-GB").match(:"en-gb")
108
+ # => :"en-gb"
109
+
110
+ AcceptLanguage.parse("en-gb").match(:"EN-GB")
111
+ # => :"EN-GB"
61
112
  ```
62
113
 
63
- ### Case sensitivity
114
+ ### BCP 47 language tags
64
115
 
65
- Matching is case-insensitive but preserves the case of the available language tag:
116
+ Full support for [BCP 47](https://tools.ietf.org/html/bcp47) language tags:
66
117
 
67
118
  ```ruby
68
- AcceptLanguage.parse("en-GB").match("en-gb") # => "en-gb"
69
- AcceptLanguage.parse("en-gb").match("en-GB") # => "en-GB"
119
+ # Script subtags
120
+ AcceptLanguage.parse("zh-Hant").match(:"zh-Hant-TW", :"zh-Hans-CN")
121
+ # => :"zh-Hant-TW"
122
+
123
+ # Variant subtags
124
+ AcceptLanguage.parse("de-1996, de;q=0.9").match(:"de-CH-1996", :"de-CH")
125
+ # => :"de-CH-1996"
126
+ ```
127
+
128
+ ## Integration examples
129
+
130
+ ### Rack
131
+
132
+ ```ruby
133
+ # config.ru
134
+ class LocaleMiddleware
135
+ def initialize(app, available_locales:, default_locale:)
136
+ @app = app
137
+ @available_locales = available_locales
138
+ @default_locale = default_locale
139
+ end
140
+
141
+ def call(env)
142
+ locale = detect_locale(env) || @default_locale
143
+ env["rack.locale"] = locale
144
+ @app.call(env)
145
+ end
146
+
147
+ private
148
+
149
+ def detect_locale(env)
150
+ header = env["HTTP_ACCEPT_LANGUAGE"]
151
+ return unless header
152
+
153
+ AcceptLanguage.parse(header).match(*@available_locales)
154
+ end
155
+ end
70
156
  ```
71
157
 
72
- ## Rails integration
158
+ ### Ruby on Rails
73
159
 
74
160
  ```ruby
75
161
  # app/controllers/application_controller.rb
@@ -100,13 +186,15 @@ end
100
186
 
101
187
  ## Documentation
102
188
 
103
- - [API Documentation](https://rubydoc.info/github/cyril/accept_language.rb/main)
189
+ - [API documentation](https://rubydoc.info/github/cyril/accept_language.rb/main)
190
+ - [RFC 2616 Section 14.4](https://tools.ietf.org/html/rfc2616#section-14.4)
191
+ - [BCP 47](https://tools.ietf.org/html/bcp47)
104
192
  - [Language negotiation with Ruby](https://dev.to/cyri_/language-negotiation-with-ruby-5166)
105
193
  - [Rubyで言語ネゴシエーション](https://qiita.com/cyril/items/45dc233edb7be9d614e7)
106
194
 
107
195
  ## Versioning
108
196
 
109
- This library follows [Semantic Versioning 2.0.0](https://semver.org/).
197
+ This library follows [Semantic Versioning 2.0](https://semver.org/).
110
198
 
111
199
  ## License
112
200
 
@@ -1,57 +1,280 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module AcceptLanguage
4
- # Matches Accept-Language header values against application-supported languages to determine
5
- # the optimal language choice. Handles quality values, wildcards, and language tag matching
6
- # according to RFC 2616 specifications.
4
+ # = Language Preference Matcher
5
+ #
6
+ # Matcher implements the language matching algorithm defined in RFC 2616
7
+ # Section 14.4. It takes parsed language preferences (from {Parser}) and
8
+ # determines the optimal language choice from a set of available languages.
9
+ #
10
+ # == Overview
11
+ #
12
+ # The matching process balances multiple factors:
13
+ #
14
+ # 1. **Quality values**: Higher q-values indicate stronger user preference
15
+ # 2. **Declaration order**: Tie-breaker when q-values are equal
16
+ # 3. **Prefix matching**: Allows +en+ to match +en-US+, +en-GB+, etc.
17
+ # 4. **Wildcards**: The +*+ range matches any otherwise unmatched language
18
+ # 5. **Exclusions**: Languages with +q=0+ are explicitly unacceptable
19
+ #
20
+ # == RFC 2616 Section 14.4 Compliance
21
+ #
22
+ # This implementation follows the Accept-Language matching rules:
23
+ #
24
+ # > A language-range matches a language-tag if it exactly equals the tag,
25
+ # > or if it exactly equals a prefix of the tag such that the first tag
26
+ # > character following the prefix is "-".
27
+ #
28
+ # This means:
29
+ # - +en+ matches +en+, +en-US+, +en-GB+, +en-Latn-US+
30
+ # - +en-US+ matches only +en-US+ (not +en+ or +en-GB+)
31
+ # - +en+ does NOT match +eng+ (no hyphen boundary)
32
+ #
33
+ # == Quality Value Semantics
34
+ #
35
+ # Quality values have specific meanings per RFC 2616:
36
+ #
37
+ # - +q=1+ (or omitted): Most preferred
38
+ # - +0 < q < 1+: Acceptable with relative preference
39
+ # - +q=0+: Explicitly NOT acceptable
40
+ #
41
+ # The +q=0+ case is special: it doesn't just indicate low preference, it
42
+ # completely excludes the language from consideration. This is used with
43
+ # wildcards to express "any language except X":
44
+ #
45
+ # Accept-Language: *, en;q=0
46
+ #
47
+ # == Wildcard Behavior
48
+ #
49
+ # The wildcard +*+ matches any language not explicitly matched by another
50
+ # language-range. When processing a wildcard:
51
+ #
52
+ # 1. Collect all explicitly listed language tags (excluding the wildcard)
53
+ # 2. Find available languages that don't match any explicit tag
54
+ # 3. Return the first such language
55
+ #
56
+ # This ensures explicit preferences always take priority over the wildcard.
57
+ #
58
+ # == Internal Design
59
+ #
60
+ # The Matcher separates languages into two categories during initialization:
61
+ #
62
+ # - **preferred_langtags**: Languages with q > 0, sorted by descending quality
63
+ # - **excluded_langtags**: Languages with q = 0 (explicitly unacceptable)
64
+ #
65
+ # This separation optimizes the matching algorithm by allowing quick
66
+ # filtering of excluded languages before attempting matches.
67
+ #
68
+ # == Thread Safety
69
+ #
70
+ # Matcher instances are not mutated after initialization: +preferred_langtags+
71
+ # and +excluded_langtags+ are only read during matching, so concurrent use is safe.
7
72
  #
8
73
  # @api private
9
- # @note This class is intended for internal use by {Parser} and should not be instantiated directly.
74
+ # @note This class is used internally by {Parser#match} and should not be
75
+ # instantiated directly. Use {AcceptLanguage.parse} followed by
76
+ # {Parser#match} instead.
77
+ #
78
+ # @example Internal usage (via Parser)
79
+ # # Don't do this:
80
+ # matcher = AcceptLanguage::Matcher.new("en" => 1000, "fr" => 800)
81
+ #
82
+ # # Do this instead:
83
+ # AcceptLanguage.parse("en, fr;q=0.8").match(:en, :fr)
84
+ #
85
+ # @see Parser#match
86
+ # @see https://tools.ietf.org/html/rfc2616#section-14.4 RFC 2616 Section 14.4
10
87
  class Matcher
88
+ # The hyphen character used as a subtag delimiter in BCP 47 language tags.
89
+ #
90
+ # Per RFC 2616 Section 14.4, prefix matching must respect hyphen boundaries.
91
+ # A language-range matches a language-tag only if the character immediately
92
+ # following the prefix is a hyphen.
93
+ #
94
+ # @api private
95
+ # @return [String] "-"
96
+ HYPHEN = "-"
97
+
98
+ # Error message raised when an available language tag is not a Symbol.
99
+ #
100
+ # This guards against accidental non-Symbol values in the available languages
101
+ # array, which would cause unexpected behavior during matching.
102
+ #
103
+ # @api private
104
+ # @return [String]
105
+ LANGTAG_TYPE_ERROR = "Language tag must be a Symbol"
106
+
107
+ # The wildcard character that matches any language not explicitly listed.
108
+ #
109
+ # Per RFC 2616 Section 14.4, the wildcard has special semantics:
110
+ # - It matches any language not matched by other ranges
111
+ # - +*;q=0+ makes all unlisted languages unacceptable
112
+ # - It has lower effective priority than explicit language tags
113
+ #
11
114
  # @api private
115
+ # @return [String] "*"
12
116
  WILDCARD = "*"
13
117
 
118
+ # Language tags explicitly marked as unacceptable (+q=0+).
119
+ #
120
+ # These tags are filtered out from available languages before any
121
+ # matching occurs. Exclusions apply via prefix matching, so excluding
122
+ # +en+ also excludes +en-US+, +en-GB+, etc.
123
+ #
124
+ # @note The wildcard +*+ is never added to this set, even when +*;q=0+
125
+ # is specified. Wildcard exclusion is handled implicitly: when +*;q=0+
126
+ # and no other languages have +q > 0+, the preferred_langtags list is
127
+ # empty, resulting in no matches.
128
+ #
14
129
  # @api private
15
- attr_reader :excluded_langtags, :preferred_langtags
130
+ # @return [Set<String>] downcased language tags with q=0
131
+ #
132
+ # @example
133
+ # # For "*, en;q=0, de;q=0"
134
+ # matcher.excluded_langtags
135
+ # # => #<Set: {"en", "de"}>
136
+ attr_reader :excluded_langtags
16
137
 
138
+ # Language tags sorted by preference (descending quality value).
139
+ #
140
+ # This array contains only tags with +q > 0+, ordered from most preferred
141
+ # to least preferred. When quality values are equal, the original
142
+ # declaration order from the Accept-Language header is preserved.
143
+ #
144
+ # Preserving declaration order among ties keeps matching deterministic: given the
145
+ # same header and available languages, the result is always the same.
146
+ #
17
147
  # @api private
148
+ # @return [Array<String>] downcased language tags, highest quality first
149
+ #
150
+ # @example
151
+ # # For "fr;q=0.8, en, de;q=0.9"
152
+ # # Sorted: en (q=1), de (q=0.9), fr (q=0.8)
153
+ # matcher.preferred_langtags
154
+ # # => ["en", "de", "fr"]
155
+ attr_reader :preferred_langtags
156
+
157
+ # Creates a new Matcher instance from parsed language preferences.
158
+ #
159
+ # The initialization process:
160
+ #
161
+ # 1. Separates excluded tags (+q=0+) from preferred tags (+q > 0+)
162
+ # 2. Sorts preferred tags by descending quality value
163
+ # 3. Preserves original order for tags with equal quality (stable sort)
164
+ #
165
+ # == Exclusion Rules
166
+ #
167
+ # Only specific language tags with +q=0+ are added to the exclusion set.
168
+ # The wildcard +*+ is explicitly NOT added even when +*;q=0+ is present,
169
+ # because:
170
+ #
171
+ # - Adding +*+ to exclusions would break prefix matching logic
172
+ # - +*;q=0+ semantics are: "no unlisted language is acceptable"
173
+ # - This is achieved by having an empty preferred_langtags (no wildcards)
174
+ #
175
+ # == Stable Sorting
176
+ #
177
+ # Ruby does not guarantee that +sort_by+ is stable, so equal sort keys alone
177
+ # do not preserve relative order. To ensure that, among languages with the
178
+ # same quality value, the first one declared in the Accept-Language header
179
+ # wins, the declaration index must act as a secondary sort key.
181
+ #
182
+ # @api private
183
+ # @param languages_range [Hash{String => Integer}] language tags mapped to
184
+ # quality values (0-1000), as produced by {Parser}
185
+ #
186
+ # @example
187
+ # Matcher.new("en" => 1000, "fr" => 800, "de" => 0)
188
+ # # preferred_langtags: ["en", "fr"]
189
+ # # excluded_langtags: #<Set: {"de"}>
18
190
  def initialize(**languages_range)
19
191
  @excluded_langtags = ::Set[]
20
- langtags = []
21
-
22
- languages_range.select do |langtag, quality|
23
- if quality.zero?
24
- @excluded_langtags << langtag unless wildcard?(langtag)
25
- else
26
- level = (quality * 1_000).to_i
27
- langtags[level] = langtag
28
- end
192
+
193
+ languages_range.each do |langtag, quality|
194
+ next unless quality.zero? && !wildcard?(langtag)
195
+
196
+ # Exclude specific language tags, but NOT the wildcard.
197
+ # When "*;q=0" is specified, all non-listed languages become
198
+ # unacceptable implicitly (they won't match any preferred_langtags).
199
+ # Adding "*" to excluded_langtags would break prefix_match? logic.
200
+ @excluded_langtags << langtag
29
201
  end
30
202
 
31
- @preferred_langtags = langtags.compact.reverse
203
+ # Sort by descending quality. Ruby's sort_by is not guaranteed to be
204
+ # stable, so the declaration index is used as an explicit tie-breaker
205
+ # (first declared = higher priority).
206
+ @preferred_langtags = languages_range
207
+ .reject { |_, quality| quality.zero? }
208
+ .each_with_index.sort_by { |(_, quality), index| [-quality, index] }
209
+ .map { |(langtag, _), _| langtag }
32
210
  end
33
211
 
212
+ # Finds the best matching language from the available options.
213
+ #
214
+ # == Algorithm
215
+ #
216
+ # 1. **Filter**: Remove available languages that match any excluded tag
217
+ # 2. **Match**: For each preferred tag (in quality order):
218
+ # - If it's a wildcard, return the first available language not
219
+ # matching any other preferred tag
220
+ # - Otherwise, return the first available language that matches
221
+ # via exact match or prefix match
222
+ # 3. **Result**: Return the first match found, or +nil+ if none
223
+ #
224
+ # == Return Value
225
+ #
226
+ # The returned value preserves the exact form (case) of the matched
227
+ # element from +available_langtags+. This is important for direct use
228
+ # with APIs like +I18n.locale=+ that may be case-sensitive.
229
+ #
34
230
  # @api private
231
+ # @param available_langtags [Array<Symbol>] languages to match against
232
+ # @return [Symbol, nil] the best matching language, or +nil+
233
+ # @raise [TypeError] if any available language tag is not a Symbol
234
+ #
235
+ # @example Basic matching
236
+ # matcher = Matcher.new("en" => 1000, "fr" => 800)
237
+ # matcher.call(:en, :fr, :de)
238
+ # # => :en
239
+ #
240
+ # @example Prefix matching
241
+ # matcher = Matcher.new("en" => 1000)
242
+ # matcher.call(:"en-US", :"en-GB")
243
+ # # => :"en-US"
244
+ #
245
+ # @example With exclusion
246
+ # matcher = Matcher.new("*" => 500, "en" => 0)
247
+ # matcher.call(:en, :fr)
248
+ # # => :fr
35
249
  def call(*available_langtags)
36
- raise ::ArgumentError, "Language tags cannot be nil" if available_langtags.any?(&:nil?)
37
-
38
250
  filtered_tags = drop_unacceptable(*available_langtags)
39
- return nil if filtered_tags.empty?
251
+ return if filtered_tags.empty?
40
252
 
41
253
  find_best_match(filtered_tags)
42
254
  end
43
255
 
44
256
  private
45
257
 
258
+ # Iterates through preferred languages to find the first match.
259
+ #
260
+ # @param available_langtags [Set<String>] pre-filtered available tags
261
+ # @return [Symbol, nil] the matched tag or nil
46
262
  def find_best_match(available_langtags)
47
263
  preferred_langtags.each do |preferred_tag|
48
264
  match = match_langtag(preferred_tag, available_langtags)
49
- return match if match
265
+ return :"#{match}" unless match.nil?
50
266
  end
51
267
 
52
268
  nil
53
269
  end
54
270
 
271
+ # Attempts to match a single preferred tag against available languages.
272
+ #
273
+ # Handles both wildcard and specific language tags differently.
274
+ #
275
+ # @param preferred_tag [String] the preferred language tag to match
276
+ # @param available_langtags [Set<String>] available tags to search
277
+ # @return [String, nil] the matched tag or nil
55
278
  def match_langtag(preferred_tag, available_langtags)
56
279
  if wildcard?(preferred_tag)
57
280
  any_other_langtag(*available_langtags)
@@ -60,38 +283,105 @@ module AcceptLanguage
60
283
  end
61
284
  end
62
285
 
286
+ # Finds an available language that matches via exact or prefix match.
287
+ #
288
+ # @param preferred_tag [String] the preferred tag (downcased)
289
+ # @param available_langtags [Set<String>] available tags
290
+ # @return [String, nil] the first matching tag or nil
63
291
  def find_matching_tag(preferred_tag, available_langtags)
64
- pattern = /\A#{::Regexp.escape(preferred_tag)}/i
65
- available_langtags.find { |tag| tag.match?(pattern) }
292
+ available_langtags.find { |tag| prefix_match?(preferred_tag, tag) }
66
293
  end
67
294
 
295
+ # Finds an available language for wildcard matching.
296
+ #
297
+ # Returns the first available language that doesn't match any explicitly
298
+ # listed preferred language tag. This implements the RFC 2616 semantics
299
+ # where +*+ matches "any language not matched by another range".
300
+ #
301
+ # @param available_langtags [Array<String>] available tags
302
+ # @return [String, nil] the first non-matching tag or nil
68
303
  def any_other_langtag(*available_langtags)
304
+ langtags = preferred_langtags - [WILDCARD]
305
+
69
306
  available_langtags.find do |available_langtag|
70
- langtags = preferred_langtags - [WILDCARD]
71
- langtags.none? do |tag|
72
- pattern = /\A#{::Regexp.escape(tag)}/i
73
- available_langtag.match?(pattern)
74
- end
307
+ langtags.none? { |tag| prefix_match?(tag, available_langtag) }
75
308
  end
76
309
  end
77
310
 
311
+ # Removes explicitly excluded languages from the available set.
312
+ #
313
+ # Uses prefix matching for exclusions, so excluding +en+ also excludes
314
+ # +en-US+, +en-GB+, etc.
315
+ #
316
+ # @param available_langtags [Array<Symbol>] all available tags
317
+ # @return [Set<String>] tags not matching any exclusion
318
+ # @raise [TypeError] if any tag is not a Symbol
78
319
  def drop_unacceptable(*available_langtags)
79
- available_langtags.inject(::Set[]) do |langtags, available_langtag|
80
- next langtags if unacceptable?(available_langtag)
320
+ available_langtags.each_with_object(::Set[]) do |available_langtag, langtags|
321
+ raise ::TypeError, LANGTAG_TYPE_ERROR unless available_langtag.is_a?(::Symbol)
81
322
 
82
- langtags + ::Set[available_langtag]
323
+ available_langtag = "#{available_langtag}"
324
+ langtags << available_langtag unless unacceptable?(available_langtag)
83
325
  end
84
326
  end
85
327
 
328
+ # Checks if a language tag is explicitly excluded.
329
+ #
330
+ # @param langtag [String] the tag to check (as string)
331
+ # @return [Boolean] true if the tag matches any exclusion
86
332
  def unacceptable?(langtag)
87
- excluded_langtags.any? do |excluded_tag|
88
- pattern = /\A#{::Regexp.escape(excluded_tag)}/i
89
- langtag.match?(pattern)
90
- end
333
+ excluded_langtags.any? { |excluded_tag| prefix_match?(excluded_tag, langtag) }
91
334
  end
92
335
 
336
+ # Checks if a value is the wildcard character.
337
+ #
338
+ # @param value [String] the value to check
339
+ # @return [Boolean] true if the value is "*"
93
340
  def wildcard?(value)
94
341
  value.eql?(WILDCARD)
95
342
  end
343
+
344
+ # Implements RFC 2616 Section 14.4 prefix matching rule.
345
+ #
346
+ # From the specification:
347
+ #
348
+ # > A language-range matches a language-tag if it exactly equals the tag,
349
+ # > or if it exactly equals a prefix of the tag such that the first tag
350
+ # > character following the prefix is "-".
351
+ #
352
+ # This rule ensures that language ranges match at subtag boundaries:
353
+ #
354
+ # - +en+ matches +en+ (exact)
355
+ # - +en+ matches +en-US+ (prefix + hyphen)
356
+ # - +en+ does NOT match +eng+ (no hyphen after prefix)
357
+ # - +en-US+ does NOT match +en+ (prefix is longer than tag)
358
+ #
359
+ # Matching is case-insensitive per RFC 2616, using +casecmp?+ for
360
+ # efficient comparison without allocating new strings.
361
+ #
362
+ # @param prefix [String] the language-range to match (downcased)
363
+ # @param tag [String] the language-tag to test (any case)
364
+ # @return [Boolean] true if prefix matches tag per RFC 2616 rules
365
+ #
366
+ # @example Exact matches
367
+ # prefix_match?("en", "en") # => true
368
+ # prefix_match?("en", "EN") # => true
369
+ # prefix_match?("en-us", "en-US") # => true
370
+ #
371
+ # @example Prefix matches
372
+ # prefix_match?("en", "en-us") # => true
373
+ # prefix_match?("en", "en-GB") # => true
374
+ # prefix_match?("zh", "zh-Hant-TW") # => true
375
+ #
376
+ # @example Non-matches
377
+ # prefix_match?("en-us", "en") # => false (prefix longer than tag)
378
+ # prefix_match?("en", "eng") # => false (no hyphen boundary)
379
+ # prefix_match?("en", "fr") # => false (different language)
380
+ def prefix_match?(prefix, tag)
381
+ return true if tag.casecmp?(prefix)
382
+ return false if tag.length <= prefix.length
383
+
384
+ tag[0, prefix.length].casecmp?(prefix) && tag[prefix.length] == HYPHEN
385
+ end
96
386
  end
97
387
  end
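
The hunks above omit a few unchanged lines, so the flow documented for `Matcher` (filter exclusions, walk the preferred ranges in quality order, treat `*` as "anything not matched by an explicit range") may be easier to follow in one piece. The following is a minimal, self-contained sketch of that flow, not the gem's implementation: `best_match` is a hypothetical helper, its preferences hash is assumed to map downcased ranges to integer qualities from 0 to 1000, and ties are broken by an explicit declaration index.

```ruby
# Hypothetical standalone helper, not part of the gem's API.
# preferences: Hash of downcased language range => Integer quality (0..1000)
# available:   Array of Symbols, in the application's order of preference
def best_match(preferences, available)
  # RFC 2616 Section 14.4: exact match, or prefix followed by a hyphen.
  prefix_match = lambda do |range, tag|
    tag.casecmp?(range) ||
      (tag.length > range.length &&
        tag[0, range.length].casecmp?(range) &&
        tag[range.length] == "-")
  end

  # Ranges with q=0 are exclusions; the wildcard is never one of them.
  excluded = preferences.select { |range, q| q.zero? && range != "*" }.keys

  # Highest quality first; the declaration index breaks ties.
  preferred = preferences
    .reject { |_, q| q.zero? }
    .each_with_index
    .sort_by { |(_, q), index| [-q, index] }
    .map { |(range, _), _| range }

  # Step 1: drop available tags covered by an exclusion.
  candidates = available.map(&:to_s).reject do |tag|
    excluded.any? { |range| prefix_match.call(range, tag) }
  end

  explicit = preferred - ["*"]

  # Step 2: the first preferred range with a matching candidate wins.
  preferred.each do |range|
    found =
      if range == "*"
        candidates.find { |tag| explicit.none? { |r| prefix_match.call(r, tag) } }
      else
        candidates.find { |tag| prefix_match.call(range, tag) }
      end
    return found.to_sym if found
  end

  nil
end

best_match({ "da" => 1000, "en-gb" => 800, "en" => 700 }, %i[en da]) # => :da
best_match({ "*" => 1000, "en" => 0 }, [:"en-GB"])                   # => nil
best_match({ "en" => 800, "fr" => 800 }, %i[fr en])                  # => :en
```

The returned Symbol keeps the case of the available tag it came from, which mirrors the behavior the README describes under case insensitivity.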