accept_language 2.1.1 → 2.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c1d4f90fc40c062ac4f250c1c3b8ac8540796b232cf63b7d5c25b5e2e3c9124e
4
- data.tar.gz: 150bb9def9e5f4799c432df6d6c364fedf88132b117d28e79a6ed44f467dcc70
3
+ metadata.gz: a993b9e4d4792701b09a650afb27011ff9a94ba104362a8c542d01ee389ca5e9
4
+ data.tar.gz: 129990017c1827e87e95847d8f8f42fb8c85b2d5b8146da5e6aeecb6ac7853ea
5
5
  SHA512:
6
- metadata.gz: 537d924a23dc3c0fe8fb523556a6da653e16471b6b8ca94e0ee57f36fa1d6842103e380c3ea1e2dbd468b06a65a698ecdbf8923235f4eac0031a52801ba446f8
7
- data.tar.gz: 8e48f57d2f0a4005483d29cf5503accc39eecffe19f78e8848dfa216422d1b2792f50cc5199c938674a483ef90ca20127fbfe452c0eebae9e30b36d7b8dc0bc2
6
+ metadata.gz: 2cf1e95c98cf16c78b33f7db1e666ce834f62d436a1fc84ffee28df36dfbe41a1595710cc539e7cdbc63431c13d51b55065fdcf037a01114cc639451de33498d
7
+ data.tar.gz: 63c161793225af35b1c5f73364dee830ba1429a2cc02e431c3422da2cf9e60edb76b9f601535f9cdb3afb982ff9d28b5f3c148a2fc8bad8320deba65115b1d23
data/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # AcceptLanguage
2
2
 
3
- A lightweight, thread-safe Ruby library for parsing `Accept-Language` HTTP headers as defined in [RFC 2616](https://tools.ietf.org/html/rfc2616#section-14.4), with full support for [BCP 47](https://tools.ietf.org/html/bcp47) language tags.
3
+ A lightweight, thread-safe Ruby library for parsing the `Accept-Language` HTTP header as defined in [RFC 2616](https://tools.ietf.org/html/rfc2616#section-14.4), with full support for [BCP 47](https://tools.ietf.org/html/bcp47) language tags.
4
4
 
5
5
  [![Version](https://img.shields.io/github/v/tag/cyril/accept_language.rb?label=Version&logo=github)](https://github.com/cyril/accept_language.rb/tags)
6
6
  [![Yard documentation](https://img.shields.io/badge/Yard-documentation-blue.svg?logo=github)](https://rubydoc.info/github/cyril/accept_language.rb/main)
@@ -8,14 +8,6 @@ A lightweight, thread-safe Ruby library for parsing `Accept-Language` HTTP heade
8
8
  ![RuboCop](https://github.com/cyril/accept_language.rb/actions/workflows/rubocop.yml/badge.svg?branch=main)
9
9
  [![License](https://img.shields.io/github/license/cyril/accept_language.rb?label=License&logo=github)](https://github.com/cyril/accept_language.rb/raw/main/LICENSE.md)
10
10
 
11
- ## Features
12
-
13
- - Thread-safe
14
- - No framework dependencies
15
- - Case-insensitive matching
16
- - BCP 47 language tag support
17
- - Wildcard and exclusion handling
18
-
19
11
  ## Installation
20
12
 
21
13
  ```ruby
@@ -25,69 +17,145 @@ gem "accept_language"
25
17
  ## Usage
26
18
 
27
19
  ```ruby
28
- AcceptLanguage.parse("en-GB, en;q=0.9").match(:en, :"en-GB")
29
- # => :"en-GB"
20
+ AcceptLanguage.parse("da, en-GB;q=0.8, en;q=0.7").match(:en, :da)
21
+ # => :da
30
22
  ```
31
23
 
24
+ ## Behavior
25
+
32
26
  ### Quality values
33
27
 
34
- Quality values (q-values) indicate preference order from 0 to 1:
28
+ Quality values (q-values) express relative preference, ranging from `0` (unacceptable) to `1` (most preferred). When omitted, the default is `1`.
35
29
 
36
30
  ```ruby
37
31
  parser = AcceptLanguage.parse("da, en-GB;q=0.8, en;q=0.7")
38
32
 
39
- parser.match(:en, :da) # => :da
40
- parser.match(:en, :"en-GB") # => :"en-GB"
41
- parser.match(:fr) # => nil
33
+ parser.match(:en, :da) # => :da (q=1 > q=0.8)
34
+ parser.match(:en, :"en-GB") # => :"en-GB" (q=0.8 > q=0.7)
35
+ parser.match(:ja) # => nil (no match)
36
+ ```
37
+
38
+ Per RFC 2616 Section 3.9, valid q-values have at most three decimal places: `0`, `0.7`, `0.85`, `1.000`. Invalid q-values are ignored.
39
+
40
+ ### Identical quality values
41
+
42
+ When multiple languages share the same q-value, the order of declaration in the header determines priority—the first declared language is preferred:
43
+
44
+ ```ruby
45
+ AcceptLanguage.parse("en;q=0.8, fr;q=0.8").match(:en, :fr)
46
+ # => :en (declared first)
47
+
48
+ AcceptLanguage.parse("fr;q=0.8, en;q=0.8").match(:en, :fr)
49
+ # => :fr (declared first)
42
50
  ```
43
51
 
44
- ### Language variants
52
+ ### Prefix matching
45
53
 
46
- A generic language tag matches its regional variants, but not the reverse:
54
+ Per RFC 2616 Section 14.4, a language-range matches any language-tag that exactly equals the range or begins with the range followed by `-`:
47
55
 
48
56
  ```ruby
49
- AcceptLanguage.parse("fr").match(:"fr-CH") # => :"fr-CH"
50
- AcceptLanguage.parse("fr-CH").match(:fr) # => nil
57
+ AcceptLanguage.parse("zh").match(:"zh-TW")
58
+ # => :"zh-TW" ("zh" matches "zh-TW")
59
+
60
+ AcceptLanguage.parse("zh-TW").match(:zh)
61
+ # => nil ("zh-TW" does not match "zh")
62
+ ```
63
+
64
+ Note that prefix matching follows hyphen boundaries—`zh` does not match `zhx`:
65
+
66
+ ```ruby
67
+ AcceptLanguage.parse("zh").match(:zhx)
68
+ # => nil ("zhx" is a different language code)
51
69
  ```
52
70
 
53
- ### Wildcards and exclusions
71
+ ### Wildcards
54
72
 
55
- The wildcard `*` matches any language. A q-value of 0 explicitly excludes a language:
73
+ The wildcard `*` matches any language not matched by another range:
56
74
 
57
75
  ```ruby
58
- AcceptLanguage.parse("de-DE, *;q=0.5").match(:fr) # => :fr
59
- AcceptLanguage.parse("*, en;q=0").match(:en) # => nil
60
- AcceptLanguage.parse("*, en;q=0").match(:fr) # => :fr
76
+ AcceptLanguage.parse("de, *;q=0.5").match(:ja)
77
+ # => :ja (matched by wildcard)
78
+
79
+ AcceptLanguage.parse("de, *;q=0.5").match(:de, :ja)
80
+ # => :de (explicit match preferred over wildcard)
61
81
  ```
62
82
 
63
- ### Case sensitivity
83
+ ### Exclusions
64
84
 
65
- Matching is case-insensitive but preserves the case of the available language tag:
85
+ A q-value of `0` explicitly excludes a language:
66
86
 
67
87
  ```ruby
68
- AcceptLanguage.parse("en-GB").match("en-gb") # => "en-gb"
69
- AcceptLanguage.parse("en-gb").match("en-GB") # => "en-GB"
88
+ AcceptLanguage.parse("*, en;q=0").match(:en)
89
+ # => nil (English excluded)
90
+
91
+ AcceptLanguage.parse("*, en;q=0").match(:ja)
92
+ # => :ja (matched by wildcard)
70
93
  ```
71
94
 
72
- ### BCP 47 support
95
+ Exclusions apply to prefix matches:
96
+
97
+ ```ruby
98
+ AcceptLanguage.parse("*, en;q=0").match(:"en-GB")
99
+ # => nil (en-GB excluded via "en" prefix)
100
+ ```
73
101
 
74
- This library supports [BCP 47](https://tools.ietf.org/html/bcp47) language tags, including:
102
+ ### Case insensitivity
75
103
 
76
- - **Script subtags**: `zh-Hans` (Simplified Chinese), `zh-Hant` (Traditional Chinese)
77
- - **Region subtags**: `en-US`, `pt-BR`
78
- - **Variant subtags**: `sl-nedis` (Slovenian Nadiza dialect), `de-1996` (German orthography reform)
104
+ Matching is case-insensitive per RFC 2616, but the original case of available language tags is preserved:
79
105
 
80
106
  ```ruby
81
- # Script variants
82
- AcceptLanguage.parse("zh-Hans").match(:"zh-Hans-CN", :"zh-Hant-TW")
83
- # => :"zh-Hans-CN"
107
+ AcceptLanguage.parse("EN-GB").match(:"en-gb")
108
+ # => :"en-gb"
84
109
 
85
- # Orthography variants (numeric subtags)
110
+ AcceptLanguage.parse("en-gb").match(:"EN-GB")
111
+ # => :"EN-GB"
112
+ ```
113
+
114
+ ### BCP 47 language tags
115
+
116
+ Full support for [BCP 47](https://tools.ietf.org/html/bcp47) language tags:
117
+
118
+ ```ruby
119
+ # Script subtags
120
+ AcceptLanguage.parse("zh-Hant").match(:"zh-Hant-TW", :"zh-Hans-CN")
121
+ # => :"zh-Hant-TW"
122
+
123
+ # Variant subtags
86
124
  AcceptLanguage.parse("de-1996, de;q=0.9").match(:"de-CH-1996", :"de-CH")
87
125
  # => :"de-CH-1996"
88
126
  ```
89
127
 
90
- ## Rails integration
128
+ ## Integration examples
129
+
130
+ ### Rack
131
+
132
+ ```ruby
133
+ # config.ru
134
+ class LocaleMiddleware
135
+ def initialize(app, available_locales:, default_locale:)
136
+ @app = app
137
+ @available_locales = available_locales
138
+ @default_locale = default_locale
139
+ end
140
+
141
+ def call(env)
142
+ locale = detect_locale(env) || @default_locale
143
+ env["rack.locale"] = locale
144
+ @app.call(env)
145
+ end
146
+
147
+ private
148
+
149
+ def detect_locale(env)
150
+ header = env["HTTP_ACCEPT_LANGUAGE"]
151
+ return unless header
152
+
153
+ AcceptLanguage.parse(header).match(*@available_locales)
154
+ end
155
+ end
156
+ ```
157
+
158
+ ### Ruby on Rails
91
159
 
92
160
  ```ruby
93
161
  # app/controllers/application_controller.rb
@@ -118,13 +186,15 @@ end
118
186
 
119
187
  ## Documentation
120
188
 
121
- - [API Documentation](https://rubydoc.info/github/cyril/accept_language.rb/main)
189
+ - [API documentation](https://rubydoc.info/github/cyril/accept_language.rb/main)
190
+ - [RFC 2616 Section 14.4](https://tools.ietf.org/html/rfc2616#section-14.4)
191
+ - [BCP 47](https://tools.ietf.org/html/bcp47)
122
192
  - [Language negotiation with Ruby](https://dev.to/cyri_/language-negotiation-with-ruby-5166)
123
193
  - [Rubyで言語ネゴシエーション](https://qiita.com/cyril/items/45dc233edb7be9d614e7)
124
194
 
125
195
  ## Versioning
126
196
 
127
- This library follows [Semantic Versioning 2.0.0](https://semver.org/).
197
+ This library follows [Semantic Versioning 2.0](https://semver.org/).
128
198
 
129
199
  ## License
130
200
 
@@ -1,61 +1,280 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module AcceptLanguage
4
- # Matches Accept-Language header values against application-supported languages to determine
5
- # the optimal language choice. Handles quality values, wildcards, and language tag matching
6
- # according to RFC 2616 specifications.
4
+ # = Language Preference Matcher
5
+ #
6
+ # Matcher implements the language matching algorithm defined in RFC 2616
7
+ # Section 14.4. It takes parsed language preferences (from {Parser}) and
8
+ # determines the optimal language choice from a set of available languages.
9
+ #
10
+ # == Overview
11
+ #
12
+ # The matching process balances multiple factors:
13
+ #
14
+ # 1. **Quality values**: Higher q-values indicate stronger user preference
15
+ # 2. **Declaration order**: Tie-breaker when q-values are equal
16
+ # 3. **Prefix matching**: Allows +en+ to match +en-US+, +en-GB+, etc.
17
+ # 4. **Wildcards**: The +*+ range matches any otherwise unmatched language
18
+ # 5. **Exclusions**: Languages with +q=0+ are explicitly unacceptable
19
+ #
20
+ # == RFC 2616 Section 14.4 Compliance
21
+ #
22
+ # This implementation follows the Accept-Language matching rules:
23
+ #
24
+ # > A language-range matches a language-tag if it exactly equals the tag,
25
+ # > or if it exactly equals a prefix of the tag such that the first tag
26
+ # > character following the prefix is "-".
27
+ #
28
+ # This means:
29
+ # - +en+ matches +en+, +en-US+, +en-GB+, +en-Latn-US+
30
+ # - +en-US+ matches only +en-US+ (not +en+ or +en-GB+)
31
+ # - +en+ does NOT match +eng+ (no hyphen boundary)
32
+ #
33
+ # == Quality Value Semantics
34
+ #
35
+ # Quality values have specific meanings per RFC 2616:
36
+ #
37
+ # - +q=1+ (or omitted): Most preferred
38
+ # - +0 < q < 1+: Acceptable with relative preference
39
+ # - +q=0+: Explicitly NOT acceptable
40
+ #
41
+ # The +q=0+ case is special: it doesn't just indicate low preference, it
42
+ # completely excludes the language from consideration. This is used with
43
+ # wildcards to express "any language except X":
44
+ #
45
+ # Accept-Language: *, en;q=0
46
+ #
47
+ # == Wildcard Behavior
48
+ #
49
+ # The wildcard +*+ matches any language not explicitly matched by another
50
+ # language-range. When processing a wildcard:
51
+ #
52
+ # 1. Collect all explicitly listed language tags (excluding the wildcard)
53
+ # 2. Find available languages that don't match any explicit tag
54
+ # 3. Return the first such language
55
+ #
56
+ # This ensures explicit preferences always take priority over the wildcard.
57
+ #
58
+ # == Internal Design
59
+ #
60
+ # The Matcher separates languages into two categories during initialization:
61
+ #
62
+ # - **preferred_langtags**: Languages with q > 0, sorted by descending quality
63
+ # - **excluded_langtags**: Languages with q = 0 (explicitly unacceptable)
64
+ #
65
+ # This separation optimizes the matching algorithm by allowing quick
66
+ # filtering of excluded languages before attempting matches.
67
+ #
68
+ # == Thread Safety
69
+ #
70
+ # Matcher instances are immutable after initialization. Both +preferred_langtags+
71
+ # and +excluded_langtags+ are frozen, making instances safe for concurrent use.
7
72
  #
8
73
  # @api private
9
- # @note This class is intended for internal use by {Parser} and should not be instantiated directly.
74
+ # @note This class is used internally by {Parser#match} and should not be
75
+ # instantiated directly. Use {AcceptLanguage.parse} followed by
76
+ # {Parser#match} instead.
77
+ #
78
+ # @example Internal usage (via Parser)
79
+ # # Don't do this:
80
+ # matcher = AcceptLanguage::Matcher.new("en" => 1000, "fr" => 800)
81
+ #
82
+ # # Do this instead:
83
+ # AcceptLanguage.parse("en, fr;q=0.8").match(:en, :fr)
84
+ #
85
+ # @see Parser#match
86
+ # @see https://tools.ietf.org/html/rfc2616#section-14.4 RFC 2616 Section 14.4
10
87
  class Matcher
88
+ # The hyphen character used as a subtag delimiter in BCP 47 language tags.
89
+ #
90
+ # Per RFC 2616 Section 14.4, prefix matching must respect hyphen boundaries.
91
+ # A language-range matches a language-tag only if the character immediately
92
+ # following the prefix is a hyphen.
93
+ #
11
94
  # @api private
95
+ # @return [String] "-"
96
+ HYPHEN = "-"
97
+
98
+ # Error message raised when an available language tag is not a Symbol.
99
+ #
100
+ # This guards against accidental non-Symbol values in the available languages
101
+ # array, which would cause unexpected behavior during matching.
102
+ #
103
+ # @api private
104
+ # @return [String]
105
+ LANGTAG_TYPE_ERROR = "Language tag must be a Symbol"
106
+
107
+ # The wildcard character that matches any language not explicitly listed.
108
+ #
109
+ # Per RFC 2616 Section 14.4, the wildcard has special semantics:
110
+ # - It matches any language not matched by other ranges
111
+ # - +*;q=0+ makes all unlisted languages unacceptable
112
+ # - It has lower effective priority than explicit language tags
113
+ #
114
+ # @api private
115
+ # @return [String] "*"
12
116
  WILDCARD = "*"
13
117
 
118
+ # Language tags explicitly marked as unacceptable (+q=0+).
119
+ #
120
+ # These tags are filtered out from available languages before any
121
+ # matching occurs. Exclusions apply via prefix matching, so excluding
122
+ # +en+ also excludes +en-US+, +en-GB+, etc.
123
+ #
124
+ # @note The wildcard +*+ is never added to this set, even when +*;q=0+
125
+ # is specified. Wildcard exclusion is handled implicitly: when +*;q=0+
126
+ # and no other languages have +q > 0+, the preferred_langtags list is
127
+ # empty, resulting in no matches.
128
+ #
129
+ # @api private
130
+ # @return [Set<String>] downcased language tags with q=0
131
+ #
132
+ # @example
133
+ # # For "*, en;q=0, de;q=0"
134
+ # matcher.excluded_langtags
135
+ # # => #<Set: {"en", "de"}>
136
+ attr_reader :excluded_langtags
137
+
138
+ # Language tags sorted by preference (descending quality value).
139
+ #
140
+ # This array contains only tags with +q > 0+, ordered from most preferred
141
+ # to least preferred. When quality values are equal, the original
142
+ # declaration order from the Accept-Language header is preserved.
143
+ #
144
+ # The stable sort guarantee ensures deterministic matching: given the
145
+ # same header and available languages, the result is always the same.
146
+ #
14
147
  # @api private
15
- attr_reader :excluded_langtags, :preferred_langtags
148
+ # @return [Array<String>] downcased language tags, highest quality first
149
+ #
150
+ # @example
151
+ # # For "fr;q=0.8, en, de;q=0.9"
152
+ # # Sorted: en (q=1), de (q=0.9), fr (q=0.8)
153
+ # matcher.preferred_langtags
154
+ # # => ["en", "de", "fr"]
155
+ attr_reader :preferred_langtags
16
156
 
157
+ # Creates a new Matcher instance from parsed language preferences.
158
+ #
159
+ # The initialization process:
160
+ #
161
+ # 1. Separates excluded tags (+q=0+) from preferred tags (+q > 0+)
162
+ # 2. Sorts preferred tags by descending quality value
163
+ # 3. Preserves original order for tags with equal quality (stable sort)
164
+ #
165
+ # == Exclusion Rules
166
+ #
167
+ # Only specific language tags with +q=0+ are added to the exclusion set.
168
+ # The wildcard +*+ is explicitly NOT added even when +*;q=0+ is present,
169
+ # because:
170
+ #
171
+ # - Adding +*+ to exclusions would break prefix matching logic
172
+ # - +*;q=0+ semantics are: "no unlisted language is acceptable"
173
+ # - This is achieved by having an empty preferred_langtags (no wildcards)
174
+ #
175
+ # == Stable Sorting
176
+ #
177
+ # Ruby's +sort_by+ is stable since Ruby 2.0, meaning elements with equal
178
+ # sort keys maintain their relative order. This ensures that when multiple
179
+ # languages have the same quality value, the first one declared in the
180
+ # Accept-Language header wins.
181
+ #
17
182
  # @api private
183
+ # @param languages_range [Hash{String => Integer}] language tags mapped to
184
+ # quality values (0-1000), as produced by {Parser}
185
+ #
186
+ # @example
187
+ # Matcher.new("en" => 1000, "fr" => 800, "de" => 0)
188
+ # # preferred_langtags: ["en", "fr"]
189
+ # # excluded_langtags: #<Set: {"de"}>
18
190
  def initialize(**languages_range)
19
191
  @excluded_langtags = ::Set[]
20
- langtags = []
21
-
22
- languages_range.select do |langtag, quality|
23
- if quality.zero?
24
- # Exclude specific language tags, but NOT the wildcard.
25
- # When "*;q=0" is specified, all non-listed languages become
26
- # unacceptable implicitly (they won't match any preferred_langtags).
27
- # Adding "*" to excluded_langtags would break prefix_match? logic.
28
- @excluded_langtags << langtag unless wildcard?(langtag)
29
- else
30
- level = (quality * 1_000).to_i
31
- langtags[level] = langtag
32
- end
192
+
193
+ languages_range.each do |langtag, quality|
194
+ next unless quality.zero? && !wildcard?(langtag)
195
+
196
+ # Exclude specific language tags, but NOT the wildcard.
197
+ # When "*;q=0" is specified, all non-listed languages become
198
+ # unacceptable implicitly (they won't match any preferred_langtags).
199
+ # Adding "*" to excluded_langtags would break prefix_match? logic.
200
+ @excluded_langtags << langtag
33
201
  end
34
202
 
35
- @preferred_langtags = langtags.compact.reverse
203
+ # Sort by descending quality. Ruby's sort_by is stable, so languages
204
+ # with identical quality values preserve their original order from
205
+ # the Accept-Language header (first declared = higher priority).
206
+ @preferred_langtags = languages_range
207
+ .reject { |_, quality| quality.zero? }
208
+ .sort_by { |_, quality| -quality }
209
+ .map(&:first)
36
210
  end
37
211
 
212
+ # Finds the best matching language from the available options.
213
+ #
214
+ # == Algorithm
215
+ #
216
+ # 1. **Filter**: Remove available languages that match any excluded tag
217
+ # 2. **Match**: For each preferred tag (in quality order):
218
+ # - If it's a wildcard, return the first available language not
219
+ # matching any other preferred tag
220
+ # - Otherwise, return the first available language that matches
221
+ # via exact match or prefix match
222
+ # 3. **Result**: Return the first match found, or +nil+ if none
223
+ #
224
+ # == Return Value
225
+ #
226
+ # The returned value preserves the exact form (case) of the matched
227
+ # element from +available_langtags+. This is important for direct use
228
+ # with APIs like +I18n.locale=+ that may be case-sensitive.
229
+ #
38
230
  # @api private
231
+ # @param available_langtags [Array<Symbol>] languages to match against
232
+ # @return [Symbol, nil] the best matching language, or +nil+
233
+ # @raise [TypeError] if any available language tag is not a Symbol
234
+ #
235
+ # @example Basic matching
236
+ # matcher = Matcher.new("en" => 1000, "fr" => 800)
237
+ # matcher.call(:en, :fr, :de)
238
+ # # => :en
239
+ #
240
+ # @example Prefix matching
241
+ # matcher = Matcher.new("en" => 1000)
242
+ # matcher.call(:"en-US", :"en-GB")
243
+ # # => :"en-US"
244
+ #
245
+ # @example With exclusion
246
+ # matcher = Matcher.new("*" => 500, "en" => 0)
247
+ # matcher.call(:en, :fr)
248
+ # # => :fr
39
249
  def call(*available_langtags)
40
- raise ::ArgumentError, "Language tags cannot be nil" if available_langtags.any?(&:nil?)
41
-
42
250
  filtered_tags = drop_unacceptable(*available_langtags)
43
- return nil if filtered_tags.empty?
251
+ return if filtered_tags.empty?
44
252
 
45
253
  find_best_match(filtered_tags)
46
254
  end
47
255
 
48
256
  private
49
257
 
258
+ # Iterates through preferred languages to find the first match.
259
+ #
260
+ # @param available_langtags [Set<String>] pre-filtered available tags
261
+ # @return [Symbol, nil] the matched tag or nil
50
262
  def find_best_match(available_langtags)
51
263
  preferred_langtags.each do |preferred_tag|
52
264
  match = match_langtag(preferred_tag, available_langtags)
53
- return match if match
265
+ return :"#{match}" unless match.nil?
54
266
  end
55
267
 
56
268
  nil
57
269
  end
58
270
 
271
+ # Attempts to match a single preferred tag against available languages.
272
+ #
273
+ # Handles both wildcard and specific language tags differently.
274
+ #
275
+ # @param preferred_tag [String] the preferred language tag to match
276
+ # @param available_langtags [Set<String>] available tags to search
277
+ # @return [String, nil] the matched tag or nil
59
278
  def match_langtag(preferred_tag, available_langtags)
60
279
  if wildcard?(preferred_tag)
61
280
  any_other_langtag(*available_langtags)
@@ -64,44 +283,105 @@ module AcceptLanguage
64
283
  end
65
284
  end
66
285
 
286
+ # Finds an available language that matches via exact or prefix match.
287
+ #
288
+ # @param preferred_tag [String] the preferred tag (downcased)
289
+ # @param available_langtags [Set<String>] available tags
290
+ # @return [String, nil] the first matching tag or nil
67
291
  def find_matching_tag(preferred_tag, available_langtags)
68
- available_langtags.find { |tag| prefix_match?(preferred_tag, String(tag.downcase)) }
292
+ available_langtags.find { |tag| prefix_match?(preferred_tag, tag) }
69
293
  end
70
294
 
295
+ # Finds an available language for wildcard matching.
296
+ #
297
+ # Returns the first available language that doesn't match any explicitly
298
+ # listed preferred language tag. This implements the RFC 2616 semantics
299
+ # where +*+ matches "any language not matched by another range".
300
+ #
301
+ # @param available_langtags [Array<String>] available tags
302
+ # @return [String, nil] the first non-matching tag or nil
71
303
  def any_other_langtag(*available_langtags)
72
304
  langtags = preferred_langtags - [WILDCARD]
73
305
 
74
306
  available_langtags.find do |available_langtag|
75
- available_downcased = available_langtag.downcase
76
- langtags.none? { |tag| prefix_match?(tag, String(available_downcased)) }
307
+ langtags.none? { |tag| prefix_match?(tag, available_langtag) }
77
308
  end
78
309
  end
79
310
 
311
+ # Removes explicitly excluded languages from the available set.
312
+ #
313
+ # Uses prefix matching for exclusions, so excluding +en+ also excludes
314
+ # +en-US+, +en-GB+, etc.
315
+ #
316
+ # @param available_langtags [Array<Symbol>] all available tags
317
+ # @return [Set<String>] tags not matching any exclusion
318
+ # @raise [TypeError] if any tag is not a Symbol
80
319
  def drop_unacceptable(*available_langtags)
81
320
  available_langtags.each_with_object(::Set[]) do |available_langtag, langtags|
321
+ raise ::TypeError, LANGTAG_TYPE_ERROR unless available_langtag.is_a?(::Symbol)
322
+
323
+ available_langtag = "#{available_langtag}"
82
324
  langtags << available_langtag unless unacceptable?(available_langtag)
83
325
  end
84
326
  end
85
327
 
328
+ # Checks if a language tag is explicitly excluded.
329
+ #
330
+ # @param langtag [String] the tag to check (as string)
331
+ # @return [Boolean] true if the tag matches any exclusion
86
332
  def unacceptable?(langtag)
87
- langtag_downcased = langtag.downcase
88
- excluded_langtags.any? { |excluded_tag| prefix_match?(excluded_tag, String(langtag_downcased)) }
333
+ excluded_langtags.any? { |excluded_tag| prefix_match?(excluded_tag, langtag) }
89
334
  end
90
335
 
336
+ # Checks if a value is the wildcard character.
337
+ #
338
+ # @param value [String] the value to check
339
+ # @return [Boolean] true if the value is "*"
91
340
  def wildcard?(value)
92
341
  value.eql?(WILDCARD)
93
342
  end
94
343
 
95
- # Implements RFC 2616 Section 14.4 prefix matching rule:
96
- # "A language-range matches a language-tag if it exactly equals the tag,
97
- # or if it exactly equals a prefix of the tag such that the first tag
98
- # character following the prefix is '-'."
344
+ # Implements RFC 2616 Section 14.4 prefix matching rule.
345
+ #
346
+ # From the specification:
347
+ #
348
+ # > A language-range matches a language-tag if it exactly equals the tag,
349
+ # > or if it exactly equals a prefix of the tag such that the first tag
350
+ # > character following the prefix is "-".
351
+ #
352
+ # This rule ensures that language ranges match at subtag boundaries:
353
+ #
354
+ # - +en+ matches +en+ (exact)
355
+ # - +en+ matches +en-US+ (prefix + hyphen)
356
+ # - +en+ does NOT match +eng+ (no hyphen after prefix)
357
+ # - +en-US+ does NOT match +en+ (prefix is longer than tag)
99
358
  #
100
- # @param prefix [String] The language-range to match (downcased)
101
- # @param tag [String] The language-tag to test (downcased)
359
+ # Matching is case-insensitive per RFC 2616, using +casecmp?+ for
360
+ # efficient comparison without allocating new strings.
361
+ #
362
+ # @param prefix [String] the language-range to match (downcased)
363
+ # @param tag [String] the language-tag to test (any case)
102
364
  # @return [Boolean] true if prefix matches tag per RFC 2616 rules
365
+ #
366
+ # @example Exact matches
367
+ # prefix_match?("en", "en") # => true
368
+ # prefix_match?("en", "EN") # => true
369
+ # prefix_match?("en-us", "en-US") # => true
370
+ #
371
+ # @example Prefix matches
372
+ # prefix_match?("en", "en-us") # => true
373
+ # prefix_match?("en", "en-GB") # => true
374
+ # prefix_match?("zh", "zh-Hant-TW") # => true
375
+ #
376
+ # @example Non-matches
377
+ # prefix_match?("en-us", "en") # => false (prefix longer than tag)
378
+ # prefix_match?("en", "eng") # => false (no hyphen boundary)
379
+ # prefix_match?("en", "fr") # => false (different language)
103
380
  def prefix_match?(prefix, tag)
104
- tag == prefix || tag.start_with?("#{prefix}-")
381
+ return true if tag.casecmp?(prefix)
382
+ return false if tag.length <= prefix.length
383
+
384
+ tag[0, prefix.length].casecmp?(prefix) && tag[prefix.length] == HYPHEN
105
385
  end
106
386
  end
107
387
  end
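
The new `initialize` relies on `sort_by` preserving declaration order for equal q-values. Ruby's documentation for `Enumerable#sort_by` does not guarantee a stable result, so that ordering may not hold on every implementation or input size. One way to make the tie-break explicit is to sort on the quality and the original position together. The helper below is a minimal sketch under that assumption; `order_by_preference` is a hypothetical name, not part of the gem, and it takes the same `Hash{String => Integer}` shape (0-1000) that the diff documents for `Matcher#initialize`.

```ruby
# Hypothetical helper (not part of accept_language): orders language tags by
# descending quality and breaks ties by original declaration position, so the
# result does not depend on whether the underlying sort is stable.
def order_by_preference(languages_range)
  languages_range
    .reject { |_, quality| quality.zero? }               # drop q=0 exclusions
    .each_with_index                                     # yields [[tag, quality], index]
    .sort_by { |(_, quality), index| [-quality, index] } # quality first, then position
    .map { |(langtag, _), _| langtag }
end

order_by_preference("fr" => 800, "en" => 1000, "de" => 800, "ja" => 0)
# => ["en", "fr", "de"]
```

Here `fr` and `de` share a quality of 800, and `fr` stays ahead of `de` because its index is lower, regardless of how the sort treats equal keys.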