unicode-scripts 1.9.0 → 1.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 3af7d40f02bdf4a276addf67e1f3dc0bdb76b4ba7658b68fb7de9a433ddba2a3
4
- data.tar.gz: 19af4744dc3fee1e6f61e07134fdcca25dd6883bcc71b60707f6e3e22a03e2d1
3
+ metadata.gz: feaabd20c3a3869a96e62e34d7c39b83739365549904ecdb129e83d9f73540d4
4
+ data.tar.gz: 40af16102c2aa63b35051f09b65cd8e2d14c32fbce21a7c802ea260121ade5b5
5
5
  SHA512:
6
- metadata.gz: 27ce63fcb077fff302de67202d8418327c097c9bd1bfd9b2cfe34bf452148fefda1d3caf07a6ae3d3d106c4603220769d277aaa9a6774a5a979873ef7e929e53
7
- data.tar.gz: ca20a49dc38d5cb9ca76747bcbd89f2b0d6256b39da6aa994fd9e69862ca159b8738f5a9c984e67ef23c8cdb6739d097b73c69adb6cd1d61bd499b1b5c47407f
6
+ metadata.gz: 8d5f215ed6b03d5192eef673d22f0705cac149e3701427570ab52b4e3c538ac1537b7ca5a0768a66e6d5ffdd4d66b9363b4fdafbdf74112df1e7e59ab639cf2c
7
+ data.tar.gz: 735b9611f0bfee72dd074a8873c3c269d23a150b6d7a9da64ca33b7d02c5a65316a45eaa47bb60fc44a554b70d3efbd3bcd8cf0c94185e01d7fdd5f767766839
data/CHANGELOG.md CHANGED
@@ -1,5 +1,13 @@
1
1
  ## CHANGELOG
2
2
 
3
+ ### 1.11.0
4
+
5
+ - Add augmented scripts and mixed-script detection (as described in UTS39)
6
+
7
+ ### 1.10.0
8
+
9
+ - Unicode 16.0
10
+
3
11
  ### 1.9.0
4
12
 
5
13
  - Unicode 15.1
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- unicode-scripts (1.9.0)
4
+ unicode-scripts (1.11.0)
5
5
 
6
6
  GEM
7
7
  remote: https://rubygems.org/
data/MIT-LICENSE.txt CHANGED
@@ -1,4 +1,4 @@
1
- Copyright (c) 2016-2023 Jan Lelis, https://janlelis.com
1
+ Copyright (c) 2016-2024 Jan Lelis, https://janlelis.com
2
2
 
3
3
  Permission is hereby granted, free of charge, to any person obtaining
4
4
  a copy of this software and associated documentation files (the
data/README.md CHANGED
@@ -1,12 +1,12 @@
1
1
  # Unicode::Scripts [![[version]](https://badge.fury.io/rb/unicode-scripts.svg)](https://badge.fury.io/rb/unicode-scripts) [![[ci]](https://github.com/janlelis/unicode-scripts/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-scripts/actions?query=workflow%3ATest)
2
2
 
3
- Retrieve the [Unicode script(s)](https://en.wikipedia.org/wiki/Script_%28Unicode%29) a string belongs to. Can also return the *Script_Extension* property which is defined as characters which are "commonly used with more than one script, but with a limited number of scripts".
3
+ Retrieve all [Unicode script(s)](https://en.wikipedia.org/wiki/Script_%28Unicode%29) a string belongs to. Can also return the *Script_Extension* property (scx) which is defined as characters which are "commonly used with more than one script, but with a limited number of scripts".
4
4
 
5
- Unicode version: **15.1.0** (September 2023)
5
+ Based on the *Script_Extension*, this library can also return the [augmented script set](https://www.unicode.org/reports/tr39/#def-augmented-script-set) to figure out if a string is **mixed-script** or **single-script**. Mixed scripts can be an indicator of suspicious user inputs.
6
6
 
7
- Supported Rubies: **3.2**, **3.1**, **3.0**
7
+ Unicode version: **16.0.0** (September 2024)
8
8
 
9
- Old Rubies that might still work: **2.X**
9
+ Supported Rubies: **3.x** (might work: **2.x**)
10
10
 
11
11
  ## Gemfile
12
12
 
@@ -14,7 +14,7 @@ Old Rubies that might still work: **2.X**
14
14
  gem "unicode-scripts"
15
15
  ```
16
16
 
17
- ## Usage
17
+ ## Usage - Scripts and Script Extensions
18
18
 
19
19
  ```ruby
20
20
  require "unicode/scripts"
@@ -29,381 +29,104 @@ Unicode::Scripts.script("ᴦ") # => "Greek"
29
29
 
30
30
  # Script_Extension property
31
31
  Unicode::Scripts.script_extensions("॥")
32
- # => ["Bengali", "Devanagari", "Dogra", "Grantha", "Gujarati","Gunjala_Gondi", "Gurmukhi", "Kannada",
33
- "Khudawadi", "Limbu", "Mahajani", "Malayalam", "Masaram_Gondi", "Nandinagari", "Oriya", "Sinhala",
34
- "Syloti_Nagri", "Takri", "Tamil", "Telugu", "Tirhuta"]
32
+ # => ["Bengali", "Devanagari", "Dogra", "Grantha", "Gujarati", "Gunjala_Gondi", "Gurmukhi","Gurung_Khema",
33
+ "Kannada","Khudawadi", "Limbu", "Mahajani", "Malayalam", "Masaram_Gondi", "Nandinagari", "Ol_Onal",
34
+ "Oriya", "Sinhala", "Syloti_Nagri", "Takri", "Tamil", "Telugu", "Tirhuta"]
35
35
  ```
36
36
 
37
- ## Hints
38
- ### Regex Matching
37
+ ## Usage - Augmented Scripts
39
38
 
40
- If you have a string and want to match a substring/character from a specific Unicode script, you actually won't need this gem. Instead, you can use the [Regexp Unicode Property Syntax `\p{}`](https://ruby-doc.org/core/Regexp.html#class-Regexp-label-Character+Properties):
39
+ Like script extensions, but adds meta scripts for Asian languages and treats _Common_/_Inherited_ values as ALL scripts.
41
40
 
42
41
  ```ruby
43
- "Coptic letter: ⲁ".scan(/\p{Coptic}/) # => ["ⲁ"]
42
+ require "unicode/scripts"
43
+
44
+ Unicode::Scripts.augmented_scripts("ねガ") # => ['Hira', 'Kana', 'Jpan']
45
+ Unicode::Scripts.augmented_scripts("1") # => ["Adlm", "Aghb", "Ahom", … ]
44
46
  ```
45
47
 
46
- See [Idiosyncratic Ruby: Proper Unicoding](https://idiosyncratic-ruby.com/41-proper-unicoding.html) for more info.
48
+ ## Usage - Resolved Script
49
+
50
+ Intersection of all augmented scripts per character.
51
+
52
+ ```ruby
53
+ require "unicode/scripts"
54
+
55
+ Unicode::Scripts.resolved_scripts("СігсӀе") # => [ 'Cyrl' ]
56
+ Unicode::Scripts.resolved_scripts("Сirсlе") # => []
57
+ Unicode::Scripts.resolved_scripts("𝖢𝗂𝗋𝖼𝗅𝖾") # => ['Adlm', 'Aghb', 'Ahom', … ]
58
+ Unicode::Scripts.resolved_scripts("1") # => ['Adlm','Aghb', 'Ahom', … ]
59
+ Unicode::Scripts.resolved_scripts("ねガ") # => ['Hira', 'Kana', 'Jpan']
60
+ ```
61
+
62
+ Please note that the **resolved script** can contain multiple scripts, as per standard.
63
+
64
+ ## Usage - Mixed-Script Detection
47
65
 
48
- ### Script Names
66
+ Mixed-script if resolved script set is empty, single-script otherwise.
67
+
68
+ ```ruby
69
+ require "unicode/scripts"
70
+
71
+ Unicode::Scripts.mixed?("СігсӀе"); # => false
72
+ Unicode::Scripts.mixed?("Сirсlе"); # => true
73
+ Unicode::Scripts.mixed?("𝖢𝗂𝗋𝖼𝗅𝖾"); # => false
74
+ Unicode::Scripts.mixed?("1"); # => false
75
+ Unicode::Scripts.mixed?("ねガ"); # => false
76
+
77
+ Unicode::Scripts.single?("СігсӀе"); # => true
78
+ Unicode::Scripts.single?("Сirсlе"); # => false
79
+ Unicode::Scripts.single?("𝖢𝗂𝗋𝖼𝗅𝖾"); # => true
80
+ Unicode::Scripts.single?("1"); # => true
81
+ Unicode::Scripts.single?("ねガ"); # => true
82
+ ```
83
+
84
+ Please note that a **single-script** string might actually contain multiple scripts, as per standard (e.g. for Asian languages)
85
+
86
+ ### List of All Scripts
49
87
 
50
88
  You can extract all script names from the gem like this:
51
89
 
52
90
  ```ruby
53
91
  require "unicode/scripts"
54
- puts Unicode::Scripts.names
55
-
56
- # # # Output # # #
57
-
58
- Adlam
59
- Ahom
60
- Anatolian_Hieroglyphs
61
- Arabic
62
- Armenian
63
- Avestan
64
- Balinese
65
- Bamum
66
- Bassa_Vah
67
- Batak
68
- Bengali
69
- Bhaiksuki
70
- Bopomofo
71
- Brahmi
72
- Braille
73
- Buginese
74
- Buhid
75
- Canadian_Aboriginal
76
- Carian
77
- Caucasian_Albanian
78
- Chakma
79
- Cham
80
- Cherokee
81
- Chorasmian
82
- Common
83
- Coptic
84
- Cuneiform
85
- Cypriot
86
- Cypro_Minoan
87
- Cyrillic
88
- Deseret
89
- Devanagari
90
- Dives_Akuru
91
- Dogra
92
- Duployan
93
- Egyptian_Hieroglyphs
94
- Elbasan
95
- Elymaic
96
- Ethiopic
97
- Georgian
98
- Glagolitic
99
- Gothic
100
- Grantha
101
- Greek
102
- Gujarati
103
- Gunjala_Gondi
104
- Gurmukhi
105
- Han
106
- Hangul
107
- Hanifi_Rohingya
108
- Hanunoo
109
- Hatran
110
- Hebrew
111
- Hiragana
112
- Imperial_Aramaic
113
- Inherited
114
- Inscriptional_Pahlavi
115
- Inscriptional_Parthian
116
- Javanese
117
- Kaithi
118
- Kannada
119
- Katakana
120
- Katakana_Or_Hiragana
121
- Kawi
122
- Kayah_Li
123
- Kharoshthi
124
- Khitan_Small_Script
125
- Khmer
126
- Khojki
127
- Khudawadi
128
- Lao
129
- Latin
130
- Lepcha
131
- Limbu
132
- Linear_A
133
- Linear_B
134
- Lisu
135
- Lycian
136
- Lydian
137
- Mahajani
138
- Makasar
139
- Malayalam
140
- Mandaic
141
- Manichaean
142
- Marchen
143
- Masaram_Gondi
144
- Medefaidrin
145
- Meetei_Mayek
146
- Mende_Kikakui
147
- Meroitic_Cursive
148
- Meroitic_Hieroglyphs
149
- Miao
150
- Modi
151
- Mongolian
152
- Mro
153
- Multani
154
- Myanmar
155
- Nabataean
156
- Nag_Mundari
157
- Nandinagari
158
- New_Tai_Lue
159
- Newa
160
- Nko
161
- Nushu
162
- Nyiakeng_Puachue_Hmong
163
- Ogham
164
- Ol_Chiki
165
- Old_Hungarian
166
- Old_Italic
167
- Old_North_Arabian
168
- Old_Permic
169
- Old_Persian
170
- Old_Sogdian
171
- Old_South_Arabian
172
- Old_Turkic
173
- Old_Uyghur
174
- Oriya
175
- Osage
176
- Osmanya
177
- Pahawh_Hmong
178
- Palmyrene
179
- Pau_Cin_Hau
180
- Phags_Pa
181
- Phoenician
182
- Psalter_Pahlavi
183
- Rejang
184
- Runic
185
- Samaritan
186
- Saurashtra
187
- Sharada
188
- Shavian
189
- Siddham
190
- SignWriting
191
- Sinhala
192
- Sogdian
193
- Sora_Sompeng
194
- Soyombo
195
- Sundanese
196
- Syloti_Nagri
197
- Syriac
198
- Tagalog
199
- Tagbanwa
200
- Tai_Le
201
- Tai_Tham
202
- Tai_Viet
203
- Takri
204
- Tamil
205
- Tangsa
206
- Tangut
207
- Telugu
208
- Thaana
209
- Thai
210
- Tibetan
211
- Tifinagh
212
- Tirhuta
213
- Toto
214
- Ugaritic
215
- Unknown
216
- Vai
217
- Vithkuqi
218
- Wancho
219
- Warang_Citi
220
- Yezidi
221
- Yi
222
- Zanabazar_Square
92
+ puts Unicode::Scripts.names # list of scripts
223
93
  ```
224
94
 
225
- ### Short Script Names
95
+ To get all 4 letter script codes (ISO 15924):
96
+
97
+ ```ruby
98
+ require "unicode/scripts"
99
+ puts Unicode::Scripts.names(format: :short) # list of scripts
100
+ ```
226
101
 
227
- You can extract all 4 letter script names from the gem like this:
102
+ Augmented scripts:
228
103
 
229
104
  ```ruby
230
105
  require "unicode/scripts"
231
- puts Unicode::Scripts.names(format: :short)
232
-
233
- # # # Output # # #
234
-
235
- Adlm
236
- Aghb
237
- Ahom
238
- Arab
239
- Armi
240
- Armn
241
- Avst
242
- Bali
243
- Bamu
244
- Bass
245
- Batk
246
- Beng
247
- Bhks
248
- Bopo
249
- Brah
250
- Brai
251
- Bugi
252
- Buhd
253
- Cakm
254
- Cans
255
- Cari
256
- Cham
257
- Cher
258
- Chrs
259
- Copt
260
- Cpmn
261
- Cprt
262
- Cyrl
263
- Deva
264
- Diak
265
- Dogr
266
- Dsrt
267
- Dupl
268
- Egyp
269
- Elba
270
- Elym
271
- Ethi
272
- Geor
273
- Glag
274
- Gong
275
- Gonm
276
- Goth
277
- Gran
278
- Grek
279
- Gujr
280
- Guru
281
- Hang
282
- Hani
283
- Hano
284
- Hatr
285
- Hebr
286
- Hira
287
- Hluw
288
- Hmng
289
- Hmnp
290
- Hrkt
291
- Hung
292
- Ital
293
- Java
294
- Kali
295
- Kana
296
- Kawi
297
- Khar
298
- Khmr
299
- Khoj
300
- Kits
301
- Knda
302
- Kthi
303
- Lana
304
- Laoo
305
- Latn
306
- Lepc
307
- Limb
308
- Lina
309
- Linb
310
- Lisu
311
- Lyci
312
- Lydi
313
- Mahj
314
- Maka
315
- Mand
316
- Mani
317
- Marc
318
- Medf
319
- Mend
320
- Merc
321
- Mero
322
- Mlym
323
- Modi
324
- Mong
325
- Mroo
326
- Mtei
327
- Mult
328
- Mymr
329
- Nagm
330
- Nand
331
- Narb
332
- Nbat
333
- Newa
334
- Nkoo
335
- Nshu
336
- Ogam
337
- Olck
338
- Orkh
339
- Orya
340
- Osge
341
- Osma
342
- Ougr
343
- Palm
344
- Pauc
345
- Perm
346
- Phag
347
- Phli
348
- Phlp
349
- Phnx
350
- Plrd
351
- Prti
352
- Qaac
353
- Qaai
354
- Rjng
355
- Rohg
356
- Runr
357
- Samr
358
- Sarb
359
- Saur
360
- Sgnw
361
- Shaw
362
- Shrd
363
- Sidd
364
- Sind
365
- Sinh
366
- Sogd
367
- Sogo
368
- Sora
369
- Soyo
370
- Sund
371
- Sylo
372
- Syrc
373
- Tagb
374
- Takr
375
- Tale
376
- Talu
377
- Taml
378
- Tang
379
- Tavt
380
- Telu
381
- Tfng
382
- Tglg
383
- Thaa
384
- Thai
385
- Tibt
386
- Tirh
387
- Tnsa
388
- Toto
389
- Ugar
390
- Vaii
391
- Vith
392
- Wara
393
- Wcho
394
- Xpeo
395
- Xsux
396
- Yezi
397
- Yiii
398
- Zanb
399
- Zinh
400
- Zyyy
401
- Zzzz
106
+ puts Unicode::Scripts.names(format: :short, augmented: :only)
402
107
  ```
403
108
 
404
- See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related micro libraries.
109
+ You can find a list of all scripts in Unicode, with links to Wikipedia on [character.construction/scripts](https://character.construction/scripts)
110
+
111
+ ## Hints
112
+ ### Regex Matching
113
+
114
+ If you have a string and want to match a substring/character from a specific Unicode script, you actually won't need this gem. Instead, you can use the [Regexp Unicode Property Syntax `\p{}`](https://ruby-doc.org/core/Regexp.html#class-Regexp-label-Character+Properties):
115
+
116
+ ```ruby
117
+ "Coptic letter: ⲁ".scan(/\p{Coptic}/) # => ["ⲁ"]
118
+ ```
119
+
120
+ See [Idiosyncratic Ruby: Proper Unicoding](https://idiosyncratic-ruby.com/41-proper-unicoding.html) for more info.
121
+
122
+ ## Also See
123
+
124
+ - JavaScript implementation (same data & algorithms): [unicode-script.js](https://github.com/janlelis/unicode-script.js)
125
+ - Index created with: [unicoder](https://github.com/janlelis/unicoder)
126
+ - Get the Unicode blocks of a string: [unicode-blocks gem](https://github.com/janlelis/unicode-blocks)
127
+ - See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related micro libraries for Ruby.
405
128
 
406
129
  ## MIT License
407
130
 
408
- - Copyright (C) 2016-2023 Jan Lelis <https://janlelis.com>. Released under the MIT license.
131
+ - Copyright (C) 2016-2024 Jan Lelis <https://janlelis.com>. Released under the MIT license.
409
132
  - Unicode data: https://www.unicode.org/copyright.html#Exhibit1
Binary file
@@ -2,9 +2,11 @@
2
2
 
3
3
  module Unicode
4
4
  module Scripts
5
- VERSION = "1.9.0"
6
- UNICODE_VERSION = "15.1.0"
5
+ VERSION = "1.11.0"
6
+ UNICODE_VERSION = "16.0.0"
7
7
  DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/").freeze
8
8
  INDEX_FILENAME = (DATA_DIRECTORY + "/scripts.marshal.gz").freeze
9
+
10
+ AUGMENTED_SCRIPT_CODES = ["Hanb", "Jpan", "Kore"]
9
11
  end
10
12
  end
@@ -46,11 +46,77 @@ module Unicode
46
46
  }.sort
47
47
  end
48
48
 
49
- def self.names(format: :long)
49
+ def self.augmented_scripts(string)
50
50
  require_relative 'scripts/index' unless defined? ::Unicode::Scripts::INDEX
51
- format == :long ?
52
- INDEX[:SCRIPT_NAMES].sort :
53
- INDEX[:SCRIPT_ALIASES].keys.sort
51
+
52
+ augmented = string.each_codepoint.inject([]){ |res, codepoint|
53
+ if new_scripts = INDEX[:SCRIPT_EXTENSIONS][codepoint]
54
+ script_extension_names = new_scripts.map{ |new_script|
55
+ INDEX[:SCRIPT_ALIASES].key(new_script)
56
+ }
57
+ else
58
+ script_extension_names = scripts([codepoint].pack("U"), format: :short)
59
+ end
60
+
61
+ res | script_extension_names
62
+ }
63
+
64
+ if augmented.include? "Hani"
65
+ augmented |= ["Hanb", "Jpan", "Kore"]
66
+ end
67
+ if augmented.include?("Hira") || augmented.include?("Kana")
68
+ augmented |= ["Jpan"]
69
+ end
70
+ if augmented.include? "Hang"
71
+ augmented |= ["Kore"]
72
+ end
73
+ if augmented.include? "Bopo"
74
+ augmented |= ["Hanb"]
75
+ end
76
+ if augmented.include?("Zyyy") || augmented.include?("Zinh")
77
+ augmented |= names(format: :short, augmented: :include )
78
+ end
79
+
80
+ augmented.sort
81
+ end
82
+
83
+ def self.resolved_scripts(string)
84
+ string.chars.reduce(
85
+ Unicode::Scripts.names(format: :short, augmented: :include)
86
+ ){ |acc, char|
87
+ acc & augmented_scripts(char)
88
+ }
89
+ end
90
+
91
+ def self.mixed?(string)
92
+ resolved_scripts(string).empty?
93
+ end
94
+
95
+ def self.single?(string)
96
+ !resolved_scripts(string).empty?
97
+ end
98
+
99
+ # Lists scripts. Options:
100
+ # - format - :long, :short
101
+ # - augmented - :include, :exclude, :only
102
+ def self.names(format: :long, augmented: :exclude)
103
+ if format == :long && augmented != :exclude
104
+ raise ArgumentError, "only short four-letter script codes (ISO 15924) supported when listing augmented scripts"
105
+ end
106
+
107
+ if augmented == :only
108
+ return AUGMENTED_SCRIPT_CODES
109
+ end
110
+
111
+ require_relative 'scripts/index' unless defined? ::Unicode::Scripts::INDEX
112
+
113
+ if format == :long
114
+ INDEX[:SCRIPT_NAMES].sort
115
+ elsif augmented == :exclude
116
+ INDEX[:SCRIPT_ALIASES].keys.sort
117
+ else
118
+ (INDEX[:SCRIPT_ALIASES].keys + AUGMENTED_SCRIPT_CODES).sort
119
+ end
54
120
  end
55
121
  end
56
122
  end
@@ -63,6 +63,7 @@ describe Unicode::Scripts do
63
63
  "Gujarati",
64
64
  "Gunjala_Gondi",
65
65
  "Gurmukhi",
66
+ "Gurung_Khema",
66
67
  "Kannada",
67
68
  "Khudawadi",
68
69
  "Limbu",
@@ -70,6 +71,7 @@ describe Unicode::Scripts do
70
71
  "Malayalam",
71
72
  "Masaram_Gondi",
72
73
  "Nandinagari",
74
+ "Ol_Onal",
73
75
  "Oriya",
74
76
  "Sinhala",
75
77
  "Syloti_Nagri",
@@ -89,12 +91,14 @@ describe Unicode::Scripts do
89
91
  "Gonm",
90
92
  "Gran",
91
93
  "Gujr",
94
+ "Gukh",
92
95
  "Guru",
93
96
  "Knda",
94
97
  "Limb",
95
98
  "Mahj",
96
99
  "Mlym",
97
100
  "Nand",
101
+ "Onao",
98
102
  "Orya",
99
103
  "Sind",
100
104
  "Sinh",
@@ -126,11 +130,61 @@ describe Unicode::Scripts do
126
130
  end
127
131
  end
128
132
 
133
+ describe ".augmented_scripts" do
134
+ it "will always return an Array" do
135
+ assert_equal [], Unicode::Scripts.augmented_scripts("")
136
+ end
137
+
138
+ it "will return all extended scripts that characters in the string belong to + augmented" do
139
+ assert_equal ["Hira", "Jpan", "Kana"], Unicode::Scripts.augmented_scripts("ねガ")
140
+ end
141
+
142
+ it "will replace Common with all scripts" do
143
+ assert_equal \
144
+ Unicode::Scripts.names(format: :short, augmented: :include),
145
+ Unicode::Scripts.augmented_scripts("1")
146
+ end
147
+ end
148
+
149
+ describe ".resolved_scripts" do
150
+ it "return intersection of augmented scripts per character" do
151
+ assert_equal ["Cyrl"], Unicode::Scripts.resolved_scripts("СігсӀе")
152
+ assert_equal [], Unicode::Scripts.resolved_scripts("Сirсlе")
153
+ assert_equal \
154
+ Unicode::Scripts.names(format: :short, augmented: :include),
155
+ Unicode::Scripts.resolved_scripts("𝖢𝗂𝗋𝖼𝗅𝖾")
156
+ end
157
+ end
158
+
159
+ describe "mixed?" do
160
+ it "will return true if .resolved_scripts(string) is empty" do
161
+ assert_equal false, Unicode::Scripts.mixed?("СігсӀе")
162
+ assert Unicode::Scripts.mixed?("Сirсlе")
163
+ assert_equal false, Unicode::Scripts.mixed?("𝖢𝗂𝗋𝖼𝗅𝖾")
164
+ assert_equal false, Unicode::Scripts.mixed?("1")
165
+ assert_equal false, Unicode::Scripts.mixed?("ねガ")
166
+ end
167
+ end
168
+
169
+ describe "single?" do
170
+ it "will return true if .resolved_scripts(string) is not empty" do
171
+ assert Unicode::Scripts.single?("СігсӀе")
172
+ assert_equal false, Unicode::Scripts.single?("Сirсlе")
173
+ assert Unicode::Scripts.single?("𝖢𝗂𝗋𝖼𝗅𝖾")
174
+ assert Unicode::Scripts.single?("1")
175
+ assert Unicode::Scripts.single?("ねガ")
176
+ end
177
+ end
178
+
129
179
  describe ".names" do
130
180
  it "will return a list of all script names" do
131
181
  assert_kind_of Array, Unicode::Scripts.names
132
182
  assert_includes Unicode::Scripts.names, "Inscriptional_Parthian"
133
183
  end
184
+
185
+ it "will return a list of all augmented script codes" do
186
+ assert_equal Unicode::Scripts.names(format: :short, augmented: :only), ["Hanb", "Jpan", "Kore"]
187
+ end
134
188
  end
135
189
  end
136
190
 
metadata CHANGED
@@ -1,16 +1,16 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: unicode-scripts
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.9.0
4
+ version: 1.11.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jan Lelis
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2023-10-01 00:00:00.000000000 Z
11
+ date: 2024-11-03 00:00:00.000000000 Z
12
12
  dependencies: []
13
- description: "[Unicode 15.1.0] Retrieve the Unicode script(s) a string belongs to.
13
+ description: "[Unicode 16.0.0] Retrieve the Unicode script(s) a string belongs to.
14
14
  Can also return the Script_Extension property which is defined as characters which
15
15
  are 'commonly used with more than one script, but with a limited number of scripts'. "
16
16
  email:
@@ -39,7 +39,7 @@ licenses:
39
39
  - MIT
40
40
  metadata:
41
41
  rubygems_mfa_required: 'true'
42
- post_install_message:
42
+ post_install_message:
43
43
  rdoc_options: []
44
44
  require_paths:
45
45
  - lib
@@ -54,8 +54,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
54
54
  - !ruby/object:Gem::Version
55
55
  version: '0'
56
56
  requirements: []
57
- rubygems_version: 3.4.4
58
- signing_key:
57
+ rubygems_version: 3.5.21
58
+ signing_key:
59
59
  specification_version: 4
60
60
  summary: Which script(s) does a Unicode string belong to?
61
61
  test_files: