unicode-scripts 1.3.0 → 1.7.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +16 -0
- data/Gemfile +1 -0
- data/Gemfile.lock +4 -2
- data/MIT-LICENSE.txt +1 -1
- data/README.md +38 -11
- data/Rakefile +5 -1
- data/data/scripts.marshal.gz +0 -0
- data/lib/unicode/scripts/constants.rb +6 -5
- data/spec/unicode_scripts_spec.rb +15 -2
- data/unicode-scripts.gemspec +2 -2
- metadata +6 -8
- data/.travis.yml +0 -22
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c907fd10ee383e77217efda8304b1e6182aa59b2dd5f38eb4f6399fed987e1cc
+  data.tar.gz: e51d4c2a61b4c59c6ac5c0ad8cb36ae455e81d4132e6ebc8b1dbf6b1a8735591
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d01c9f77598090ad2026476eee867cdf4564fdc4cb666e67a6adb99b8ff1fb1ad9345ba3133eff6a53a23ee4ed52a1c1a1503799bf42abe52748f1f10e8c6a60
+  data.tar.gz: 7663f839ebaf0375276356e8815f117180559fbfc959b8c28afadc35999da82a7ca56d67b55a8ef2c0e5263ac6de354e1edee60879e7ec3f8017e40ee6be7d3d
data/CHANGELOG.md
CHANGED
data/Gemfile
CHANGED
data/Gemfile.lock
CHANGED
@@ -1,11 +1,12 @@
 PATH
   remote: .
   specs:
-    unicode-scripts (1.
+    unicode-scripts (1.5.0)
 
 GEM
   remote: https://rubygems.org/
   specs:
+    irb (1.0.0)
     minitest (5.9.0)
     rake (12.0.0)
 
@@ -13,9 +14,10 @@ PLATFORMS
   ruby
 
 DEPENDENCIES
+  irb
   minitest
   rake
   unicode-scripts!
 
 BUNDLED WITH
-   1.
+   1.17.2
data/MIT-LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -1,12 +1,12 @@
-# Unicode::Scripts [![[version]](https://badge.fury.io/rb/unicode-scripts.svg)](
+# Unicode::Scripts [![[version]](https://badge.fury.io/rb/unicode-scripts.svg)](https://badge.fury.io/rb/unicode-scripts) [![[ci]](https://github.com/janlelis/unicode-scripts/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-scripts/actions?query=workflow%3ATest)
 
 Retrieve the [Unicode script(s)](https://en.wikipedia.org/wiki/Script_%28Unicode%29) a string belongs to. Can also return the *Script_Extension* property which is defined as characters which are "commonly used with more than one script, but with a limited number of scripts".
 
-Unicode version: **
+Unicode version: **14.0.0** (September 2021)
 
-Supported Rubies: **
+Supported Rubies: **3.0**, **2.7**
 
-Old Rubies that might still work: **2.2**, **2.1**, **2.0**
+Old Rubies that might still work: **2.6**, **2.5**, **2.4**, **2.3**, **2.2**, **2.1**, **2.0**
 
 ## Gemfile
 
@@ -29,21 +29,22 @@ Unicode::Scripts.script("ᴦ") # => "Greek"
 
 # Script_Extension property
 Unicode::Scripts.script_extensions("॥")
-# => ["Bengali", "Devanagari", "Dogra", "Grantha", "Gujarati",
-"
-"
-"Takri", "Tamil", "Telugu", "Tirhuta"]
+# => ["Bengali", "Devanagari", "Dogra", "Grantha", "Gujarati","Gunjala_Gondi", "Gurmukhi", "Kannada",
+"Khudawadi", "Limbu", "Mahajani", "Malayalam", "Masaram_Gondi", "Nandinagari", "Oriya", "Sinhala",
+"Syloti_Nagri", "Takri", "Tamil", "Telugu", "Tirhuta"]
 ```
 
 ## Hints
 ### Regex Matching
 
-If you have a string and want to match a substring/character from a specific Unicode script, you actually won't need this gem. Instead, you can use the [Regexp Unicode Property Syntax `\p{}`](
+If you have a string and want to match a substring/character from a specific Unicode script, you actually won't need this gem. Instead, you can use the [Regexp Unicode Property Syntax `\p{}`](https://ruby-doc.org/core/Regexp.html#class-Regexp-label-Character+Properties):
 
 ```ruby
 "Coptic letter: ⲁ".scan(/\p{Coptic}/) # => ["ⲁ"]
 ```
 
+See [Idiosyncratic Ruby: Proper Unicoding](https://idiosyncratic-ruby.com/41-proper-unicoding.html) for more info.
+
 ### Script Names
 
 You can extract all script names from the gem like this:
@@ -77,17 +78,21 @@ Caucasian_Albanian
 Chakma
 Cham
 Cherokee
+Chorasmian
 Common
 Coptic
 Cuneiform
 Cypriot
+Cypro_Minoan
 Cyrillic
 Deseret
 Devanagari
+Dives_Akuru
 Dogra
 Duployan
 Egyptian_Hieroglyphs
 Elbasan
+Elymaic
 Ethiopic
 Georgian
 Glagolitic
@@ -115,6 +120,7 @@ Katakana
 Katakana_Or_Hiragana
 Kayah_Li
 Kharoshthi
+Khitan_Small_Script
 Khmer
 Khojki
 Khudawadi
@@ -146,10 +152,12 @@ Mro
 Multani
 Myanmar
 Nabataean
+Nandinagari
 New_Tai_Lue
 Newa
 Nko
 Nushu
+Nyiakeng_Puachue_Hmong
 Ogham
 Ol_Chiki
 Old_Hungarian
@@ -160,6 +168,7 @@ Old_Persian
 Old_Sogdian
 Old_South_Arabian
 Old_Turkic
+Old_Uyghur
 Oriya
 Osage
 Osmanya
@@ -191,6 +200,7 @@ Tai_Tham
 Tai_Viet
 Takri
 Tamil
+Tangsa
 Tangut
 Telugu
 Thaana
@@ -198,10 +208,14 @@ Thai
 Tibetan
 Tifinagh
 Tirhuta
+Toto
 Ugaritic
 Unknown
 Vai
+Vithkuqi
+Wancho
 Warang_Citi
+Yezidi
 Yi
 Zanabazar_Square
 ```
@@ -239,15 +253,19 @@ Cans
 Cari
 Cham
 Cher
+Chrs
 Copt
+Cpmn
 Cprt
 Cyrl
 Deva
+Diak
 Dogr
 Dsrt
 Dupl
 Egyp
 Elba
+Elym
 Ethi
 Geor
 Glag
@@ -266,6 +284,7 @@ Hebr
 Hira
 Hluw
 Hmng
+Hmnp
 Hrkt
 Hung
 Ital
@@ -275,6 +294,7 @@ Kana
 Khar
 Khmr
 Khoj
+Kits
 Knda
 Kthi
 Lana
@@ -303,6 +323,7 @@ Mroo
 Mtei
 Mult
 Mymr
+Nand
 Narb
 Nbat
 Newa
@@ -314,6 +335,7 @@ Orkh
 Orya
 Osge
 Osma
+Ougr
 Palm
 Pauc
 Perm
@@ -358,11 +380,16 @@ Thaa
 Thai
 Tibt
 Tirh
+Tnsa
+Toto
 Ugar
 Vaii
+Vith
 Wara
+Wcho
 Xpeo
 Xsux
+Yezi
 Yiii
 Zanb
 Zinh
@@ -374,5 +401,5 @@ See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related
 
 ## MIT License
 
-- Copyright (C) 2016-
-- Unicode data:
+- Copyright (C) 2016-2021 Jan Lelis <https://janlelis.com>. Released under the MIT license.
+- Unicode data: https://www.unicode.org/copyright.html#Exhibit1
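The README hunks above point readers to Ruby's `\p{}` property syntax for matching by script without the gem. A minimal sketch of that approach follows, using only the regex engine. `scripts_present` and `default_candidates` are hypothetical names for illustration and are not part of unicode-scripts.

```ruby
# Hypothetical helper (not part of the gem): list which candidate scripts
# have at least one character in the string, using \p{...} properties.
def scripts_present(string, candidates)
  candidates.select do |name|
    # Build /\p{Name}/ at runtime. Each name must be a script the Ruby
    # regex engine knows, otherwise Regexp.new raises RegexpError.
    string.match?(Regexp.new("\\p{#{name}}"))
  end
end

# Hypothetical short candidate list; the engine supports many more scripts.
def default_candidates
  %w[Latin Cyrillic Greek Coptic]
end

# "\u0421" is CYRILLIC CAPITAL LETTER ES, which looks like Latin "C"
p scripts_present("\u0421A", default_candidates)
# "\u2C81" is COPTIC SMALL LETTER ALFA, the character from the README example
p scripts_present("Coptic letter: \u2C81", default_candidates)
```

This only answers "does any character of script X occur"; it does not cover the *Script_Extension* property that the gem also reports.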
data/Rakefile
CHANGED
@@ -32,6 +32,10 @@ end
 
 desc "#{gemspec.name} | Spec"
 task :spec do
-
+  if RbConfig::CONFIG['host_os'] =~ /mswin|mingw/
+    sh "for %f in (spec/\*.rb) do ruby spec/%f"
+  else
+    sh "for file in spec/*.rb; do ruby $file; done"
+  end
 end
 task default: :spec
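The `spec` task in the Rakefile hunk above switches between a `cmd.exe` loop and a POSIX shell loop depending on `host_os`. A shell-independent variant is sketched below. It relies on Rake's `sh` accepting a command as separate arguments; `windows_host?` and `spec_commands` are hypothetical helpers, not code from the gem.

```ruby
require "rbconfig"

# Same host check as the Rakefile hunk, wrapped so it can be exercised
# with an explicit value (hypothetical helper).
def windows_host?(host_os = RbConfig::CONFIG["host_os"])
  !(host_os =~ /mswin|mingw/).nil?
end

# Hypothetical helper: one ["ruby", path] command per spec file, built with
# Dir.glob so that no shell-specific loop syntax is needed.
def spec_commands(pattern = "spec/*.rb")
  Dir.glob(pattern).sort.map { |file| ["ruby", file] }
end

# Inside a Rakefile this could be used as:
#   task :spec do
#     spec_commands.each { |cmd| sh(*cmd) }
#   end
```

With this shape the platform branch is no longer needed for running the specs, although `windows_host?` can still guard other platform-specific steps.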
data/data/scripts.marshal.gz
CHANGED
Binary file
|
data/lib/unicode/scripts/constants.rb
CHANGED
@@ -1,9 +1,10 @@
+# frozen_string_literal: true
+
 module Unicode
   module Scripts
-    VERSION = "1.
-    UNICODE_VERSION = "
-    DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) +
-    INDEX_FILENAME = (DATA_DIRECTORY +
+    VERSION = "1.7.0"
+    UNICODE_VERSION = "14.0.0"
+    DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/").freeze
+    INDEX_FILENAME = (DATA_DIRECTORY + "/scripts.marshal.gz").freeze
   end
 end
-
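The constants hunk adds both the `# frozen_string_literal: true` magic comment and explicit `.freeze` calls. The magic comment only covers string literals; strings built at runtime, such as the result of `File.expand_path` or `+`, are new unfrozen objects, which is presumably why `.freeze` is still applied. A small sketch, where `data_directory` is a hypothetical helper and the base path is made up for illustration:

```ruby
# Hypothetical helper mirroring how the hunk builds DATA_DIRECTORY.
def data_directory(base)
  File.expand_path(base + "/../../../data/")
end

# Made-up install location, used only to have something to expand.
dir = data_directory("/gems/unicode-scripts/lib/unicode/scripts")

p dir.frozen?                           # runtime-built string, still mutable
index = (dir + "/scripts.marshal.gz")   # String#+ also returns a new string
p index.frozen?
p index.freeze.frozen?                  # explicit freeze, as in the hunk
```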
data/spec/unicode_scripts_spec.rb
CHANGED
@@ -18,8 +18,13 @@ describe Unicode::Scripts do
 
     it "will call .script for every character" do
       mocked_method = MiniTest::Mock.new
-
-
+      if RUBY_VERSION >= "2.7"
+        mocked_method.expect :call, "first script", ["С"]
+        mocked_method.expect :call, "second script", ["A"]
+      else
+        mocked_method.expect :call, "first script", ["С", {}]
+        mocked_method.expect :call, "second script", ["A", {}]
+      end
       Unicode::Scripts.stub :script, mocked_method do
         Unicode::Scripts.of("СA")
       end
@@ -51,14 +56,18 @@ describe Unicode::Scripts do
       assert_equal [
         "Bengali",
         "Devanagari",
+        "Dogra",
         "Grantha",
         "Gujarati",
+        "Gunjala_Gondi",
         "Gurmukhi",
         "Kannada",
         "Khudawadi",
         "Limbu",
         "Mahajani",
         "Malayalam",
+        "Masaram_Gondi",
+        "Nandinagari",
         "Oriya",
         "Sinhala",
         "Syloti_Nagri",
@@ -73,6 +82,9 @@ describe Unicode::Scripts do
       assert_equal [
         "Beng",
         "Deva",
+        "Dogr",
+        "Gong",
+        "Gonm",
         "Gran",
         "Gujr",
         "Guru",
@@ -80,6 +92,7 @@ describe Unicode::Scripts do
         "Limb",
         "Mahj",
         "Mlym",
+        "Nand",
         "Orya",
         "Sind",
         "Sinh",
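The first spec hunk expects different mock arguments before and after Ruby 2.7. One plausible cause, which this diff does not confirm, is that the library forwards options with a double splat: since Ruby 2.7 an empty `**opts` passes nothing, while older versions passed the empty hash as an extra positional argument. A sketch with a hypothetical `capture` method standing in for the mock:

```ruby
# Hypothetical stand-in for a mock: returns the positional arguments it got.
def capture(*args)
  args
end

opts = {}
received = capture("A", **opts)

if Gem::Version.new(RUBY_VERSION) >= Gem::Version.new("2.7")
  # An empty double splat contributes no argument
  p received == ["A"]
else
  # Older Rubies pass the empty hash through positionally
  p received == ["A", {}]
end
```

Comparing with `Gem::Version` rather than a plain string comparison is a choice made for this sketch; the spec itself compares `RUBY_VERSION` as a string.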
data/unicode-scripts.gemspec
CHANGED
@@ -8,7 +8,7 @@ Gem::Specification.new do |gem|
   gem.summary = "Which script(s) does a Unicode string belong to?"
   gem.description = "[Unicode #{Unicode::Scripts::UNICODE_VERSION}] Retrieve the Unicode script(s) a string belongs to. Can also return the Script_Extension property which is defined as characters which are 'commonly used with more than one script, but with a limited number of scripts'. "
   gem.authors = ["Jan Lelis"]
-  gem.email = ["
+  gem.email = ["hi@ruby.consulting"]
   gem.homepage = "https://github.com/janlelis/unicode-scripts"
   gem.license = "MIT"
 
@@ -17,5 +17,5 @@ Gem::Specification.new do |gem|
   gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
   gem.require_paths = ["lib"]
 
-  gem.required_ruby_version = "
+  gem.required_ruby_version = ">= 2.0"
 end
metadata
CHANGED
@@ -1,26 +1,25 @@
 --- !ruby/object:Gem::Specification
 name: unicode-scripts
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.7.0
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-09-15 00:00:00.000000000 Z
 dependencies: []
-description: "[Unicode
+description: "[Unicode 14.0.0] Retrieve the Unicode script(s) a string belongs to.
   Can also return the Script_Extension property which is defined as characters which
   are 'commonly used with more than one script, but with a limited number of scripts'. "
 email:
--
+- hi@ruby.consulting
 executables: []
 extensions: []
 extra_rdoc_files: []
 files:
 - ".gitignore"
-- ".travis.yml"
 - CHANGELOG.md
 - CODE_OF_CONDUCT.md
 - Gemfile
@@ -45,7 +44,7 @@ require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
-  - - "
+  - - ">="
     - !ruby/object:Gem::Version
       version: '2.0'
 required_rubygems_version: !ruby/object:Gem::Requirement
@@ -54,8 +53,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-
-rubygems_version: 2.7.6
+rubygems_version: 3.2.3
 signing_key:
 specification_version: 4
 summary: Which script(s) does a Unicode string belong to?
data/.travis.yml
DELETED
@@ -1,22 +0,0 @@
-sudo: false
-language: ruby
-
-script: bundle exec ruby spec/unicode_scripts_spec.rb
-
-rvm:
-  - ruby-head
-  - 2.5.1
-  - 2.4.4
-  - 2.3.7
-  - 2.2
-  - 2.1
-  - 2.0
-  - jruby-head
-  - jruby-9.1.16.0
-
-matrix:
-  allow_failures:
-    - rvm: 2.2
-    - rvm: 2.1
-    - rvm: 2.0
-    - rvm: jruby-head