unicode-scripts 1.3.0 → 1.7.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +16 -0
- data/Gemfile +1 -0
- data/Gemfile.lock +4 -2
- data/MIT-LICENSE.txt +1 -1
- data/README.md +38 -11
- data/Rakefile +5 -1
- data/data/scripts.marshal.gz +0 -0
- data/lib/unicode/scripts/constants.rb +6 -5
- data/spec/unicode_scripts_spec.rb +15 -2
- data/unicode-scripts.gemspec +2 -2
- metadata +6 -8
- data/.travis.yml +0 -22
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c907fd10ee383e77217efda8304b1e6182aa59b2dd5f38eb4f6399fed987e1cc
+  data.tar.gz: e51d4c2a61b4c59c6ac5c0ad8cb36ae455e81d4132e6ebc8b1dbf6b1a8735591
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d01c9f77598090ad2026476eee867cdf4564fdc4cb666e67a6adb99b8ff1fb1ad9345ba3133eff6a53a23ee4ed52a1c1a1503799bf42abe52748f1f10e8c6a60
+  data.tar.gz: 7663f839ebaf0375276356e8815f117180559fbfc959b8c28afadc35999da82a7ca56d67b55a8ef2c0e5263ac6de354e1edee60879e7ec3f8017e40ee6be7d3d
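The checksums above can be checked against local files. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` package; the helper name `checksum_matches?` is hypothetical and not part of RubyGems or this gem.

```ruby
require "digest"

# Hypothetical helper: compare a file's digest with an expected hex string,
# such as one of the values listed in checksums.yaml.
def checksum_matches?(path, expected_hex, digest_class = Digest::SHA256)
  digest_class.file(path).hexdigest == expected_hex.downcase
end
```

Passing `Digest::SHA512` as the third argument would cover the SHA512 entries in the same way.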
data/CHANGELOG.md
CHANGED
data/Gemfile
CHANGED
data/Gemfile.lock
CHANGED
@@ -1,11 +1,12 @@
 PATH
   remote: .
   specs:
-    unicode-scripts (1.
+    unicode-scripts (1.5.0)
 
 GEM
   remote: https://rubygems.org/
   specs:
+    irb (1.0.0)
     minitest (5.9.0)
     rake (12.0.0)
 
@@ -13,9 +14,10 @@ PLATFORMS
   ruby
 
 DEPENDENCIES
+  irb
   minitest
   rake
   unicode-scripts!
 
 BUNDLED WITH
-   1.
+   1.17.2
data/MIT-LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -1,12 +1,12 @@
-# Unicode::Scripts [![[version]](https://badge.fury.io/rb/unicode-scripts.svg)](
+# Unicode::Scripts [![[version]](https://badge.fury.io/rb/unicode-scripts.svg)](https://badge.fury.io/rb/unicode-scripts) [![[ci]](https://github.com/janlelis/unicode-scripts/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-scripts/actions?query=workflow%3ATest)
 
 Retrieve the [Unicode script(s)](https://en.wikipedia.org/wiki/Script_%28Unicode%29) a string belongs to. Can also return the *Script_Extension* property which is defined as characters which are "commonly used with more than one script, but with a limited number of scripts".
 
-Unicode version: **
+Unicode version: **14.0.0** (September 2021)
 
-Supported Rubies: **
+Supported Rubies: **3.0**, **2.7**
 
-Old Rubies that might still work: **2.2**, **2.1**, **2.0**
+Old Rubies that might still work: **2.6**, **2.5**, **2.4**, **2.3**, **2.2**, **2.1**, **2.0**
 
 ## Gemfile
 
@@ -29,21 +29,22 @@ Unicode::Scripts.script("ᴦ") # => "Greek"
 
 # Script_Extension property
 Unicode::Scripts.script_extensions("॥")
-# => ["Bengali", "Devanagari", "Dogra", "Grantha", "Gujarati",
-"
-"
-"Takri", "Tamil", "Telugu", "Tirhuta"]
+# => ["Bengali", "Devanagari", "Dogra", "Grantha", "Gujarati","Gunjala_Gondi", "Gurmukhi", "Kannada",
+"Khudawadi", "Limbu", "Mahajani", "Malayalam", "Masaram_Gondi", "Nandinagari", "Oriya", "Sinhala",
+"Syloti_Nagri", "Takri", "Tamil", "Telugu", "Tirhuta"]
 ```
 
 ## Hints
 ### Regex Matching
 
-If you have a string and want to match a substring/character from a specific Unicode script, you actually won't need this gem. Instead, you can use the [Regexp Unicode Property Syntax `\p{}`](
+If you have a string and want to match a substring/character from a specific Unicode script, you actually won't need this gem. Instead, you can use the [Regexp Unicode Property Syntax `\p{}`](https://ruby-doc.org/core/Regexp.html#class-Regexp-label-Character+Properties):
 
 ```ruby
 "Coptic letter: ⲁ".scan(/\p{Coptic}/) # => ["ⲁ"]
 ```
 
+See [Idiosyncratic Ruby: Proper Unicoding](https://idiosyncratic-ruby.com/41-proper-unicoding.html) for more info.
+
 ### Script Names
 
 You can extract all script names from the gem like this:
@@ -77,17 +78,21 @@ Caucasian_Albanian
 Chakma
 Cham
 Cherokee
+Chorasmian
 Common
 Coptic
 Cuneiform
 Cypriot
+Cypro_Minoan
 Cyrillic
 Deseret
 Devanagari
+Dives_Akuru
 Dogra
 Duployan
 Egyptian_Hieroglyphs
 Elbasan
+Elymaic
 Ethiopic
 Georgian
 Glagolitic
@@ -115,6 +120,7 @@ Katakana
 Katakana_Or_Hiragana
 Kayah_Li
 Kharoshthi
+Khitan_Small_Script
 Khmer
 Khojki
 Khudawadi
@@ -146,10 +152,12 @@ Mro
 Multani
 Myanmar
 Nabataean
+Nandinagari
 New_Tai_Lue
 Newa
 Nko
 Nushu
+Nyiakeng_Puachue_Hmong
 Ogham
 Ol_Chiki
 Old_Hungarian
@@ -160,6 +168,7 @@ Old_Persian
 Old_Sogdian
 Old_South_Arabian
 Old_Turkic
+Old_Uyghur
 Oriya
 Osage
 Osmanya
@@ -191,6 +200,7 @@ Tai_Tham
 Tai_Viet
 Takri
 Tamil
+Tangsa
 Tangut
 Telugu
 Thaana
@@ -198,10 +208,14 @@ Thai
 Tibetan
 Tifinagh
 Tirhuta
+Toto
 Ugaritic
 Unknown
 Vai
+Vithkuqi
+Wancho
 Warang_Citi
+Yezidi
 Yi
 Zanabazar_Square
 ```
@@ -239,15 +253,19 @@ Cans
 Cari
 Cham
 Cher
+Chrs
 Copt
+Cpmn
 Cprt
 Cyrl
 Deva
+Diak
 Dogr
 Dsrt
 Dupl
 Egyp
 Elba
+Elym
 Ethi
 Geor
 Glag
@@ -266,6 +284,7 @@ Hebr
 Hira
 Hluw
 Hmng
+Hmnp
 Hrkt
 Hung
 Ital
@@ -275,6 +294,7 @@ Kana
 Khar
 Khmr
 Khoj
+Kits
 Knda
 Kthi
 Lana
@@ -303,6 +323,7 @@ Mroo
 Mtei
 Mult
 Mymr
+Nand
 Narb
 Nbat
 Newa
@@ -314,6 +335,7 @@ Orkh
 Orya
 Osge
 Osma
+Ougr
 Palm
 Pauc
 Perm
@@ -358,11 +380,16 @@ Thaa
 Thai
 Tibt
 Tirh
+Tnsa
+Toto
 Ugar
 Vaii
+Vith
 Wara
+Wcho
 Xpeo
 Xsux
+Yezi
 Yiii
 Zanb
 Zinh
@@ -374,5 +401,5 @@ See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related
 
 ## MIT License
 
-- Copyright (C) 2016-
-- Unicode data:
+- Copyright (C) 2016-2021 Jan Lelis <https://janlelis.com>. Released under the MIT license.
+- Unicode data: https://www.unicode.org/copyright.html#Exhibit1
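The README's "Regex Matching" hint relies on Ruby's built-in `\p{}` script properties rather than on the gem. As a small sketch of that idea, the following groups runs of characters by script using only core Ruby; the constant `SCRIPT_PATTERNS`, the helper `script_runs`, and the sample text are illustrative assumptions, not part of unicode-scripts.

```ruby
# Illustrative lookup table: one regular expression per script of interest.
SCRIPT_PATTERNS = {
  greek:    /\p{Greek}+/,
  cyrillic: /\p{Cyrillic}+/,
  latin:    /\p{Latin}+/
}.freeze

# Hypothetical helper: return every run of consecutive characters
# in `text` that belongs to the requested script.
def script_runs(text, script)
  text.scan(SCRIPT_PATTERNS.fetch(script))
end

p script_runs("abc αβγ где", :greek) # => ["αβγ"]
```

This only covers the plain Script property as the regex engine defines it; the gem remains the way to query *Script_Extension* values.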
data/Rakefile
CHANGED
@@ -32,6 +32,10 @@ end
 
 desc "#{gemspec.name} | Spec"
 task :spec do
-
+  if RbConfig::CONFIG['host_os'] =~ /mswin|mingw/
+    sh "for %f in (spec/\*.rb) do ruby spec/%f"
+  else
+    sh "for file in spec/*.rb; do ruby $file; done"
+  end
 end
 task default: :spec
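The Rakefile hunk above switches between two shell loops depending on the host OS. One possible alternative, shown here only as a sketch and not as the gem's approach, is to enumerate the spec files in Ruby so that no shell-specific loop syntax is needed. The helper name `spec_commands` is hypothetical.

```ruby
require "rbconfig"

# Hypothetical helper: build one [interpreter, file] command per spec file.
# RbConfig.ruby is the path of the Ruby interpreter currently running.
def spec_commands(pattern = "spec/*.rb")
  Dir.glob(pattern).sort.map { |file| [RbConfig.ruby, file] }
end

# Inside a Rake task, each command could then be run with:
#   spec_commands.each { |command| sh(*command) }
```

Passing the command as separate arguments avoids shell quoting differences between platforms.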
data/data/scripts.marshal.gz
CHANGED
Binary file
|
data/lib/unicode/scripts/constants.rb
CHANGED
@@ -1,9 +1,10 @@
+# frozen_string_literal: true
+
 module Unicode
   module Scripts
-    VERSION = "1.
-    UNICODE_VERSION = "
-    DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) +
-    INDEX_FILENAME = (DATA_DIRECTORY +
+    VERSION = "1.7.0"
+    UNICODE_VERSION = "14.0.0"
+    DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/").freeze
+    INDEX_FILENAME = (DATA_DIRECTORY + "/scripts.marshal.gz").freeze
   end
 end
-
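The constants hunk above adds both the `# frozen_string_literal: true` magic comment and explicit `.freeze` calls. A short sketch of why both can be needed: the magic comment only affects string literals, while strings produced by method calls such as `String#+` are new, unfrozen objects. The constant names below are hypothetical and chosen for illustration.

```ruby
# frozen_string_literal: true

# Hypothetical constants, not taken from the gem.
LITERAL       = "14.0.0"                    # a literal: frozen when the magic comment is in effect
JOINED        = LITERAL + "/data"           # String#+ returns a new, unfrozen string
JOINED_FROZEN = (LITERAL + "/data").freeze  # frozen explicitly

puts JOINED.frozen?        # => false
puts JOINED_FROZEN.frozen? # => true
```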
data/spec/unicode_scripts_spec.rb
CHANGED
@@ -18,8 +18,13 @@ describe Unicode::Scripts do
 
   it "will call .script for every character" do
     mocked_method = MiniTest::Mock.new
-
-
+    if RUBY_VERSION >= "2.7"
+      mocked_method.expect :call, "first script", ["С"]
+      mocked_method.expect :call, "second script", ["A"]
+    else
+      mocked_method.expect :call, "first script", ["С", {}]
+      mocked_method.expect :call, "second script", ["A", {}]
+    end
     Unicode::Scripts.stub :script, mocked_method do
       Unicode::Scripts.of("СA")
     end
@@ -51,14 +56,18 @@ describe Unicode::Scripts do
 assert_equal [
   "Bengali",
   "Devanagari",
+  "Dogra",
   "Grantha",
   "Gujarati",
+  "Gunjala_Gondi",
   "Gurmukhi",
   "Kannada",
   "Khudawadi",
   "Limbu",
   "Mahajani",
   "Malayalam",
+  "Masaram_Gondi",
+  "Nandinagari",
   "Oriya",
   "Sinhala",
   "Syloti_Nagri",
@@ -73,6 +82,9 @@ describe Unicode::Scripts do
 assert_equal [
   "Beng",
   "Deva",
+  "Dogr",
+  "Gong",
+  "Gonm",
   "Gran",
   "Gujr",
   "Guru",
@@ -80,6 +92,7 @@ describe Unicode::Scripts do
   "Limb",
   "Mahj",
   "Mlym",
+  "Nand",
   "Orya",
   "Sind",
   "Sinh",
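The first spec hunk above branches on `RUBY_VERSION >= "2.7"`, which is a string comparison. That works for the Ruby versions listed in the README, but string ordering is character by character, so a hypothetical version such as "10.0.0" would sort before "2.7". A sketch of a numeric comparison using `Gem::Version`, with a hypothetical helper name `ruby_at_least?`:

```ruby
require "rubygems"

# Hypothetical helper: compare version numbers segment by segment
# instead of comparing the raw strings.
def ruby_at_least?(minimum, current = RUBY_VERSION)
  Gem::Version.new(current) >= Gem::Version.new(minimum)
end

puts ruby_at_least?("2.7", "10.0.0") # => true
puts "10.0.0" >= "2.7"               # => false
```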
data/unicode-scripts.gemspec
CHANGED
@@ -8,7 +8,7 @@ Gem::Specification.new do |gem|
   gem.summary = "Which script(s) does a Unicode string belong to?"
   gem.description = "[Unicode #{Unicode::Scripts::UNICODE_VERSION}] Retrieve the Unicode script(s) a string belongs to. Can also return the Script_Extension property which is defined as characters which are 'commonly used with more than one script, but with a limited number of scripts'. "
   gem.authors = ["Jan Lelis"]
-  gem.email = ["
+  gem.email = ["hi@ruby.consulting"]
   gem.homepage = "https://github.com/janlelis/unicode-scripts"
   gem.license = "MIT"
 
@@ -17,5 +17,5 @@ Gem::Specification.new do |gem|
   gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
   gem.require_paths = ["lib"]
 
-  gem.required_ruby_version = "
+  gem.required_ruby_version = ">= 2.0"
 end
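The new `required_ruby_version` value, `">= 2.0"`, is a RubyGems requirement string. As a rough sketch of how such a constraint can be evaluated, `Gem::Requirement#satisfied_by?` checks a version against it; the helper name `ruby_supported?` is hypothetical.

```ruby
require "rubygems"

# Hypothetical helper: does `version` satisfy the given constraint string?
def ruby_supported?(version, constraint = ">= 2.0")
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

puts ruby_supported?("2.7.4") # => true
puts ruby_supported?("1.9.3") # => false
```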
metadata
CHANGED
@@ -1,26 +1,25 @@
 --- !ruby/object:Gem::Specification
 name: unicode-scripts
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.7.0
 platform: ruby
 authors:
 - Jan Lelis
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-09-15 00:00:00.000000000 Z
 dependencies: []
-description: "[Unicode
+description: "[Unicode 14.0.0] Retrieve the Unicode script(s) a string belongs to.
   Can also return the Script_Extension property which is defined as characters which
   are 'commonly used with more than one script, but with a limited number of scripts'. "
 email:
--
+- hi@ruby.consulting
 executables: []
 extensions: []
 extra_rdoc_files: []
 files:
 - ".gitignore"
-- ".travis.yml"
 - CHANGELOG.md
 - CODE_OF_CONDUCT.md
 - Gemfile
@@ -45,7 +44,7 @@ require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
-  - - "
+  - - ">="
     - !ruby/object:Gem::Version
       version: '2.0'
 required_rubygems_version: !ruby/object:Gem::Requirement
@@ -54,8 +53,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-
-rubygems_version: 2.7.6
+rubygems_version: 3.2.3
 signing_key:
 specification_version: 4
 summary: Which script(s) does a Unicode string belong to?
data/.travis.yml
DELETED
@@ -1,22 +0,0 @@
-sudo: false
-language: ruby
-
-script: bundle exec ruby spec/unicode_scripts_spec.rb
-
-rvm:
-  - ruby-head
-  - 2.5.1
-  - 2.4.4
-  - 2.3.7
-  - 2.2
-  - 2.1
-  - 2.0
-  - jruby-head
-  - jruby-9.1.16.0
-
-matrix:
-  allow_failures:
-    - rvm: 2.2
-    - rvm: 2.1
-    - rvm: 2.0
-    - rvm: jruby-head