script_detector_2 0.1.0 → 0.4.0

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 27b39d88507e32b9e36090c9612b98e58102694728ffe3a92d0016bfd4be05b3
4
- data.tar.gz: 3938cf9b3c1f0cafeb4a760c498c54ae8cfeabb991cce8b3bd960a3df5537f63
3
+ metadata.gz: 37ab1716845b98ca15a072e67cd900f5854dadd3c3641667fb5887c1626c4a85
4
+ data.tar.gz: 5142f341e40601f3d1fff211ff1fa566047f6353d0d3f6bc5062653827e759db
5
5
  SHA512:
6
- metadata.gz: 390462194fb378cc0ced1b412c20b25a8c234a627db963fada315e2ab419b87da59f5c64f850278af4d0cbb2275739d47707434cdd3dab39c09a19ed8800b83b
7
- data.tar.gz: 8ea146afc460673c14f229762380440c35688d951965ea676e6b3de5caf95bd8a7d56da295afce9ea6fb807e24693e641a3bd1bac6b1db835e8504b102668792
6
+ metadata.gz: 35b1771a4d8898d6ca02f38b4a016adc927600e7456b0a41ae515f88e85c846d8a41951ce78fa12c7a2851fc596a13ab48a3537dd97ff4579e8cd45d19b50ccf
7
+ data.tar.gz: ea6da32d1c4b9a10a4c04208fda4bee7bc6bad7d8ffa635f163c608e256160b4115fe3c1a8f4e7c9ec8b65f435ddca1fcb7af9f857a82042edd11457685d7424
data/.rubocop.yml CHANGED
@@ -1,5 +1,9 @@
1
1
  inherit_from: .rubocop_todo.yml
2
2
 
3
+ inherit_mode:
4
+ merge:
5
+ - Exclude
6
+
3
7
  AllCops:
4
8
  TargetRubyVersion: 2.5
5
9
  SuggestExtensions: false
data/.rubocop_todo.yml CHANGED
@@ -1,12 +1,27 @@
1
1
  # This configuration was generated by
2
2
  # `rubocop --auto-gen-config`
3
- # on 2021-08-21 13:26:50 UTC using RuboCop version 1.19.1.
3
+ # on 2021-11-24 02:35:34 UTC using RuboCop version 1.23.0.
4
4
  # The point is for the user to remove these configuration records
5
5
  # one by one as the offenses are removed from the code base.
6
6
  # Note that changes in the inspected code, or installation of new
7
7
  # versions of RuboCop, may require this file to be generated again.
8
8
 
9
- # Offense count: 1
9
+ # Offense count: 2
10
+ # Configuration parameters: IgnoredMethods, CountRepeatedAttributes.
11
+ Metrics/AbcSize:
12
+ Max: 20
13
+
14
+ # Offense count: 2
15
+ # Configuration parameters: IgnoredMethods.
16
+ Metrics/CyclomaticComplexity:
17
+ Max: 13
18
+
19
+ # Offense count: 2
10
20
  # Configuration parameters: CountComments, CountAsOne, ExcludedMethods, IgnoredMethods.
11
21
  Metrics/MethodLength:
22
+ Max: 15
23
+
24
+ # Offense count: 1
25
+ # Configuration parameters: IgnoredMethods.
26
+ Metrics/PerceivedComplexity:
12
27
  Max: 14
data/.ruby-version ADDED
@@ -0,0 +1 @@
1
+ 2.7.4
data/CHANGELOG.md CHANGED
@@ -1,5 +1,26 @@
1
1
  ## [Unreleased]
2
2
 
3
+ ## [0.4.0] - 2021-11-24
4
+
5
+ - Add `identify_scripts` method
6
+ - Improve accuracy of `identify_script` method
7
+
8
+ ## [0.3.0] - 2021-10-13
9
+
10
+ - Add `kana?` and `hangul?` methods
11
+ - Improve accuracy of `identify_script` method
12
+ - `chinese?` method now returns actual boolean instead of merely something
13
+ truthy
14
+
15
+ ## [0.2.0] - 2021-08-23
16
+
17
+ - Slight optimization of script-matching regexps
18
+ - Script-matching regexps now match against entire string
19
+
20
+ ## [0.1.1] - 2021-08-21
21
+
22
+ - Improve identification of ambiguous Chinese
23
+
3
24
  ## [0.1.0] - 2021-08-21
4
25
 
5
26
  - Initial release
data/Gemfile.lock CHANGED
@@ -1,14 +1,14 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- script_detector_2 (0.1.0)
4
+ script_detector_2 (0.4.0)
5
5
 
6
6
  GEM
7
7
  remote: https://rubygems.org/
8
8
  specs:
9
9
  ast (2.4.2)
10
10
  backport (1.2.0)
11
- benchmark (0.1.1)
11
+ benchmark (0.2.0)
12
12
  byebug (11.1.3)
13
13
  diff-lcs (1.4.4)
14
14
  e2mmap (0.1.0)
@@ -18,32 +18,32 @@ GEM
18
18
  kramdown-parser-gfm (1.1.0)
19
19
  kramdown (~> 2.0)
20
20
  minitest (5.14.4)
21
- nokogiri (1.12.3-x86_64-darwin)
21
+ nokogiri (1.12.5-x86_64-darwin)
22
22
  racc (~> 1.4)
23
- parallel (1.20.1)
23
+ parallel (1.21.0)
24
24
  parser (3.0.2.0)
25
25
  ast (~> 2.4.1)
26
- racc (1.5.2)
26
+ racc (1.6.0)
27
27
  rainbow (3.0.0)
28
28
  rake (13.0.6)
29
29
  regexp_parser (2.1.1)
30
- reverse_markdown (2.0.0)
30
+ reverse_markdown (2.1.1)
31
31
  nokogiri
32
32
  rexml (3.2.5)
33
- rubocop (1.19.1)
33
+ rubocop (1.23.0)
34
34
  parallel (~> 1.10)
35
35
  parser (>= 3.0.0.0)
36
36
  rainbow (>= 2.2.2, < 4.0)
37
37
  regexp_parser (>= 1.8, < 3.0)
38
38
  rexml
39
- rubocop-ast (>= 1.9.1, < 2.0)
39
+ rubocop-ast (>= 1.12.0, < 2.0)
40
40
  ruby-progressbar (~> 1.7)
41
41
  unicode-display_width (>= 1.4.0, < 3.0)
42
- rubocop-ast (1.10.0)
42
+ rubocop-ast (1.13.0)
43
43
  parser (>= 3.0.1.1)
44
44
  ruby-progressbar (1.11.0)
45
45
  rubyzip (2.3.2)
46
- solargraph (0.43.0)
46
+ solargraph (0.44.2)
47
47
  backport (~> 1.2)
48
48
  benchmark
49
49
  bundler (>= 1.17.2)
@@ -60,7 +60,7 @@ GEM
60
60
  yard (~> 0.9, >= 0.9.24)
61
61
  thor (1.1.0)
62
62
  tilt (2.0.10)
63
- unicode-display_width (2.0.0)
63
+ unicode-display_width (2.1.0)
64
64
  yard (0.9.26)
65
65
 
66
66
  PLATFORMS
@@ -76,4 +76,4 @@ DEPENDENCIES
76
76
  solargraph
77
77
 
78
78
  BUNDLED WITH
79
- 2.2.26
79
+ 2.2.32
data/README.md CHANGED
@@ -12,7 +12,7 @@ Unlike the original script_detector, this gem:
12
12
  - Uses the
13
13
  [kUnihanCore2020](https://www.unicode.org/reports/tr38/#kUnihanCore2020)
14
14
  property of the Unicode Unihan database to determine which characters belong
15
- to which script (Unicode 13)
15
+ to which script (Unicode 14)
16
16
  ([details](http://www.unicode.org/L2/L2019/19388-unihan-core-2020.pdf))
17
17
  - Uses [ISO 15924 script names](https://en.wikipedia.org/wiki/ISO_15924) in
18
18
  symbol form as return values (instead of English strings)
@@ -42,6 +42,7 @@ The main detection methods are:
42
42
  - `ScriptDetector2.simplified_chinese?`
43
43
  - `ScriptDetector2.traditional_chinese?`
44
44
  - `ScriptDetector2.identify_script`
45
+ - `ScriptDetector2.identify_scripts`
45
46
 
46
47
  Regexp patterns are used to identify the script to which Han characters belong.
47
48
  These can be used directly as well:
@@ -55,6 +56,15 @@ These can be used directly as well:
55
56
  - `ScriptDetector2::KOREAN_PATTERN`: matches all Han characters in the
56
57
  kUnihanCore2020 set marked as ROK (K) or DPRK (P)
57
58
 
59
+ Each of the above patterns matches an entire string containing only Han
60
+ characters of the indicated script, i.e.
61
+
62
+ ```ruby
63
+ ScriptDetector2::JAPANESE_PATTERN.match?('日本語') # => true
64
+ ScriptDetector2::JAPANESE_PATTERN.match?('你好') # => false
65
+ ScriptDetector2::JAPANESE_PATTERN.match?('Hello 日本語') # => false
66
+ ```
67
+
58
68
  To recreate the script_detector gem's extension of the String class, use the
59
69
  supplied refinement like so:
60
70