script_detector_2 0.1.0 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 27b39d88507e32b9e36090c9612b98e58102694728ffe3a92d0016bfd4be05b3
-   data.tar.gz: 3938cf9b3c1f0cafeb4a760c498c54ae8cfeabb991cce8b3bd960a3df5537f63
+   metadata.gz: 37ab1716845b98ca15a072e67cd900f5854dadd3c3641667fb5887c1626c4a85
+   data.tar.gz: 5142f341e40601f3d1fff211ff1fa566047f6353d0d3f6bc5062653827e759db
  SHA512:
-   metadata.gz: 390462194fb378cc0ced1b412c20b25a8c234a627db963fada315e2ab419b87da59f5c64f850278af4d0cbb2275739d47707434cdd3dab39c09a19ed8800b83b
-   data.tar.gz: 8ea146afc460673c14f229762380440c35688d951965ea676e6b3de5caf95bd8a7d56da295afce9ea6fb807e24693e641a3bd1bac6b1db835e8504b102668792
+   metadata.gz: 35b1771a4d8898d6ca02f38b4a016adc927600e7456b0a41ae515f88e85c846d8a41951ce78fa12c7a2851fc596a13ab48a3537dd97ff4579e8cd45d19b50ccf
+   data.tar.gz: ea6da32d1c4b9a10a4c04208fda4bee7bc6bad7d8ffa635f163c608e256160b4115fe3c1a8f4e7c9ec8b65f435ddca1fcb7af9f857a82042edd11457685d7424
data/.rubocop.yml CHANGED
@@ -1,5 +1,9 @@
  inherit_from: .rubocop_todo.yml
 
+ inherit_mode:
+   merge:
+     - Exclude
+
  AllCops:
    TargetRubyVersion: 2.5
    SuggestExtensions: false
data/.rubocop_todo.yml CHANGED
@@ -1,12 +1,27 @@
  # This configuration was generated by
  # `rubocop --auto-gen-config`
- # on 2021-08-21 13:26:50 UTC using RuboCop version 1.19.1.
+ # on 2021-11-24 02:35:34 UTC using RuboCop version 1.23.0.
  # The point is for the user to remove these configuration records
  # one by one as the offenses are removed from the code base.
  # Note that changes in the inspected code, or installation of new
  # versions of RuboCop, may require this file to be generated again.
 
- # Offense count: 1
+ # Offense count: 2
+ # Configuration parameters: IgnoredMethods, CountRepeatedAttributes.
+ Metrics/AbcSize:
+   Max: 20
+
+ # Offense count: 2
+ # Configuration parameters: IgnoredMethods.
+ Metrics/CyclomaticComplexity:
+   Max: 13
+
+ # Offense count: 2
  # Configuration parameters: CountComments, CountAsOne, ExcludedMethods, IgnoredMethods.
  Metrics/MethodLength:
+   Max: 15
+
+ # Offense count: 1
+ # Configuration parameters: IgnoredMethods.
+ Metrics/PerceivedComplexity:
    Max: 14
data/.ruby-version ADDED
@@ -0,0 +1 @@
+ 2.7.4
data/CHANGELOG.md CHANGED
@@ -1,5 +1,26 @@
  ## [Unreleased]
 
+ ## [0.4.0] - 2021-11-24
+
+ - Add `identify_scripts` method
+ - Improve accuracy of `identify_script` method
+
+ ## [0.3.0] - 2021-10-13
+
+ - Add `kana?` and `hangul?` methods
+ - Improve accuracy of `identify_script` method
+ - `chinese?` method now returns actual boolean instead of merely something
+   truthy
+
+ ## [0.2.0] - 2021-08-23
+
+ - Slight optimization of script-matching regexps
+ - Script-matching regexps now match against entire string
+
+ ## [0.1.1] - 2021-08-21
+
+ - Improve identification of ambiguous Chinese
+
  ## [0.1.0] - 2021-08-21
 
  - Initial release
data/Gemfile.lock CHANGED
@@ -1,14 +1,14 @@
  PATH
    remote: .
    specs:
-     script_detector_2 (0.1.0)
+     script_detector_2 (0.4.0)
 
  GEM
    remote: https://rubygems.org/
    specs:
      ast (2.4.2)
      backport (1.2.0)
-     benchmark (0.1.1)
+     benchmark (0.2.0)
      byebug (11.1.3)
      diff-lcs (1.4.4)
      e2mmap (0.1.0)
@@ -18,32 +18,32 @@ GEM
      kramdown-parser-gfm (1.1.0)
        kramdown (~> 2.0)
      minitest (5.14.4)
-     nokogiri (1.12.3-x86_64-darwin)
+     nokogiri (1.12.5-x86_64-darwin)
        racc (~> 1.4)
-     parallel (1.20.1)
+     parallel (1.21.0)
      parser (3.0.2.0)
        ast (~> 2.4.1)
-     racc (1.5.2)
+     racc (1.6.0)
      rainbow (3.0.0)
      rake (13.0.6)
      regexp_parser (2.1.1)
-     reverse_markdown (2.0.0)
+     reverse_markdown (2.1.1)
        nokogiri
      rexml (3.2.5)
-     rubocop (1.19.1)
+     rubocop (1.23.0)
        parallel (~> 1.10)
        parser (>= 3.0.0.0)
        rainbow (>= 2.2.2, < 4.0)
        regexp_parser (>= 1.8, < 3.0)
        rexml
-       rubocop-ast (>= 1.9.1, < 2.0)
+       rubocop-ast (>= 1.12.0, < 2.0)
        ruby-progressbar (~> 1.7)
        unicode-display_width (>= 1.4.0, < 3.0)
-     rubocop-ast (1.10.0)
+     rubocop-ast (1.13.0)
        parser (>= 3.0.1.1)
      ruby-progressbar (1.11.0)
      rubyzip (2.3.2)
-     solargraph (0.43.0)
+     solargraph (0.44.2)
        backport (~> 1.2)
        benchmark
        bundler (>= 1.17.2)
@@ -60,7 +60,7 @@ GEM
        yard (~> 0.9, >= 0.9.24)
      thor (1.1.0)
      tilt (2.0.10)
-     unicode-display_width (2.0.0)
+     unicode-display_width (2.1.0)
      yard (0.9.26)
 
  PLATFORMS
@@ -76,4 +76,4 @@ DEPENDENCIES
    solargraph
 
  BUNDLED WITH
-    2.2.26
+    2.2.32
data/README.md CHANGED
@@ -12,7 +12,7 @@ Unlike the original script_detector, this gem:
  - Uses the
    [kUnihanCore2020](https://www.unicode.org/reports/tr38/#kUnihanCore2020)
    property of the Unicode Unihan database to determine which characters belong
-   to which script (Unicode 13)
+   to which script (Unicode 14)
    ([details](http://www.unicode.org/L2/L2019/19388-unihan-core-2020.pdf))
  - Uses [ISO 15924 script names](https://en.wikipedia.org/wiki/ISO_15924) in
    symbol form as return values (instead of English strings)
@@ -42,6 +42,7 @@ The main detection methods are:
  - `ScriptDetector2.simplified_chinese?`
  - `ScriptDetector2.traditional_chinese?`
  - `ScriptDetector2.identify_script`
+ - `ScriptDetector2.identify_scripts`
 
  Regexp patterns are used to identify the script to which Han characters belong.
  These can be used directly as well:
@@ -55,6 +56,15 @@ These can be used directly as well:
  - `ScriptDetector2::KOREAN_PATTERN`: matches all Han characters in the
    kUnihanCore2020 set marked as ROK (K) or DPRK (P)
 
+ Each of the above patterns matches an entire string containing only Han
+ characters of the indicated script, i.e.
+
+ ```ruby
+ ScriptDetector2::JAPANESE_PATTERN.match?('日本語') # => true
+ ScriptDetector2::JAPANESE_PATTERN.match?('你好') # => false
+ ScriptDetector2::JAPANESE_PATTERN.match?('Hello 日本語') # => false
+ ```
+
  To recreate the script_detector gem's extension of the String class, use the
  supplied refinement like so: