auto-correct 0.2.1 → 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 0601f319b91d591f58b600938a273a7f2865b98c836271718addbd1113ba0ee0
- data.tar.gz: 36acc277500cc855010af480979dae9c47b7fa941703d5459a2c302050bbc86a
+ metadata.gz: dc4ebaa39c033494ca5d6e7b704a7563755f62ea2a120000f012908780a70375
+ data.tar.gz: 33a17e83502ced4f06c0a11b222038d036e46c235ddf6b4bda68131dcaf58e84
  SHA512:
- metadata.gz: e08ebc0e999f36ecd8b64fd82e6433044c99d53a47c80f1114557a483958647454888bfd577729805c140416e5160f0acc9ad302dbfec090807bfff919de051a
- data.tar.gz: 358a903572c678ea9b233c69d71e714dd5265d97d7c644c8214e838f7928656b7a86029096a1fc9181489c684d884587002593d6b8bb4c58a96d433dd8cab7ec
+ metadata.gz: f88c421b4f2c5cf7fc16063778d57d4f6e3b4aa7eef8fe51075cec0df20fc539d760de99c37ddfc2557069836cbffb310087a2accbce27494d15edfe1748a742
+ data.tar.gz: 2e4bef9e663b45553e1a930a3d16c30283f8ed091f34bc26416d1658c7e45e36b4643cb15686e906adbc771e18c2d1befd07e8268742f8bd71409f47d24571b5
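checksums.yaml records a SHA256 and a SHA512 digest for each of the two archives packed inside the `.gem` file. A `.gem` is a plain tar archive, so after extracting it (for example with `tar -xf auto-correct-1.0.0.gem`) the members can be re-hashed with Ruby's standard `digest` library and compared against the values above. A minimal sketch; the helper names `gem_member_digests` and `matches_checksum?` are hypothetical:

```ruby
require "digest/sha2"

# Hypothetical helper: hashes one extracted .gem member (metadata.gz or
# data.tar.gz) with the two algorithms listed in checksums.yaml.
def gem_member_digests(path)
  data = File.binread(path)
  {
    "SHA256" => Digest::SHA256.hexdigest(data),
    "SHA512" => Digest::SHA512.hexdigest(data)
  }
end

# Hypothetical helper: true when the file's digest equals the expected hex string.
def matches_checksum?(path, algorithm, expected_hex)
  gem_member_digests(path).fetch(algorithm) == expected_hex.downcase
end
```

For the 1.0.0 package, `matches_checksum?("data.tar.gz", "SHA256", "33a17e83502ced4f06c0a11b222038d036e46c235ddf6b4bda68131dcaf58e84")` should return `true` if the extracted archive is the one this diff describes.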
data/README.md CHANGED
@@ -1,21 +1,20 @@
  # auto-correct

- Automatically add spaces between Chinese and English words.
+ Automatically add whitespace between CJK (Chinese, Japanese, Korean) and half-width characters (alphabetical letters, numerical digits and symbols).

- Automatically corrects mixed Chinese and English text by inserting the missing spaces. This approach has been used on Ruby China for years and supports HTML processing.
-
- [![Gem Version](https://badge.fury.io/rb/auto-correct.svg)](https://rubygems.org/gems/auto-correct) [![Build
- Status](https://api.travis-ci.org/huacnlee/auto-correct.svg?branch=master&.svg)](http://travis-ci.org/huacnlee/auto-correct)
+ Automatically corrects mixed Chinese, Japanese, Korean + English text by inserting the missing spaces. This approach has been used on Ruby China for years and supports HTML processing.

+ [![Gem Version](https://badge.fury.io/rb/auto-correct.svg)](https://rubygems.org/gems/auto-correct) [![build](https://github.com/huacnlee/auto-correct/workflows/build/badge.svg)](https://github.com/huacnlee/auto-correct/actions?query=workflow%3Abuild)

  ## Other implements

- - [auto-correct](https://github.com/huacnlee/auto-correct) - Ruby
- - [go-auto-correct](https://github.com/huacnlee/go-auto-correct) - Go
+ - Ruby - [auto-correct](https://github.com/huacnlee/auto-correct).
+ - Go - [go-auto-correct](https://github.com/huacnlee/go-auto-correct).
+ - Rust - [autocorrect](https://github.com/huacnlee/autocorrect).

  ## Features

- - Auto add spacings between Chinese and English words.
+ - Auto add spacings between CJK (Chinese) and English words.
  - HTML content support.

  [Examples](https://github.com/huacnlee/auto-correct/blob/master/test/format_test.rb)
@@ -35,7 +34,16 @@ AutoCorrect.format("于3月10日开始")
  # => "于 3 月 10 日开始"

  AutoCorrect.format("包装日期为2013年3月10日")
- # => "包装日期为2013年3月10日"
+ # => "包装日期为 2013 年 3 月 10 日"
+
+ AutoCorrect.format("生产环境中使用Ruby")
+ # => "生产环境中使用 Ruby"
+
+ AutoCorrect.format("本番環境でRubyを使用する")
+ # => "本番環境で Ruby を使用する"
+
+ AutoCorrect.format("프로덕션환경에서Ruby사용")
+ # => "프로덕션환경에서 Ruby 사용"
  ```

  `AutoCorrect.format_html` method for HTML content.
@@ -47,15 +55,40 @@ AutoCorrect.format_html("<div><p>长桥LongBridge App下载</p><p>最新版本1.

  ## Benchmark

- TODO
+ Run `rake bench` to test:
+
+ ```
+ Warming up --------------------------------------
+ format 50 chars 1.886k i/100ms
+ format 100 chars 1.060k i/100ms
+ format 400 chars 342.000 i/100ms
+ format_html 85.000 i/100ms
+ Calculating -------------------------------------
+ format 50 chars 18.842k (± 1.5%) i/s - 94.300k in 5.005815s
+ format 100 chars 10.357k (± 1.8%) i/s - 51.940k in 5.016770s
+ format 400 chars 3.336k (± 2.2%) i/s - 16.758k in 5.026230s
+ format_html 839.761 (± 2.1%) i/s - 4.250k in 5.063225s
+ ```
+
+ | Total chars | Duration |
+ | ----------- | -------- |
+ | 50 | 0.33 ms |
+ | 100 | 0.60 ms |
+ | 400 | 2 ms |
+
+ ### FormatHTML
+
+ | Total chars | Duration |
+ | ----------- | -------- |
+ | 2K | 7 ms |

  ## Use cases

- * [Ruby China](https://ruby-china.org) - Titles across the whole site are currently auto-corrected.
+ - [Ruby China](https://ruby-china.org) - The whole site is currently auto-corrected.

  ## Links

- * [Chinese Copywriting Guidelines](https://github.com/sparanoid/chinese-copywriting-guidelines)
+ - [Chinese Copywriting Guidelines](https://github.com/sparanoid/chinese-copywriting-guidelines)

  ## License

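The `rake bench` output above reports throughput in iterations per second (`i/s`); dividing 1000 by that figure gives the mean time per call in milliseconds. A small sketch of the conversion, using the four `i/s` values copied from that output; `ms_per_call` is a hypothetical helper name:

```ruby
# i/s figures copied from the `rake bench` output shown in the README diff.
IPS = {
  "format 50 chars" => 18_842.0,
  "format 100 chars" => 10_357.0,
  "format 400 chars" => 3_336.0,
  "format_html" => 839.761
}.freeze

# Mean wall-clock milliseconds per call for a given iterations-per-second rate.
def ms_per_call(ips)
  1000.0 / ips
end

IPS.each do |label, ips|
  puts format("%-16s %.3f ms", label, ms_per_call(ips))
end
```

By this arithmetic a 400-character `format` call averages roughly 0.3 ms and a `format_html` call roughly 1.2 ms. The Duration tables list larger figures, and the diff does not say which run or machine produced them.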
data/lib/auto-correct.rb CHANGED
@@ -2,10 +2,7 @@ require "auto-correct/strategery"
  require "auto-correct/base"
  require "auto-correct/format"
  require "auto-correct/html"
- require "auto-correct/string"
  require "auto-correct/version"

  class AutoCorrect
  end
-
- String.send :include, AutoCorrect::String
@@ -1,36 +1,40 @@
  class AutoCorrect
+ CJK = '\p{Han}|\p{Hangul}|\p{Hanunoo}|\p{Katakana}|\p{Hiragana}|\p{Bopomofo}'
+ SPACE = "[ ]"
+
  # rubocop:disable Style/StringLiterals
  # EnglishLetter
- rule '\p{Han}', '[0-9a-zA-Z]', space: true, reverse: true
+ rule CJK.to_s, '[a-zA-Z0-9]', space: true, reverse: true

  # SpecialSymbol
- rule '\p{Han}', '[\|+$@#*]', space: true, reverse: true
- rule '\p{Han}', '[\[\(‘“]', space: true
- rule '[’”\]\)!%]', '\p{Han}', space: true
+ rule CJK.to_s, '[\|+*]', space: true, reverse: true
+ rule CJK.to_s, '[@]', space: true, reverse: false
+ rule CJK.to_s, '[\[\(‘“]', space: true
+ rule '[’”\]\)!%]', CJK.to_s, space: true
  rule '[”\]\)!]', '[a-zA-Z0-9]+', space: true

- # FullwidthPunctuation
- rule '[\w\p{Han}]', '[,。!?:;」》】”’]', reverse: true
- rule '[‘“【「《]', '[\w\p{Han}]', reverse: true
+ # FullwidthPunctuation remove space case, Fullwidth can safe to remove spaces
+ rule %r{[\w#{CJK}]}o, '[,。!?:;)」》】”’]', reverse: true
+ rule '[‘“【「《(]', %r{[\w#{CJK}]}o, reverse: true

  class << self
- FULLDATE_RE = /[\s]{0,}\d+[\s]{0,}年[\s]{0,}\d+[\s]{0,}月[\s]{0,}\d+[\s]{0,}[日号][\s]{0,}/u
+ DASH_HAN_RE = /([#{CJK})】」》”’])(-+)([#{CJK}(【「《“‘])/
+ LEFT_QUOTE_RE = /#{SPACE}([(【「《])/
+ RIGHT_QUOTE_RE = /([)】」》])#{SPACE}/

  def format(str)
- out = str
- self.strategies.each do |s|
- out = s.format(out)
+ strategies.each do |s|
+ str = s.format(str)
  end
- out = remove_full_date_spacing(out)
- out
+ space_dash_with_hans(str)
  end

  private

- def remove_full_date_spacing(str)
- str.gsub(FULLDATE_RE) do |m|
- m.gsub(/\s+/, "")
- end
- end
+ def space_dash_with_hans(str)
+ str = str.gsub(DASH_HAN_RE, '\1 \2 \3')
+ str = str.gsub(LEFT_QUOTE_RE, '\1')
+ str.gsub(RIGHT_QUOTE_RE, '\1')
+ end
  end
  end
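The reworked `format` runs every rule strategy and then `space_dash_with_hans`, in place of the 0.2.1 step that stripped the spaces out of full dates. The sketch below approximates that pipeline with plain `String#gsub` so the new output can be tried without installing the gem. The regex constants are copied from the diff; `space_cjk` and `format_sketch` are hypothetical names, and reading `space: true, reverse: true` as "insert one space on each side of the boundary" is an assumption, since the strategy code that interprets those options is only partly shown here.

```ruby
# Constants copied from the 1.0.0 diff.
CJK = '\p{Han}|\p{Hangul}|\p{Hanunoo}|\p{Katakana}|\p{Hiragana}|\p{Bopomofo}'
SPACE = "[ ]"
DASH_HAN_RE = /([#{CJK})】」》”’])(-+)([#{CJK}(【「《“‘])/
LEFT_QUOTE_RE = /#{SPACE}([(【「《])/
RIGHT_QUOTE_RE = /([)】」》])#{SPACE}/

# Hypothetical stand-in for the EnglishLetter rule: one space on each
# CJK / ASCII-alphanumeric boundary, in both directions.
def space_cjk(str)
  str = str.gsub(/(#{CJK})([a-zA-Z0-9])/, '\1 \2')
  str.gsub(/([a-zA-Z0-9])(#{CJK})/, '\1 \2')
end

# Hypothetical approximation of AutoCorrect.format in 1.0.0:
# the rule pass first, then the same three steps as space_dash_with_hans.
def format_sketch(str)
  str = space_cjk(str)
  str = str.gsub(DASH_HAN_RE, '\1 \2 \3')  # space out dashes between CJK
  str = str.gsub(LEFT_QUOTE_RE, '\1')      # drop space before an opening bracket
  str.gsub(RIGHT_QUOTE_RE, '\1')           # drop space after a closing bracket
end
```

Under those assumptions `format_sketch` reproduces the README outputs; for example `format_sketch("本番環境でRubyを使用する")` gives `"本番環境で Ruby を使用する"`, and `format_sketch("你好-世界")` gives `"你好 - 世界"`. One side effect of splicing `CJK`, an alternation, into a bracket expression such as `[#{CJK}]` (as `DASH_HAN_RE` and the new FullwidthPunctuation rules do) is that the `|` separators become literal members of the class, so an ASCII `|` also counts as CJK there.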
@@ -13,8 +13,8 @@ class AutoCorrect
  ]

  @remove_space_rules = [
- /(#{one})\s+(#{other})/u,
- /(#{other})\s+(#{one})/u
+ /(#{one})#{SPACE}+(#{other})/u,
+ /(#{other})#{SPACE}+(#{one})/u
  ]
  end

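In 1.0.0 the space-removal patterns match `#{SPACE}+`, that is one or more ASCII spaces, where 0.2.1 matched `\s+`, which also covers tabs and line breaks. The sketch below contrasts the two patterns on the same input. In the gem, `one` and `other` are supplied per rule; the values here (a Han character and a few fullwidth closing marks) are illustrative assumptions, as is the helper name `remove_space`:

```ruby
SPACE = "[ ]" # as defined in the 1.0.0 diff

# Illustrative operands; in the gem each rule supplies its own pair.
ONE = '\p{Han}'
OTHER = '[。」》]'

OLD_RULE = /(#{ONE})\s+(#{OTHER})/u       # 0.2.1: any whitespace, including "\n"
NEW_RULE = /(#{ONE})#{SPACE}+(#{OTHER})/u # 1.0.0: ASCII spaces only

# Hypothetical helper: drops the matched whitespace, keeping both captures.
def remove_space(str, rule)
  str.gsub(rule, '\1\2')
end
```

With `text = "你好 。世界\n。"`, `remove_space(text, OLD_RULE)` returns `"你好。世界。"`, while `remove_space(text, NEW_RULE)` returns `"你好。世界\n。"`: under these assumptions, a line break between a CJK character and a fullwidth mark now survives formatting.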
@@ -1,3 +1,3 @@
  class AutoCorrect
- VERSION = "0.2.1"
+ VERSION = "1.0.0"
  end
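The major-version bump accompanies the removal of the `String#auto_space!` and `String#auto_correct!` helpers. An application that still relies on them can stay on the 0.x series with a pessimistic constraint such as `gem "auto-correct", "~> 0.2"` in its Gemfile. RubyGems' own `Gem::Version` and `Gem::Requirement` classes show which releases that constraint admits:

```ruby
require "rubygems"

OLD_RELEASE = Gem::Version.new("0.2.1")
NEW_RELEASE = Gem::Version.new("1.0.0")

# "~> 0.2" means ">= 0.2, < 1.0": it accepts 0.2.1 but not 1.0.0.
HOLD_0X = Gem::Requirement.new("~> 0.2")

puts HOLD_0X.satisfied_by?(OLD_RELEASE) # prints true
puts HOLD_0X.satisfied_by?(NEW_RELEASE) # prints false
puts NEW_RELEASE > OLD_RELEASE          # prints true
```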
metadata CHANGED
@@ -1,15 +1,15 @@
  --- !ruby/object:Gem::Specification
  name: auto-correct
  version: !ruby/object:Gem::Version
- version: 0.2.1
+ version: 1.0.0
  platform: ruby
  authors:
  - Luikore
  - Jason Lee
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-03-06 00:00:00.000000000 Z
+ date: 2021-07-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri
@@ -40,12 +40,11 @@ files:
  - lib/auto-correct/format.rb
  - lib/auto-correct/html.rb
  - lib/auto-correct/strategery.rb
- - lib/auto-correct/string.rb
  - lib/auto-correct/version.rb
  homepage: https://github.com/huacnlee/auto-correct
  licenses: []
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -60,8 +59,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.0.3
- signing_key:
+ rubygems_version: 3.2.3
+ signing_key:
  specification_version: 4
  summary: Automatically add whitespace between Chinese and and half-width characters
  (alphabetical letters, numerical digits and symbols).
@@ -1,13 +0,0 @@
- class AutoCorrect
- module String
- def auto_space!
- ActiveSupport::Deprecation.warn("String.auto_space! is deprecated and will be removed in auto-corrrect 1.0, please use AutoCorrect.format instead.")
- self.sub!(self, AutoCorrect.format(self))
- end
-
- def auto_correct!
- ActiveSupport::Deprecation.warn("String.auto_correct! is deprecated and will be removed in auto-corrrect 1.0, please use AutoCorrect.format instead.")
- self.sub!(self, AutoCorrect.format(self))
- end
- end
- end
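With `lib/auto-correct/string.rb` deleted and `String.send :include, AutoCorrect::String` gone from `auto-correct.rb`, calls to `auto_space!` or `auto_correct!` now raise `NoMethodError`; the replacement the deprecation message points to is `str = AutoCorrect.format(str)`. For code that cannot be migrated at once, a refinement can restore the bang methods only in files that opt in. A minimal sketch, assuming the 1.0.0 gem is loaded so `AutoCorrect.format` exists; the module name `AutoCorrectLegacy` is hypothetical:

```ruby
# Hypothetical opt-in shim; assumes `require "auto-correct"` (>= 1.0) has run,
# so AutoCorrect.format is available when these methods are called.
module AutoCorrectLegacy
  refine String do
    # Rewrites the receiver in place. String#replace copies the formatted text
    # verbatim, whereas the removed sub!(self, ...) form treated backslash
    # sequences such as \0 or \1 in the replacement as back-references.
    def auto_space!
      replace(AutoCorrect.format(self))
    end

    def auto_correct!
      replace(AutoCorrect.format(self))
    end
  end
end

# In a file that opts in:
#   using AutoCorrectLegacy
#   title = +"使用Ruby"
#   title.auto_space!
```

A refinement keeps the change lexically scoped: only files that call `using AutoCorrectLegacy` see the methods, which is the opposite of the global `String` patch that 1.0.0 removed.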