auto-correct 0.1.0.pre0 → 0.3.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: 764ae2dccd3a0b65e9e5b071a3568620ade98d52
4
- data.tar.gz: 01a7ea3ed26327875a22530df5722af6a5253f14
2
+ SHA256:
3
+ metadata.gz: 12a8889fb7704f8233e1d83e029b0ae04565674396d549744e8c5a6d21156b4d
4
+ data.tar.gz: 042ab0e095196c62bc50f66bb9014fd71c93bcf48a815ca711e8c14e147b8a78
5
5
  SHA512:
6
- metadata.gz: 8253513e4b424534a08ef122023b2d7c03c1f1fe518f4028bd0658e87884eee5bba95e9f002b0dd915c0f57bf08e8fc7f06c6710c15cd3a1020ec8f55a0ef494
7
- data.tar.gz: 00fb083fcf99ddfbd779a57ad0a847ffc41e7dbff551ada550a35ba6a8118d50290af6c880897e97c549f2bb0425a56b54a44de7b14b356114990014863205cf
6
+ metadata.gz: 167d2641cf2f49f7b562e96d1c6ac38c27c8d212f07af3a42fc210fab4297635f1e82a67bc0e56026a984c3d4ff15b358bf0efe0d004aecbd6071fb414a59868
7
+ data.tar.gz: 4a2c9591624888b1ae8d1c5ece57a85a06d38f4f533e265fb0ac011093dfff244d549406762a01a2d87f523cafa7fa56960551fec06d6ccb6398b335f0268cbe
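Review note: the checksum file now lists SHA256 instead of SHA1 digests. Below is a minimal sketch of how the new `data.tar.gz` entry could be checked locally, assuming the archive members have already been extracted from the `.gem` file. The helper name `digest_matches?` and the file path are hypothetical.

```ruby
require "digest"

# Hypothetical helper: compares the SHA256 hex digest of some bytes
# with an expected value such as one listed in checksums.yaml.
def digest_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.strip.downcase
end

# Expected value copied from the new checksums.yaml entry for data.tar.gz.
EXPECTED_DATA_SHA256 =
  "042ab0e095196c62bc50f66bb9014fd71c93bcf48a815ca711e8c14e147b8a78"

# The path is an assumption; it presumes the archive was already unpacked.
if File.exist?("data.tar.gz")
  puts digest_matches?(File.binread("data.tar.gz"), EXPECTED_DATA_SHA256)
end
```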
data/README.md CHANGED
@@ -1,62 +1,97 @@
1
1
  # auto-correct
2
2
 
3
- Automatically corrects poorly written mixed Chinese and English text and fixes incorrect capitalization of proper nouns.
3
+ Automatically add whitespace between CJK (Chinese, Japanese, Korean) and half-width characters (alphabetical letters, numerical digits and symbols).
4
4
 
5
- Before
5
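+ Automatically corrects mixed CJK (Chinese, Japanese, Korean) and English text by inserting spaces. This approach has been used at Ruby China for years and also supports HTML.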
6
6
 
7
- ```
8
- [经验之谈]转行做ruby程序员的8个月, mysql 经验
9
- ```
7
+ [![Gem Version](https://badge.fury.io/rb/auto-correct.svg)](https://rubygems.org/gems/auto-correct) [![Build
8
+ Status](https://api.travis-ci.org/huacnlee/auto-correct.svg?branch=master&.svg)](http://travis-ci.org/huacnlee/auto-correct)
10
9
 
11
- After
12
10
 
13
- ```
14
- [经验之谈] 转行做 Ruby 程序员的 8 个月, MySQL 经验
15
- ```
11
+ ## Other implementations
12
+
13
+ - Ruby - [auto-correct](https://github.com/huacnlee/auto-correct).
14
+ - Go - [go-auto-correct](https://github.com/huacnlee/go-auto-correct).
15
+ - Rust - [auto-correct.rs](https://github.com/huacnlee/auto-correct.rs).
16
+
17
+ ## Features
18
+
19
+ - Automatically add spaces between CJK (Chinese, Japanese, Korean) and English words.
20
+ - HTML content support.
21
+
22
+ [Examples](https://github.com/huacnlee/auto-correct/blob/master/test/format_test.rb)
16
23
 
17
- [![Gem Version](https://badge.fury.io/rb/auto-space.png)](https://rubygems.org/gems/auto-space) [![Build
18
- Status](https://secure.travis-ci.org/huacnlee/auto-space.png?branch=master&.png)](http://travis-ci.org/huacnlee/auto-space)
24
+ ## Usage
19
25
 
20
- ## Usage instructions
26
+ `AutoCorrect.format` method for plain text.
21
27
 
22
- ```irb
23
- irb> require 'auto-correct'
24
- true
28
+ ```ruby
29
+ AutoCorrect.format("那里找到Ruby China App下载地址")
30
+ # => "那里找到 Ruby China App 下载地址"
25
31
 
26
- irb> "关于SSH连接的Permission denied(publickey).".auto_space!
27
- 关于 SSH 连接的 Permission denied (publickey).
32
+ AutoCorrect.format("Ruby 2.7版本第1次发布")
33
+ # => "Ruby 2.7 版本第 1 次发布"
28
34
 
29
- irb> "怎样追踪一个repo的新feature 和进展呢?".auto_space!
30
- 怎样追踪一个 repo 的新 feature 和进展呢?
35
+ AutoCorrect.format("于3月10日开始")
36
+ # => "于 3 月 10 日开始"
31
37
 
32
- irb> "vps上sessions不生效,但在本地的环境是ok的,why?".auto_space!
33
- vps sessions 不生效,但在本地的环境是 OK 的,why?
38
+ AutoCorrect.format("包装日期为2013年3月10日")
39
+ # => "包装日期为2013年3月10日"
34
40
 
35
- irb> "bootstrap control-group对齐问题".auto_space!
36
- bootstrap control-group 对齐问
41
+ AutoCorrect.format("生产环境中使用Ruby")
42
+ # => "生产环境中使用 Ruby"
43
+
44
+ AutoCorrect.format("本番環境でRubyを使用する")
45
+ # => "本番環境で Ruby を使用する"
46
+
47
+ AutoCorrect.format("프로덕션환경에서Ruby사용")
48
+ # => "프로덕션환경에서 Ruby 사용"
49
+ ```
50
+
51
+ `AutoCorrect.format_html` method for HTML content.
52
+
53
+ ```ruby
54
+ AutoCorrect.format_html("<div><p>长桥LongBridge App下载</p><p>最新版本1.0</p></div>")
55
+ # => "<div><p>长桥 LongBridge App 下载</p><p>最新版本 1.0</p></div>"
37
56
  ```
38
57
 
39
- ## Performance
58
+ ## Benchmark
40
59
 
41
- See the Rakefile for details
60
+ Run `rake bench` to test:
42
61
 
43
62
  ```
44
- $ rake benchmark
45
- user system total real
46
- 100 times 0.000000 0.000000 0.000000 ( 0.002223)
47
- 1000 times 0.030000 0.000000 0.030000 ( 0.024711)
48
- 10000 times 0.230000 0.000000 0.230000 ( 0.240850)
63
+ Warming up --------------------------------------
64
+ format 50 chars 1.886k i/100ms
65
+ format 100 chars 1.060k i/100ms
66
+ format 400 chars 342.000 i/100ms
67
+ format_html 85.000 i/100ms
68
+ Calculating -------------------------------------
69
+ format 50 chars 18.842k (± 1.5%) i/s - 94.300k in 5.005815s
70
+ format 100 chars 10.357k (± 1.8%) i/s - 51.940k in 5.016770s
71
+ format 400 chars 3.336k (± 2.2%) i/s - 16.758k in 5.026230s
72
+ format_html 839.761 (± 2.1%) i/s - 4.250k in 5.063225s
49
73
  ```
50
74
 
51
- ## TODO
75
+ | Total chars | Duration |
76
+ | ----- | ------- |
77
+ | 50 | 0.33 ms |
78
+ | 100 | 0.60 ms |
79
+ | 400 | 2 ms |
80
+
81
+ ### FormatHTML
52
82
 
53
- * 'Foo'的"Bar" -> 'Foo' "Bar"
54
- * 什么,时候 -> 什么, 时候 -> 什么,时候
83
+ | Total chars | Duration |
84
+ | ----- | ------- |
85
+ | 2K | 7 ms |
55
86
 
56
- ## Application cases
87
+ ## Use cases
57
88
 
58
- * [Ruby China](http://ruby-china.org) - All titles across the site are currently converted automatically.
89
+ * [Ruby China](https://ruby-china.org) - All titles across the site are currently converted automatically.
59
90
 
60
- ## References
91
+ ## Links
61
92
 
62
93
  * [Chinese Copywriting Guidelines](https://github.com/sparanoid/chinese-copywriting-guidelines)
94
+
95
+ ## License
96
+
97
+ This project is released under the MIT license.
@@ -1,40 +1,11 @@
1
- # coding: utf-8
2
- require "auto-correct/dicts"
1
+ require "auto-correct/strategery"
2
+ require "auto-correct/base"
3
+ require "auto-correct/format"
4
+ require "auto-correct/html"
5
+ require "auto-correct/string"
6
+ require "auto-correct/version"
3
7
 
4
- class String
5
- def auto_space!
6
- self.gsub! /((?![年月日号])\p{Han})([a-zA-Z0-9+$@#\[\(\/‘“])/u do
7
- "#$1 #$2"
8
- end
9
-
10
- self.gsub! /([a-zA-Z0-9+$’”\]\)@#!\/]|[\d[年月日]]{2,})((?![年月日号])\p{Han})/u do
11
- "#$1 #$2"
12
- end
13
-
14
- # Fix () [] near the English and number
15
- self.gsub! /([a-zA-Z0-9]+)([\[\(‘“])/u do
16
- "#$1 #$2"
17
- end
18
-
19
- self.gsub! /([\)\]’”])([a-zA-Z0-9]+)/u do
20
- "#$1 #$2"
21
- end
22
-
23
- self
24
- end
25
-
26
- def auto_correct!
27
- self.auto_space!
28
-
29
- self.gsub! /([\d\p{Han}]|\s|^)([a-zA-Z\d\-\_\.]+)([\d\p{Han}]|\s|$)/u do
30
- key = "#$2".downcase
31
- if AutoCorrect::DICTS.has_key?(key)
32
- ["#$1",AutoCorrect::DICTS[key],"#$3"].join("")
33
- else
34
- "#$1#$2#$3"
35
- end
36
- end
37
-
38
- self
39
- end
8
+ class AutoCorrect
40
9
  end
10
+
11
+ String.send :include, AutoCorrect::String
@@ -0,0 +1,13 @@
1
+ class AutoCorrect
2
+ @@strategies = []
3
+
4
+ class << self
5
+ def rule(one, other, space: false, reverse: false)
6
+ @@strategies << AutoCorrect::Strategery.new(one, other, space: space, reverse: reverse)
7
+ end
8
+
9
+ def strategies
10
+ @@strategies
11
+ end
12
+ end
13
+ end
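Review note: the file above keeps a class-wide list of rules that `rule` appends to while the class body is evaluated. A minimal sketch of that registry pattern follows. `MiniCorrect` and `Rule` are hypothetical names, and the sketch stores the list in a class-level instance variable rather than `@@strategies`, which is a design choice of the sketch and not taken from the diff.

```ruby
# Sketch of a class-level rule registry similar in shape to the one above.
# MiniCorrect and Rule are hypothetical names used only for illustration.
class MiniCorrect
  Rule = Struct.new(:one, :other, :space, :reverse)

  # Class-level instance variable; not shared with subclasses.
  @rules = []

  class << self
    attr_reader :rules

    # Appends a rule; intended to be called while the class body is evaluated.
    def rule(one, other, space: false, reverse: false)
      @rules << Rule.new(one, other, space, reverse)
    end
  end

  rule '\p{Han}', '[0-9a-zA-Z]', space: true, reverse: true
  rule '[a-z]', '[,。]'
end

puts MiniCorrect.rules.length
puts MiniCorrect.rules.first.space
```

Rules are kept in registration order, so a formatter that iterates over `rules` applies them in the order they were declared.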
@@ -0,0 +1,50 @@
1
+ class AutoCorrect
2
+ CJK = '\p{Han}|\p{Hangul}|\p{Hanunoo}|\p{Katakana}|\p{Hiragana}|\p{Bopomofo}'
3
+ SPACE = "[ ]"
4
+
5
+ # rubocop:disable Style/StringLiterals
6
+ # EnglishLetter
7
+ rule "#{CJK}", '[0-9a-zA-Z]', space: true, reverse: true
8
+
9
+ # SpecialSymbol
10
+ rule "#{CJK}", '[\|+$@#*]', space: true, reverse: true
11
+ rule "#{CJK}", '[\[\(‘“]', space: true
12
+ rule '[’”\]\)!%]', "#{CJK}", space: true
13
+ rule '[”\]\)!]', '[a-zA-Z0-9]+', space: true
14
+
15
+ # FullwidthPunctuation
16
+ rule %r([\w#{CJK}]), '[,。!?:;」》】”’]', reverse: true
17
+ rule '[‘“【「《]', %r([\w#{CJK}]), reverse: true
18
+
19
+ class << self
20
+ FULLDATE_RE = /#{SPACE}{0,}\d+#{SPACE}{0,}年#{SPACE}{0,}\d+#{SPACE}{0,}月#{SPACE}{0,}\d+#{SPACE}{0,}[日号]#{SPACE}{0,}/u
21
+ DASH_HAN_RE = /([#{CJK})】」》”’])([\-]+)([#{CJK}(【「《“‘])/
22
+ LEFT_QUOTE_RE = /#{SPACE}([(【「《])/
23
+ RIGHT_QUOTE_RE = /([)】」》])#{SPACE}/
24
+
25
+ def format(str)
26
+ out = str
27
+ self.strategies.each do |s|
28
+ out = s.format(out)
29
+ end
30
+ out = remove_full_date_spacing(out)
31
+ out = space_dash_with_hans(out)
32
+ out
33
+ end
34
+
35
+ private
36
+
37
+ def remove_full_date_spacing(str)
38
+ str.gsub(FULLDATE_RE) do |m|
39
+ m.gsub(/\s+/, "")
40
+ end
41
+ end
42
+
43
+ def space_dash_with_hans(str)
44
+ str = str.gsub(DASH_HAN_RE, '\1 \2 \3')
45
+ str = str.gsub(LEFT_QUOTE_RE, '\1')
46
+ str = str.gsub(RIGHT_QUOTE_RE, '\1')
47
+ str
48
+ end
49
+ end
50
+ end
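Review note: `format` applies every registered rule and then strips the spaces it introduced inside full dates. A simplified, self-contained sketch of that two-step flow follows. It is not the gem's implementation: it handles only Han characters and ASCII letters and digits, and `add_spaces` is a hypothetical name.

```ruby
# Simplified sketch, not the gem's implementation: only Han characters
# and ASCII letters/digits are handled here.
HAN = '\p{Han}'
HALF = '[0-9a-zA-Z]'
FULL_DATE = /\s*\d+\s*年\s*\d+\s*月\s*\d+\s*[日号]\s*/u

def add_spaces(text)
  # Step 1: insert a space at every Han/half-width boundary, in both directions.
  out = text.gsub(/(#{HAN})(#{HALF})/u) { "#{$1} #{$2}" }
  out = out.gsub(/(#{HALF})(#{HAN})/u) { "#{$1} #{$2}" }
  # Step 2: remove the spacing inside and around full dates such as 2013年3月10日.
  out.gsub(FULL_DATE) { |m| m.gsub(/\s+/, "") }
end

puts add_spaces("生产环境中使用Ruby")        # 生产环境中使用 Ruby
puts add_spaces("于3月10日开始")             # 于 3 月 10 日开始
puts add_spaces("包装日期为2013年3月10日")   # 包装日期为2013年3月10日
```

The partial date keeps its spaces because the date pattern requires a year, month and day together, which matches the README examples in this diff.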
@@ -0,0 +1,14 @@
1
+ require "nokogiri"
2
+
3
+ class AutoCorrect
4
+ class << self
5
+ def format_html(html)
6
+ doc = Nokogiri::HTML(html)
7
+ doc.traverse do |node|
8
+ next unless node.node_type == Nokogiri::XML::Node::TEXT_NODE
9
+ node.content = AutoCorrect.format(node.content)
10
+ end
11
+ doc.css("body").inner_html
12
+ end
13
+ end
14
+ end
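Review note: `format_html` formats only text nodes and leaves the markup alone. The sketch below illustrates the same idea without Nokogiri, by swapping in a regex split on tags. This is a deliberate simplification for illustration: it assumes well-formed markup with no `<` inside text, and it makes no attempt to skip `<script>` or `<pre>` content. `format_text_nodes` is a hypothetical name.

```ruby
# Regex-based simplification for illustration only; a real HTML parser
# (as used in the diff above) is more robust.
def format_text_nodes(html)
  # Split into tags and text runs; the capture group keeps the tags.
  parts = html.split(/(<[^>]*>)/)
  parts.map { |part| part.start_with?("<") ? part : yield(part) }.join
end

spaced = format_text_nodes("<p>最新版本1.0</p>") do |text|
  text.gsub(/(\p{Han})([0-9a-zA-Z])/u) { "#{$1} #{$2}" }
end

puts spaced  # <p>最新版本 1.0</p>
```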
@@ -0,0 +1,43 @@
1
+ class AutoCorrect
2
+ class Strategery
3
+ attr_reader :space, :reverse
4
+ attr_reader :add_space_rules, :remove_space_rules
5
+
6
+ def initialize(one, other, space: false, reverse: false)
7
+ @space = space
8
+ @reverse = reverse
9
+
10
+ @add_space_rules = [
11
+ /(#{one})(#{other})/u,
12
+ /(#{other})(#{one})/u
13
+ ]
14
+
15
+ @remove_space_rules = [
16
+ /(#{one})#{SPACE}+(#{other})/u,
17
+ /(#{other})#{SPACE}+(#{one})/u
18
+ ]
19
+ end
20
+
21
+ def format(str)
22
+ self.space ? add_space(str) : remove_space(str)
23
+ end
24
+
25
+ def add_space(str)
26
+ r0, r1 = add_space_rules
27
+ str = str.gsub(r0) { "#$1 #$2" }
28
+ if self.reverse
29
+ str = str.gsub(r1) { "#$1 #$2" }
30
+ end
31
+ str
32
+ end
33
+
34
+ def remove_space(str)
35
+ r0, r1 = remove_space_rules
36
+ str = str.gsub(r0) { "#{$1}#{$2}" }
37
+ if self.reverse
38
+ str = str.gsub(r1) { "#{$1}#{$2}" }
39
+ end
40
+ str
41
+ end
42
+ end
43
+ end
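Review note: each strategy either inserts a space between two adjacent character classes or removes spaces between them, depending on the `space` flag. The sketch below shows both modes in a reduced form. `SpacingRule` is a hypothetical name, the `reverse` option is omitted, and the remove branch joins the captures with no space, which is an assumption about the intended behaviour based on the method name.

```ruby
# Reduced sketch of a two-mode spacing rule; SpacingRule is hypothetical.
class SpacingRule
  def initialize(one, other, space: false)
    @space = space
    @add_re = /(#{one})(#{other})/u
    @remove_re = /(#{one})[ ]+(#{other})/u
  end

  def format(str)
    if @space
      # Add mode: put exactly one space between the two captures.
      str.gsub(@add_re) { "#{$1} #{$2}" }
    else
      # Remove mode (assumed intent): drop the spaces between the captures.
      str.gsub(@remove_re) { "#{$1}#{$2}" }
    end
  end
end

puts SpacingRule.new('\p{Han}', '[a-zA-Z]', space: true).format("使用Ruby")  # 使用 Ruby
puts SpacingRule.new('\p{Han}', '[,。]').format("你好 ,世界")              # 你好,世界
```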
@@ -0,0 +1,13 @@
1
+ class AutoCorrect
2
+ module String
3
+ def auto_space!
4
+ ActiveSupport::Deprecation.warn("String.auto_space! is deprecated and will be removed in auto-correct 1.0, please use AutoCorrect.format instead.")
5
+ self.sub!(self, AutoCorrect.format(self))
6
+ end
7
+
8
+ def auto_correct!
9
+ ActiveSupport::Deprecation.warn("String.auto_correct! is deprecated and will be removed in auto-correct 1.0, please use AutoCorrect.format instead.")
10
+ self.sub!(self, AutoCorrect.format(self))
11
+ end
12
+ end
13
+ end
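Review note: the deprecated bang methods call `ActiveSupport::Deprecation.warn`, while the metadata in this diff replaces the `activesupport` runtime dependency with `nokogiri`, so the constant may be undefined unless the host application loads ActiveSupport. A dependency-free shim could look like the sketch below. `LegacySpacing` and `FORMATTER` are hypothetical names, and the formatter is a stand-in for `AutoCorrect.format`.

```ruby
# Hypothetical shim: LegacySpacing and FORMATTER are illustration-only
# names, and FORMATTER stands in for the real formatter.
module LegacySpacing
  FORMATTER = ->(s) { s.gsub(/(\p{Han})([a-zA-Z0-9])/u) { "#{$1} #{$2}" } }

  def auto_space!
    # Kernel#warn writes to $stderr and needs no extra dependency.
    warn "auto_space! is deprecated, call the formatter directly instead."
    # String#replace swaps the receiver's contents in place and returns self.
    replace(FORMATTER.call(self))
  end
end

title = +"使用Ruby"            # unary plus yields an unfrozen string
title.extend(LegacySpacing)    # extend one object instead of patching String
title.auto_space!
puts title  # 使用 Ruby
```

`String#replace` also avoids treating the formatted text as a substitution template, which `sub!` would do for any backslash sequences in it.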
@@ -0,0 +1,3 @@
1
+ class AutoCorrect
2
+ VERSION = "0.3.1"
3
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: auto-correct
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0.pre0
4
+ version: 0.3.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Luikore
@@ -9,23 +9,24 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2014-07-15 00:00:00.000000000 Z
12
+ date: 2020-08-07 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
- name: activesupport
15
+ name: nokogiri
16
16
  requirement: !ruby/object:Gem::Requirement
17
17
  requirements:
18
- - - ">"
18
+ - - ">="
19
19
  - !ruby/object:Gem::Version
20
- version: 3.0.0
20
+ version: '1.4'
21
21
  type: :runtime
22
22
  prerelease: false
23
23
  version_requirements: !ruby/object:Gem::Requirement
24
24
  requirements:
25
- - - ">"
25
+ - - ">="
26
26
  - !ruby/object:Gem::Version
27
- version: 3.0.0
28
- description: "Automatically add appropriate spaces between Chinese and English text"
27
+ version: '1.4'
28
+ description: Automatically add whitespace between Chinese and half-width characters
29
+ (alphabetical letters, numerical digits and symbols).
29
30
  email:
30
31
  - usurffx@gmail.com
31
32
  - huacnlee@gmail.com
@@ -35,7 +36,12 @@ extra_rdoc_files: []
35
36
  files:
36
37
  - README.md
37
38
  - lib/auto-correct.rb
38
- - lib/auto-correct/dicts.rb
39
+ - lib/auto-correct/base.rb
40
+ - lib/auto-correct/format.rb
41
+ - lib/auto-correct/html.rb
42
+ - lib/auto-correct/strategery.rb
43
+ - lib/auto-correct/string.rb
44
+ - lib/auto-correct/version.rb
39
45
  homepage: https://github.com/huacnlee/auto-correct
40
46
  licenses: []
41
47
  metadata: {}
@@ -50,14 +56,13 @@ required_ruby_version: !ruby/object:Gem::Requirement
50
56
  version: '0'
51
57
  required_rubygems_version: !ruby/object:Gem::Requirement
52
58
  requirements:
53
- - - ">"
59
+ - - ">="
54
60
  - !ruby/object:Gem::Version
55
- version: 1.3.1
61
+ version: '0'
56
62
  requirements: []
57
- rubyforge_project:
58
- rubygems_version: 2.2.2
63
+ rubygems_version: 3.0.3
59
64
  signing_key:
60
65
  specification_version: 4
61
- summary: "Automatically add appropriate spaces between Chinese and English text"
66
+ summary: Automatically add whitespace between Chinese and half-width characters
67
+ (alphabetical letters, numerical digits and symbols).
62
68
  test_files: []
63
- has_rdoc:
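Review note: the metadata moves from a prerelease version to `0.3.1` and changes the runtime dependency to `nokogiri >= 1.4`. The sketch below uses the RubyGems version classes to show how those values compare. The nokogiri versions tested against the requirement are arbitrary examples and do not come from the diff.

```ruby
require "rubygems"

old_version = Gem::Version.new("0.1.0.pre0")
new_version = Gem::Version.new("0.3.1")

puts old_version.prerelease?     # true
puts new_version > old_version   # true

# The nokogiri constraint from the new metadata.
nokogiri_req = Gem::Requirement.new(">= 1.4")
puts nokogiri_req.satisfied_by?(Gem::Version.new("1.10.10"))  # true
puts nokogiri_req.satisfied_by?(Gem::Version.new("1.3.3"))    # false
```

The old `required_rubygems_version` of `> 1.3.1` is what RubyGems typically records for prerelease versions, which would explain why it relaxes to `>= 0` here.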
@@ -1,103 +0,0 @@
1
- module AutoCorrect
2
- DICTS = {
3
- # Ruby
4
- "ruby" => "Ruby",
5
- "rails" => "Rails",
6
- "rubygems" => "RubyGems",
7
- "ror" => "Ruby on Rails",
8
- "rubyconf" => "RubyConf",
9
- "railsconf" => "RailsConf",
10
- "rubytuesday" => "Ruby Tuesday",
11
- "jruby" => "JRuby",
12
- "mruby" => "mRuby",
13
- "rvm" => "RVM",
14
- "rbenv" => "rbenv",
15
- "yard" => "YARD",
16
- "rdoc" => "RDoc",
17
- "rspec" => "RSpec",
18
- "minitest" => "MiniTest",
19
- "coffeescript" => "CoffeeScript",
20
- "scss" => "SCSS",
21
- "sass" => "Sass",
22
- "sidekiq" => "Sidekiq",
23
- "railscasts" => "RailsCasts",
24
- "execjs" => "ExecJS",
25
-
26
- # Python
27
-
28
- # Node.js
29
- "nodejs" => "Node.js",
30
-
31
- # Go
32
-
33
- # Cocoa
34
- "reactivecocoa" => "ReactiveCocoa",
35
-
36
- # Programming
37
- "ssh" => "SSH",
38
- "css" => "CSS",
39
- "html" => "HTML",
40
- "javascript" => "JavaScript",
41
- "js" => "JS",
42
- "png" => "PNG",
43
- "dsl" => "DSL",
44
- "tdd" => "TDD",
45
- "bdd" => "BDD",
46
-
47
- # Sites
48
- "github" => "GitHub",
49
- "gist" => "Gist",
50
- "ruby_china" => "Ruby China",
51
- "ruby-china" => "Ruby China",
52
- "rubychina" => "Ruby China",
53
- "v2ex" => "V2EX",
54
- "heroku" => "Heroku",
55
- "stackoverflow" => "Stack Overflow",
56
- "stackexchange" => "StackExchange",
57
-
58
-
59
- # Databases
60
- "mysql" => "MySQL",
61
- "postgresql" => "PostgreSQL",
62
- "sqlite" => "SQLite",
63
- "mongodb" => "MongoDB",
64
- "rethinkdb" => "RethinkDB",
65
- "elasticsearch" => "Elasticsearch",
66
- "sphinx" => "Sphinx",
67
-
68
- # OpenSource Projects
69
- "gitlab" => "GitLab",
70
- "gitlabci" => "GitLab CI",
71
- "fontawsome" => "FontAwsome",
72
- "bootstrap" => "Bootstrap",
73
- "less" => "Less",
74
- "jquery" => "jQuery",
75
- "requirejs" => "RequireJS",
76
- "underscore" => "Underscore",
77
- "backbone" => "Backbone",
78
- "seajs" => "SeaJS",
79
- "imagemagick" => "ImageMagick",
80
-
81
- # Tools
82
- "vim" => "VIM",
83
- "emacs" => "Emacs",
84
- "textmate" => "TextMate",
85
- "sublime" => "Sublime",
86
- "rubymine" => "RubyMine",
87
- "sequelpro" => "Sequel Pro",
88
- "virtualbox" => "VirtualBox",
89
- "safari" => "Safari",
90
- "chrome" => "Chrome",
91
- "ie" => "IE",
92
-
93
- # Misc
94
- "ios" => "iOS",
95
- "iphone" => "iPhone",
96
- "android" => "Android",
97
- "osx" => "OS X",
98
- "mac" => "Mac",
99
- "api" => "API",
100
- "wi-fi" => "Wi-Fi",
101
- "wifi" => "Wi-Fi"
102
- }
103
- end