auto-correct 0.1.0.pre0 → 0.3.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: 764ae2dccd3a0b65e9e5b071a3568620ade98d52
4
- data.tar.gz: 01a7ea3ed26327875a22530df5722af6a5253f14
2
+ SHA256:
3
+ metadata.gz: 12a8889fb7704f8233e1d83e029b0ae04565674396d549744e8c5a6d21156b4d
4
+ data.tar.gz: 042ab0e095196c62bc50f66bb9014fd71c93bcf48a815ca711e8c14e147b8a78
5
5
  SHA512:
6
- metadata.gz: 8253513e4b424534a08ef122023b2d7c03c1f1fe518f4028bd0658e87884eee5bba95e9f002b0dd915c0f57bf08e8fc7f06c6710c15cd3a1020ec8f55a0ef494
7
- data.tar.gz: 00fb083fcf99ddfbd779a57ad0a847ffc41e7dbff551ada550a35ba6a8118d50290af6c880897e97c549f2bb0425a56b54a44de7b14b356114990014863205cf
6
+ metadata.gz: 167d2641cf2f49f7b562e96d1c6ac38c27c8d212f07af3a42fc210fab4297635f1e82a67bc0e56026a984c3d4ff15b358bf0efe0d004aecbd6071fb414a59868
7
+ data.tar.gz: 4a2c9591624888b1ae8d1c5ece57a85a06d38f4f533e265fb0ac011093dfff244d549406762a01a2d87f523cafa7fa56960551fec06d6ccb6398b335f0268cbe
data/README.md CHANGED
@@ -1,62 +1,97 @@
1
1
  # auto-correct
2
2
 
3
- Automatically corrects poor style in mixed Chinese and English text and fixes incorrect capitalization of proper nouns.
3
+ Automatically add whitespace between CJK (Chinese, Japanese, Korean) and half-width characters (alphabetical letters, numerical digits and symbols).
4
4
 
5
- Before
5
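+ Automatically inserts spaces in text that mixes Chinese, Japanese, or Korean with English. This approach has been used at Ruby China for years, and HTML content is supported.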
6
6
 
7
- ```
8
- [经验之谈]转行做ruby程序员的8个月, mysql 经验
9
- ```
7
+ [![Gem Version](https://badge.fury.io/rb/auto-correct.svg)](https://rubygems.org/gems/auto-correct) [![Build
8
+ Status](https://api.travis-ci.org/huacnlee/auto-correct.svg?branch=master&.svg)](http://travis-ci.org/huacnlee/auto-correct)
10
9
 
11
- After
12
10
 
13
- ```
14
- [经验之谈] 转行做 Ruby 程序员的 8 个月, MySQL 经验
15
- ```
11
+ ## Other implementations
12
+
13
+ - Ruby - [auto-correct](https://github.com/huacnlee/auto-correct).
14
+ - Go - [go-auto-correct](https://github.com/huacnlee/go-auto-correct).
15
+ - Rust - [auto-correct.rs](https://github.com/huacnlee/auto-correct.rs).
16
+
17
+ ## Features
18
+
19
+ - Automatically adds spaces between CJK (Chinese, Japanese, Korean) and English words.
20
+ - HTML content support.
21
+
22
+ [Examples](https://github.com/huacnlee/auto-correct/blob/master/test/format_test.rb)
16
23
 
17
- [![Gem Version](https://badge.fury.io/rb/auto-space.png)](https://rubygems.org/gems/auto-space) [![Build
18
- Status](https://secure.travis-ci.org/huacnlee/auto-space.png?branch=master&.png)](http://travis-ci.org/huacnlee/auto-space)
24
+ ## Usage
19
25
 
20
- ## Usage
26
+ Use the `AutoCorrect.format` method for plain text.
21
27
 
22
- ```irb
23
- irb> require 'auto-correct'
24
- true
28
+ ```ruby
29
+ AutoCorrect.format("那里找到Ruby China App下载地址")
30
+ # => "那里找到 Ruby China App 下载地址"
25
31
 
26
- irb> "关于SSH连接的Permission denied(publickey).".auto_space!
27
- 关于 SSH 连接的 Permission denied (publickey).
32
+ AutoCorrect.format("Ruby 2.7版本第1次发布")
33
+ # => "Ruby 2.7 版本第 1 次发布"
28
34
 
29
- irb> "怎样追踪一个repo的新feature 和进展呢?".auto_space!
30
- 怎样追踪一个 repo 的新 feature 和进展呢?
35
+ AutoCorrect.format("于3月10日开始")
36
+ # => "于 3 月 10 日开始"
31
37
 
32
- irb> "vps上sessions不生效,但在本地的环境是ok的,why?".auto_space!
33
- vps sessions 不生效,但在本地的环境是 OK 的,why?
38
+ AutoCorrect.format("包装日期为2013年3月10日")
39
+ # => "包装日期为2013年3月10日"
34
40
 
35
- irb> "bootstrap control-group对齐问题".auto_space!
36
- bootstrap control-group 对齐问
41
+ AutoCorrect.format("生产环境中使用Ruby")
42
+ # => "生产环境中使用 Ruby"
43
+
44
+ AutoCorrect.format("本番環境でRubyを使用する")
45
+ # => "本番環境で Ruby を使用する"
46
+
47
+ AutoCorrect.format("프로덕션환경에서Ruby사용")
48
+ # => "프로덕션환경에서 Ruby 사용"
49
+ ```
50
+
51
+ Use the `AutoCorrect.format_html` method for HTML content.
52
+
53
+ ```ruby
54
+ AutoCorrect.format_html("<div><p>长桥LongBridge App下载</p><p>最新版本1.0</p></div>")
55
+ # => "<div><p>长桥 LongBridge App 下载</p><p>最新版本 1.0</p></div>"
37
56
  ```
38
57
 
39
- ## Performance
58
+ ## Benchmark
40
59
 
41
- See the Rakefile for details.
60
+ Run `rake bench` to test:
42
61
 
43
62
  ```
44
- $ rake benchmark
45
- user system total real
46
- 100 times 0.000000 0.000000 0.000000 ( 0.002223)
47
- 1000 times 0.030000 0.000000 0.030000 ( 0.024711)
48
- 10000 times 0.230000 0.000000 0.230000 ( 0.240850)
63
+ Warming up --------------------------------------
64
+ format 50 chars 1.886k i/100ms
65
+ format 100 chars 1.060k i/100ms
66
+ format 400 chars 342.000 i/100ms
67
+ format_html 85.000 i/100ms
68
+ Calculating -------------------------------------
69
+ format 50 chars 18.842k (± 1.5%) i/s - 94.300k in 5.005815s
70
+ format 100 chars 10.357k (± 1.8%) i/s - 51.940k in 5.016770s
71
+ format 400 chars 3.336k (± 2.2%) i/s - 16.758k in 5.026230s
72
+ format_html 839.761 (± 2.1%) i/s - 4.250k in 5.063225s
49
73
  ```
50
74
 
51
- ## TODO
75
+ | Total chars | Duration |
76
+ | ----- | ------- |
77
+ | 50 | 0.33 ms |
78
+ | 100 | 0.60 ms |
79
+ | 400 | 2 ms |
80
+
81
+ ### FormatHTML
52
82
 
53
- * 'Foo'的"Bar" -> 'Foo' "Bar"
54
- * 什么,时候 -> 什么, 时候 -> 什么,时候
83
+ | Total chars | Duration |
84
+ | ----- | ------- |
85
+ | 2K | 7 ms |
55
86
 
56
- ## Use cases
87
+ ## Use cases
57
88
 
58
- * [Ruby China](http://ruby-china.org) - Titles across the entire site are currently converted automatically.
89
+ * [Ruby China](https://ruby-china.org) - Titles across the entire site are currently converted automatically.
59
90
 
60
- ## References
91
+ ## Links
61
92
 
62
93
  * [Chinese Copywriting Guidelines](https://github.com/sparanoid/chinese-copywriting-guidelines)
94
+
95
+ ## License
96
+
97
+ This project is released under the MIT license.
data/lib/auto-correct.rb CHANGED
@@ -1,40 +1,11 @@
- # coding: utf-8
- require "auto-correct/dicts"
+ require "auto-correct/strategery"
+ require "auto-correct/base"
+ require "auto-correct/format"
+ require "auto-correct/html"
+ require "auto-correct/string"
+ require "auto-correct/version"
 
- class String
-   def auto_space!
-     self.gsub! /((?![年月日号])\p{Han})([a-zA-Z0-9+$@#\[\(\/‘“])/u do
-       "#$1 #$2"
-     end
-
-     self.gsub! /([a-zA-Z0-9+$’”\]\)@#!\/]|[\d[年月日]]{2,})((?![年月日号])\p{Han})/u do
-       "#$1 #$2"
-     end
-
-     # Fix () [] near the English and number
-     self.gsub! /([a-zA-Z0-9]+)([\[\(‘“])/u do
-       "#$1 #$2"
-     end
-
-     self.gsub! /([\)\]’”])([a-zA-Z0-9]+)/u do
-       "#$1 #$2"
-     end
-
-     self
-   end
-
-   def auto_correct!
-     self.auto_space!
-
-     self.gsub! /([\d\p{Han}]|\s|^)([a-zA-Z\d\-\_\.]+)([\d\p{Han}]|\s|$)/u do
-       key = "#$2".downcase
-       if AutoCorrect::DICTS.has_key?(key)
-         ["#$1",AutoCorrect::DICTS[key],"#$3"].join("")
-       else
-         "#$1#$2#$3"
-       end
-     end
-
-     self
-   end
+ class AutoCorrect
  end
+
+ String.send :include, AutoCorrect::String
data/lib/auto-correct/base.rb ADDED
@@ -0,0 +1,13 @@
+ class AutoCorrect
+   @@strategies = []
+
+   class << self
+     def rule(one, other, space: false, reverse: false)
+       @@strategies << AutoCorrect::Strategery.new(one, other, space: space, reverse: reverse)
+     end
+
+     def strategies
+       @@strategies
+     end
+   end
+ end
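The registry in this hunk can be illustrated in isolation. The following is a minimal sketch, not the gem's code: `MiniCorrect` and the `Rule` struct are hypothetical stand-ins, and a class-level instance variable is used in place of `@@strategies` (an assumption made to keep the example self-contained).

```ruby
# Hypothetical stand-in for AutoCorrect::Strategery: it only records its arguments.
Rule = Struct.new(:one, :other, :space, :reverse)

class MiniCorrect
  # Class-level instance variable holding the registered rules, in order.
  @strategies = []

  class << self
    attr_reader :strategies

    # DSL entry point: each call appends one rule to the registry.
    def rule(one, other, space: false, reverse: false)
      @strategies << Rule.new(one, other, space, reverse)
    end
  end

  # Rules are declared in the class body, so they are registered at load time.
  rule '\p{Han}', '[0-9a-zA-Z]', space: true, reverse: true
  rule '\p{Han}', '[\[\(]', space: true
end

puts MiniCorrect.strategies.size
```

Because the rules run in registration order, the order of the `rule` calls is part of the behavior.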
data/lib/auto-correct/format.rb ADDED
@@ -0,0 +1,50 @@
+ class AutoCorrect
+   CJK = '\p{Han}|\p{Hangul}|\p{Hanunoo}|\p{Katakana}|\p{Hiragana}|\p{Bopomofo}'
+   SPACE = "[ ]"
+
+   # rubocop:disable Style/StringLiterals
+   # EnglishLetter
+   rule "#{CJK}", '[0-9a-zA-Z]', space: true, reverse: true
+
+   # SpecialSymbol
+   rule "#{CJK}", '[\|+$@#*]', space: true, reverse: true
+   rule "#{CJK}", '[\[\(‘“]', space: true
+   rule '[’”\]\)!%]', "#{CJK}", space: true
+   rule '[”\]\)!]', '[a-zA-Z0-9]+', space: true
+
+   # FullwidthPunctuation
+   rule %r([\w#{CJK}]), '[,。!?:;」》】”’]', reverse: true
+   rule '[‘“【「《]', %r([\w#{CJK}]), reverse: true
+
+   class << self
+     FULLDATE_RE = /#{SPACE}{0,}\d+#{SPACE}{0,}年#{SPACE}{0,}\d+#{SPACE}{0,}月#{SPACE}{0,}\d+#{SPACE}{0,}[日号]#{SPACE}{0,}/u
+     DASH_HAN_RE = /([#{CJK})】」》”’])([\-]+)([#{CJK}(【「《“‘])/
+     LEFT_QUOTE_RE = /#{SPACE}([(【「《])/
+     RIGHT_QUOTE_RE = /([)】」》])#{SPACE}/
+
+     def format(str)
+       out = str
+       self.strategies.each do |s|
+         out = s.format(out)
+       end
+       out = remove_full_date_spacing(out)
+       out = space_dash_with_hans(out)
+       out
+     end
+
+     private
+
+     def remove_full_date_spacing(str)
+       str.gsub(FULLDATE_RE) do |m|
+         m.gsub(/\s+/, "")
+       end
+     end
+
+     def space_dash_with_hans(str)
+       str = str.gsub(DASH_HAN_RE, '\1 \2 \3')
+       str = str.gsub(LEFT_QUOTE_RE, '\1')
+       str = str.gsub(RIGHT_QUOTE_RE, '\1')
+       str
+     end
+   end
+ end
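The full-date pass explains why the README shows `2013年3月10日` without spaces while `3 月 10 日` keeps them: only a complete year-month-day date is collapsed. A minimal standalone sketch of that step, assuming the input has already been spaced by the rules (the pattern uses `*` where the hunk writes `{0,}`, which is equivalent):

```ruby
SPACE = "[ ]"
# Matches a full date (year, month, day) together with any spaces around its parts.
FULLDATE_RE = /#{SPACE}*\d+#{SPACE}*年#{SPACE}*\d+#{SPACE}*月#{SPACE}*\d+#{SPACE}*[日号]#{SPACE}*/

# Strip every whitespace run inside each matched full date.
def remove_full_date_spacing(str)
  str.gsub(FULLDATE_RE) { |m| m.gsub(/\s+/, "") }
end

puts remove_full_date_spacing("包装日期为 2013 年 3 月 10 日")
# => 包装日期为2013年3月10日

puts remove_full_date_spacing("于 3 月 10 日开始")
# => 于 3 月 10 日开始 (no year, so the pattern does not match)
```

Note that the match also consumes the space before the first digit, which is why no space remains between `为` and `2013`.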
data/lib/auto-correct/html.rb ADDED
@@ -0,0 +1,14 @@
+ require "nokogiri"
+
+ class AutoCorrect
+   class << self
+     def format_html(html)
+       doc = Nokogiri::HTML(html)
+       doc.traverse do |node|
+         next unless node.node_type == Nokogiri::XML::Node::TEXT_NODE
+         node.content = AutoCorrect.format(node.content)
+       end
+       doc.css("body").inner_html
+     end
+   end
+ end
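The idea behind `format_html` is to format only text nodes and leave markup untouched. The sketch below shows the same idea with the standard library only. It is not the gem's approach: it swaps Nokogiri's parser for a regex split on tags, and `format_text` is a hypothetical, much reduced stand-in for `AutoCorrect.format`.

```ruby
# Hypothetical stand-in for AutoCorrect.format: only separates Han characters
# from ASCII letters and digits.
def format_text(text)
  text.gsub(/(\p{Han})([0-9a-zA-Z])/) { "#{$1} #{$2}" }
      .gsub(/([0-9a-zA-Z])(\p{Han})/) { "#{$1} #{$2}" }
end

# Split on tags while keeping them (capture group), then format only the
# pieces that are not tags.
def format_html_sketch(html)
  html.split(/(<[^>]*>)/).map { |piece|
    piece.start_with?("<") ? piece : format_text(piece)
  }.join
end

puts format_html_sketch("<div><p>长桥LongBridge App下载</p><p>最新版本1.0</p></div>")
# => <div><p>长桥 LongBridge App 下载</p><p>最新版本 1.0</p></div>
```

A regex split is only adequate for simple, well-formed fragments; comments, `<script>` bodies, or a literal `<` in text would need a real parser, which is presumably why the gem depends on Nokogiri.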
data/lib/auto-correct/strategery.rb ADDED
@@ -0,0 +1,43 @@
+ class AutoCorrect
+   class Strategery
+     attr_reader :space, :reverse
+     attr_reader :add_space_rules, :remove_space_rules
+
+     def initialize(one, other, space: false, reverse: false)
+       @space = space
+       @reverse = reverse
+
+       @add_space_rules = [
+         /(#{one})(#{other})/u,
+         /(#{other})(#{one})/u
+       ]
+
+       @remove_space_rules = [
+         /(#{one})#{SPACE}+(#{other})/u,
+         /(#{other})#{SPACE}+(#{one})/u
+       ]
+     end
+
+     def format(str)
+       self.space ? add_space(str) : remove_space(str)
+     end
+
+     def add_space(str)
+       r0, r1 = add_space_rules
+       str = str.gsub(r0) { "#$1 #$2" }
+       if self.reverse
+         str = str.gsub(r1) { "#$1 #$2" }
+       end
+       str
+     end
+
+     def remove_space(str)
+       r0, r1 = remove_space_rules
+       str = str.gsub(r0) { "#$1#$2" }
+       if self.reverse
+         str = str.gsub(r1) { "#$1#$2" }
+       end
+       str
+     end
+   end
+ end
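Each strategy compiles a pair of patterns into two regexes, one per direction, and applies the second only when `reverse` is set. The sketch below shows both operations as free functions. It is an illustration under assumptions: the function names are hypothetical, and the `CJK` list is shortened to four scripts.

```ruby
# Shortened, illustrative subset of the CJK scripts named in format.rb.
CJK = '\p{Han}|\p{Hangul}|\p{Katakana}|\p{Hiragana}'

# Insert one space wherever `one` is directly followed by `other`,
# and also in the opposite order when reverse is true.
def add_space(str, one, other, reverse: false)
  str = str.gsub(/(#{one})(#{other})/u) { "#{$1} #{$2}" }
  str = str.gsub(/(#{other})(#{one})/u) { "#{$1} #{$2}" } if reverse
  str
end

# Drop the spaces wherever `one` and `other` are separated only by spaces.
def remove_space(str, one, other, reverse: false)
  str = str.gsub(/(#{one})[ ]+(#{other})/u) { "#{$1}#{$2}" }
  str = str.gsub(/(#{other})[ ]+(#{one})/u) { "#{$1}#{$2}" } if reverse
  str
end

puts add_space("那里找到Ruby China App下载地址", CJK, '[0-9a-zA-Z]', reverse: true)
# => 那里找到 Ruby China App 下载地址

puts remove_space("你好 ,世界", '[\p{Han}]', '[,。!?]')
# => 你好,世界
```

Without `reverse: true` the first call would space `到R` but leave `p下` joined, which is why the EnglishLetter rule in format.rb sets it.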
data/lib/auto-correct/string.rb ADDED
@@ -0,0 +1,13 @@
+ class AutoCorrect
+   module String
+     def auto_space!
+       ActiveSupport::Deprecation.warn("String.auto_space! is deprecated and will be removed in auto-correct 1.0, please use AutoCorrect.format instead.")
+       self.sub!(self, AutoCorrect.format(self))
+     end
+
+     def auto_correct!
+       ActiveSupport::Deprecation.warn("String.auto_correct! is deprecated and will be removed in auto-correct 1.0, please use AutoCorrect.format instead.")
+       self.sub!(self, AutoCorrect.format(self))
+     end
+   end
+ end
data/lib/auto-correct/version.rb ADDED
@@ -0,0 +1,3 @@
+ class AutoCorrect
+   VERSION = "0.3.1"
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: auto-correct
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0.pre0
4
+ version: 0.3.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Luikore
@@ -9,23 +9,24 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2014-07-15 00:00:00.000000000 Z
12
+ date: 2020-08-07 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
- name: activesupport
15
+ name: nokogiri
16
16
  requirement: !ruby/object:Gem::Requirement
17
17
  requirements:
18
- - - ">"
18
+ - - ">="
19
19
  - !ruby/object:Gem::Version
20
- version: 3.0.0
20
+ version: '1.4'
21
21
  type: :runtime
22
22
  prerelease: false
23
23
  version_requirements: !ruby/object:Gem::Requirement
24
24
  requirements:
25
- - - ">"
25
+ - - ">="
26
26
  - !ruby/object:Gem::Version
27
- version: 3.0.0
28
- description: "Automatically add appropriate spaces between Chinese and English text"
27
+ version: '1.4'
28
+ description: Automatically add whitespace between Chinese and half-width characters
29
+ (alphabetical letters, numerical digits and symbols).
29
30
  email:
30
31
  - usurffx@gmail.com
31
32
  - huacnlee@gmail.com
@@ -35,7 +36,12 @@ extra_rdoc_files: []
35
36
  files:
36
37
  - README.md
37
38
  - lib/auto-correct.rb
38
- - lib/auto-correct/dicts.rb
39
+ - lib/auto-correct/base.rb
40
+ - lib/auto-correct/format.rb
41
+ - lib/auto-correct/html.rb
42
+ - lib/auto-correct/strategery.rb
43
+ - lib/auto-correct/string.rb
44
+ - lib/auto-correct/version.rb
39
45
  homepage: https://github.com/huacnlee/auto-correct
40
46
  licenses: []
41
47
  metadata: {}
@@ -50,14 +56,13 @@ required_ruby_version: !ruby/object:Gem::Requirement
50
56
  version: '0'
51
57
  required_rubygems_version: !ruby/object:Gem::Requirement
52
58
  requirements:
53
- - - ">"
59
+ - - ">="
54
60
  - !ruby/object:Gem::Version
55
- version: 1.3.1
61
+ version: '0'
56
62
  requirements: []
57
- rubyforge_project:
58
- rubygems_version: 2.2.2
63
+ rubygems_version: 3.0.3
59
64
  signing_key:
60
65
  specification_version: 4
61
- summary: "Automatically add appropriate spaces between Chinese and English text"
66
+ summary: Automatically add whitespace between Chinese and half-width characters
67
+ (alphabetical letters, numerical digits and symbols).
62
68
  test_files: []
63
- has_rdoc:
@@ -1,103 +0,0 @@
1
- module AutoCorrect
2
- DICTS = {
3
- # Ruby
4
- "ruby" => "Ruby",
5
- "rails" => "Rails",
6
- "rubygems" => "RubyGems",
7
- "ror" => "Ruby on Rails",
8
- "rubyconf" => "RubyConf",
9
- "railsconf" => "RailsConf",
10
- "rubytuesday" => "Ruby Tuesday",
11
- "jruby" => "JRuby",
12
- "mruby" => "mRuby",
13
- "rvm" => "RVM",
14
- "rbenv" => "rbenv",
15
- "yard" => "YARD",
16
- "rdoc" => "RDoc",
17
- "rspec" => "RSpec",
18
- "minitest" => "MiniTest",
19
- "coffeescript" => "CoffeeScript",
20
- "scss" => "SCSS",
21
- "sass" => "Sass",
22
- "sidekiq" => "Sidekiq",
23
- "railscasts" => "RailsCasts",
24
- "execjs" => "ExecJS",
25
-
26
- # Python
27
-
28
- # Node.js
29
- "nodejs" => "Node.js",
30
-
31
- # Go
32
-
33
- # Cocoa
34
- "reactivecocoa" => "ReactiveCocoa",
35
-
36
- # Programming
37
- "ssh" => "SSH",
38
- "css" => "CSS",
39
- "html" => "HTML",
40
- "javascript" => "JavaScript",
41
- "js" => "JS",
42
- "png" => "PNG",
43
- "dsl" => "DSL",
44
- "tdd" => "TDD",
45
- "bdd" => "BDD",
46
-
47
- # Sites
48
- "github" => "GitHub",
49
- "gist" => "Gist",
50
- "ruby_china" => "Ruby China",
51
- "ruby-china" => "Ruby China",
52
- "rubychina" => "Ruby China",
53
- "v2ex" => "V2EX",
54
- "heroku" => "Heroku",
55
- "stackoverflow" => "Stack Overflow",
56
- "stackexchange" => "StackExchange",
57
-
58
-
59
- # Databases
60
- "mysql" => "MySQL",
61
- "postgresql" => "PostgreSQL",
62
- "sqlite" => "SQLite",
63
- "mongodb" => "MongoDB",
64
- "rethinkdb" => "RethinkDB",
65
- "elasticsearch" => "Elasticsearch",
66
- "sphinx" => "Sphinx",
67
-
68
- # OpenSource Projects
69
- "gitlab" => "GitLab",
70
- "gitlabci" => "GitLab CI",
71
- "fontawsome" => "FontAwsome",
72
- "bootstrap" => "Bootstrap",
73
- "less" => "Less",
74
- "jquery" => "jQuery",
75
- "requirejs" => "RequireJS",
76
- "underscore" => "Underscore",
77
- "backbone" => "Backbone",
78
- "seajs" => "SeaJS",
79
- "imagemagick" => "ImageMagick",
80
-
81
- # Tools
82
- "vim" => "VIM",
83
- "emacs" => "Emacs",
84
- "textmate" => "TextMate",
85
- "sublime" => "Sublime",
86
- "rubymine" => "RubyMine",
87
- "sequelpro" => "Sequel Pro",
88
- "virtualbox" => "VirtualBox",
89
- "safari" => "Safari",
90
- "chrome" => "Chrome",
91
- "ie" => "IE",
92
-
93
- # Misc
94
- "ios" => "iOS",
95
- "iphone" => "iPhone",
96
- "android" => "Android",
97
- "osx" => "OS X",
98
- "mac" => "Mac",
99
- "api" => "API",
100
- "wi-fi" => "Wi-Fi",
101
- "wifi" => "Wi-Fi"
102
- }
103
- end