gigo 1.2.0 → 1.3.0

data/README.md CHANGED
@@ -5,9 +5,7 @@ Or better yet, Garbage In, Gold Out! - The GIGO gem aims to fix ruby string enco
5
5
 
6
6
  The GIGO gem is likely not the proper solution. If you have bad encodings in your database, you should fix them and write consistent encodings. That said, if you have no other choice, GIGO can help.
7
7
 
8
- This gem depends on one of the many public forks of `CharDet` for ruby. Since `CharDet` is not a public gem and following proper semantic versioning, we have decided to vendor the [kirillrdy/rchardet](http://github.com/kirillrdy/rchardet) repo. We have even made sure that our vendored version stays in our namesacpe by using `GIGO::CharDet`. So if you have another version bundled, feel confident that the two will not conflict.
9
-
10
- We use `GIGO::CharDet` to do the grunt work of finding the proper encoding of an untrusted string. Once found, we use the [EnsureValidEncoding](http://github.com/jrochkind/ensure_valid_encoding) gem to either force an encoding while removing any non-convertable characters.
8
+ This gem depends on a series of transcoders, including `ActiveSupport::Multibyte::Chars#tidy_bytes` and one of the many public forks of `CharDet` for Ruby. Since `CharDet` is not published as a gem that follows proper semantic versioning, we have decided to vendor the [kirillrdy/rchardet](http://github.com/kirillrdy/rchardet) repo. We have even made sure that our vendored version stays in our namespace by using `GIGO::CharDet`, so if you have another version bundled, you can be confident that the two will not conflict.
11
9
 
12
10
 
13
11
  ## Usage
@@ -26,6 +24,19 @@ def comments
26
24
  end
27
25
  ```
28
26
 
27
+ GIGO's target encoding can be configured using the `GIGO.encoding` accessor. By default it is `Encoding.default_internal`, falling back to `Encoding::UTF_8` when that is not set.
28
+
29
+
30
+ ## Transcoders
31
+
32
+ A GIGO transcoder can be any module or class that responds to `transcode`. This method takes one argument, the string to transcode, and can use `GIGO.encoding` as its target if needed. The default list of transcoders is:
33
+
34
+ * GIGO::Transcoders::ActiveSupport
35
+ * GIGO::Transcoders::CharDet
36
+ * GIGO::Transcoders::Blind
37
+
38
+ GIGO tries each transcoder in that order and stops at the first one that succeeds. Upon successful transcoding, we use the [EnsureValidEncoding](http://github.com/jrochkind/ensure_valid_encoding) gem to make sure the result is valid in `GIGO.encoding`, replacing any non-convertible characters with `?`.
39
+
29
40
 
30
41
  ## Toe Dough List
31
42
 
@@ -45,6 +56,6 @@ $ bundle exec rake appraisal test
45
56
  We use the [appraisal](https://github.com/thoughtbot/appraisal) gem from Thoughtbot to generate the individual gemfiles for each ActiveSupport version and to run the tests locally against each generated Gemfile. The `rake appraisal test` command runs our test suite against every ActiveSupport version in our `Appraisals` file. To run the tests for a specific version, use `rake -T` for a list. For example, the following command runs the tests for ActiveSupport 3.2 only.
46
57
 
47
58
  ```shell
48
- $ bundle exec rake appraisal:rails32 test
59
+ $ bundle exec rake appraisal:activesupport32 test
49
60
  ```
50
61
 
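Per the README, a transcoder only needs to respond to `transcode(string)`. The sketch below shows what a custom one might look like; it is self-contained and does not load GIGO. The `Latin1Fallback` name and the assumption that bad data is ISO-8859-1 are hypothetical, and the hard-coded `TARGET` stands in for `GIGO.encoding`.

```ruby
# Hypothetical custom transcoder: assumes the incoming bytes are ISO-8859-1
# and re-encodes them to the target encoding. Only a `transcode` method
# taking the string and returning the transcoded string is required.
module Latin1Fallback
  # Stand-in for GIGO.encoding so this sketch runs without the gem.
  TARGET = Encoding::UTF_8

  def self.transcode(data)
    # Work on a copy so the caller's string keeps its original encoding.
    data.dup
        .force_encoding(Encoding::ISO_8859_1)
        .encode(TARGET, undef: :replace, invalid: :replace)
  end
end

raw = "caf\xE9".b             # bytes from a Latin-1 column, tagged as binary
fixed = Latin1Fallback.transcode(raw)
puts fixed                     # => café
puts fixed.encoding            # => UTF-8
```

In GIGO 1.3.0 such a module could be passed per call through the `:transcoders` option that `GIGO.load(data, options)` reads, or added to `GIGO.transcoders` ahead of `GIGO::Transcoders::Blind`, which accepts nearly any input and would otherwise end the chain first.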
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: /Users/kencollins/Repositories/customink/gigo
3
3
  specs:
4
- gigo (1.1.0)
4
+ gigo (1.2.0)
5
5
  activesupport (>= 3.0)
6
6
  ensure_valid_encoding (~> 0.5.3)
7
7
 
@@ -9,12 +9,12 @@ GEM
9
9
  remote: https://rubygems.org/
10
10
  specs:
11
11
  activesupport (3.0.20)
12
- appraisal (0.5.1)
12
+ appraisal (0.5.2)
13
13
  bundler
14
14
  rake
15
15
  ensure_valid_encoding (0.5.3)
16
- minitest (4.7.0)
17
- minitest-emoji (1.0.0)
16
+ i18n (0.6.4)
17
+ minitest (5.0.1)
18
18
  rake (10.0.4)
19
19
 
20
20
  PLATFORMS
@@ -24,6 +24,6 @@ DEPENDENCIES
24
24
  activesupport (~> 3.0.0)
25
25
  appraisal
26
26
  gigo!
27
- minitest
28
- minitest-emoji
27
+ i18n
28
+ minitest (~> 5.0)
29
29
  rake
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: /Users/kencollins/Repositories/customink/gigo
3
3
  specs:
4
- gigo (1.1.0)
4
+ gigo (1.2.0)
5
5
  activesupport (>= 3.0)
6
6
  ensure_valid_encoding (~> 0.5.3)
7
7
 
@@ -10,12 +10,12 @@ GEM
10
10
  specs:
11
11
  activesupport (3.1.10)
12
12
  multi_json (>= 1.0, < 1.3)
13
- appraisal (0.5.1)
13
+ appraisal (0.5.2)
14
14
  bundler
15
15
  rake
16
16
  ensure_valid_encoding (0.5.3)
17
- minitest (4.7.0)
18
- minitest-emoji (1.0.0)
17
+ i18n (0.6.4)
18
+ minitest (5.0.1)
19
19
  multi_json (1.2.0)
20
20
  rake (10.0.4)
21
21
 
@@ -26,6 +26,6 @@ DEPENDENCIES
26
26
  activesupport (~> 3.1.0)
27
27
  appraisal
28
28
  gigo!
29
- minitest
30
- minitest-emoji
29
+ i18n
30
+ minitest (~> 5.0)
31
31
  rake
@@ -1,24 +1,23 @@
1
1
  PATH
2
2
  remote: /Users/kencollins/Repositories/customink/gigo
3
3
  specs:
4
- gigo (1.1.0)
4
+ gigo (1.2.0)
5
5
  activesupport (>= 3.0)
6
6
  ensure_valid_encoding (~> 0.5.3)
7
7
 
8
8
  GEM
9
9
  remote: https://rubygems.org/
10
10
  specs:
11
- activesupport (3.2.13)
12
- i18n (= 0.6.1)
11
+ activesupport (3.2.12)
12
+ i18n (~> 0.6)
13
13
  multi_json (~> 1.0)
14
- appraisal (0.5.1)
14
+ appraisal (0.5.2)
15
15
  bundler
16
16
  rake
17
17
  ensure_valid_encoding (0.5.3)
18
- i18n (0.6.1)
19
- minitest (4.7.0)
20
- minitest-emoji (1.0.0)
21
- multi_json (1.7.2)
18
+ i18n (0.6.4)
19
+ minitest (5.0.1)
20
+ multi_json (1.7.3)
22
21
  rake (10.0.4)
23
22
 
24
23
  PLATFORMS
@@ -28,6 +27,6 @@ DEPENDENCIES
28
27
  activesupport (~> 3.2.0)
29
28
  appraisal
30
29
  gigo!
31
- minitest
32
- minitest-emoji
30
+ i18n
31
+ minitest (~> 5.0)
33
32
  rake
@@ -1,33 +1,32 @@
1
1
  GIT
2
2
  remote: git://github.com/rails/rails.git
3
- revision: b6e5971e53746a79abc92ce9acea85ca14fab0b3
3
+ revision: d3d8cfd5689188f48714f49ad000a1c1fbd9edcd
4
4
  specs:
5
- activesupport (4.0.0.beta1)
5
+ activesupport (4.1.0.beta)
6
6
  i18n (~> 0.6, >= 0.6.4)
7
- minitest (~> 4.2)
8
- multi_json (~> 1.3)
7
+ json (~> 1.7)
8
+ minitest (~> 5.0)
9
9
  thread_safe (~> 0.1)
10
10
  tzinfo (~> 0.3.37)
11
11
 
12
12
  PATH
13
13
  remote: /Users/kencollins/Repositories/customink/gigo
14
14
  specs:
15
- gigo (1.1.0)
15
+ gigo (1.2.0)
16
16
  activesupport (>= 3.0)
17
17
  ensure_valid_encoding (~> 0.5.3)
18
18
 
19
19
  GEM
20
20
  remote: https://rubygems.org/
21
21
  specs:
22
- appraisal (0.5.1)
22
+ appraisal (0.5.2)
23
23
  bundler
24
24
  rake
25
- atomic (1.0.1)
25
+ atomic (1.1.9)
26
26
  ensure_valid_encoding (0.5.3)
27
27
  i18n (0.6.4)
28
- minitest (4.7.0)
29
- minitest-emoji (1.0.0)
30
- multi_json (1.7.2)
28
+ json (1.8.0)
29
+ minitest (5.0.1)
31
30
  rake (10.0.4)
32
31
  thread_safe (0.1.0)
33
32
  atomic
@@ -40,6 +39,6 @@ DEPENDENCIES
40
39
  activesupport!
41
40
  appraisal
42
41
  gigo!
43
- minitest
44
- minitest-emoji
42
+ i18n
43
+ minitest (~> 5.0)
45
44
  rake
@@ -18,7 +18,7 @@ Gem::Specification.new do |gem|
18
18
  gem.add_runtime_dependency 'activesupport', '>= 3.0'
19
19
  gem.add_runtime_dependency 'ensure_valid_encoding', '~> 0.5.3'
20
20
  gem.add_development_dependency 'appraisal'
21
+ gem.add_development_dependency 'i18n' # Older ActiveSupport does not have a proper dep.
21
22
  gem.add_development_dependency 'rake'
22
- gem.add_development_dependency 'minitest'
23
- gem.add_development_dependency 'minitest-emoji'
23
+ gem.add_development_dependency 'minitest', '~> 5.0'
24
24
  end
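The 1.3.0 gemspec now pins minitest with the pessimistic operator `'~> 5.0'` instead of leaving it unconstrained. RubyGems' own `Gem::Requirement` and `Gem::Version` classes can show which versions that accepts; the version strings checked here are only examples.

```ruby
require 'rubygems'

# '~> 5.0' means ">= 5.0 and < 6.0": the last listed segment may grow,
# the segments before it stay fixed.
req = Gem::Requirement.new('~> 5.0')

%w[4.7.0 5.0.1 5.9 6.0].each do |v|
  puts "#{v}: #{req.satisfied_by?(Gem::Version.new(v))}"
end
# 4.7.0: false
# 5.0.1: true
# 5.9: true
# 6.0: false
```

This is why the lockfiles above resolve `minitest (5.0.1)`, while the 4.7.0 release used with 1.2.0 no longer satisfies the development dependency.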
@@ -1,40 +1,38 @@
1
- require 'active_support/multibyte'
2
- require 'active_support/core_ext/object/acts_like'
3
- require 'active_support/core_ext/string/behavior'
1
+ require 'active_support/all'
4
2
  require 'ensure_valid_encoding'
5
- require 'gigo/rchardet'
3
+ require 'gigo/transcoders'
4
+ require 'gigo/transcoders/active_support'
5
+ require 'gigo/transcoders/rchardet'
6
+ require 'gigo/transcoders/blind'
6
7
  require 'gigo/version'
7
8
 
8
9
  module GIGO
9
10
 
10
- def self.load(data)
11
+ mattr_accessor :encoding
12
+ self.encoding = Encoding.default_internal || Encoding::UTF_8
13
+
14
+ def self.load(data, options = {})
11
15
  return data if data.nil? || !data.acts_like?(:string)
12
- encoded_string = safe_detect_and_encoder(data)
13
- return data if data.encoding == forced_encoding && data == encoded_string
16
+ tcoders = options[:transcoders] || transcoders
17
+ encoded_string = transcode(data, tcoders)
18
+ return data if data.encoding == GIGO.encoding && data == encoded_string
14
19
  encoded_string
15
20
  end
16
21
 
17
22
 
18
23
  protected
19
24
 
20
- def self.safe_detect_and_encoder(data)
25
+ def self.transcode(data, tcoders)
21
26
  string = data
22
- begin
23
- string = ActiveSupport::Multibyte.proxy_class.new(string).tidy_bytes
24
- rescue Exception => e
27
+ tcoders.detect do |t|
25
28
  begin
26
- encoding = CharDet.detect(string.dup)['encoding'] || string.encoding || Encoding.default_internal || forced_encoding
27
- string = string.force_encoding(encoding).encode forced_encoding, :undef => :replace, :invalid => :replace
29
+ string = t.transcode(string)
28
30
  rescue Exception => e
29
- string = string.encode forced_encoding, :undef => :replace, :invalid => :replace
31
+ false
30
32
  end
31
33
  end
32
34
  string = EnsureValidEncoding.ensure_valid_encoding string, invalid: :replace, replace: "?"
33
- string.to_s
35
+ string
34
36
  end
35
37
 
36
- def self.forced_encoding
37
- Encoding.default_internal || Encoding::UTF_8
38
- end
39
-
40
38
  end
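`GIGO.transcode` walks the transcoder list with `Enumerable#detect`: each result is assigned to `string`, the first call that returns a truthy value stops the walk, and a call that raises is rescued and skipped. A self-contained sketch of the same pattern follows. The names `run_chain`, `AlwaysRaises`, and `Upcaser` are hypothetical, and the `EnsureValidEncoding` step GIGO applies afterwards is left out.

```ruby
# Hypothetical transcoders used only to exercise the chain.
module AlwaysRaises
  def self.transcode(_data)
    raise ArgumentError, 'cannot handle this input'
  end
end

module Upcaser
  def self.transcode(data)
    data.upcase
  end
end

# Try each transcoder in order and keep the first result that does not raise.
# If every transcoder fails, the input is returned unchanged.
def run_chain(data, transcoders)
  result = data
  transcoders.detect do |t|
    begin
      result = t.transcode(data)
    rescue StandardError
      false # keep walking the list
    end
  end
  result
end

puts run_chain('gold', [AlwaysRaises, Upcaser]) # => GOLD
puts run_chain('gold', [AlwaysRaises])          # => gold
```

GIGO itself rescues `Exception`, which also swallows signals such as `Interrupt`; the sketch rescues `StandardError`, the narrower and more common choice.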
@@ -0,0 +1,7 @@
1
+ module GIGO
2
+ module Transcoders
3
+
4
+ end
5
+ mattr_accessor :transcoders
6
+ self.transcoders = []
7
+ end
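`lib/gigo/transcoders.rb` creates an empty `GIGO.transcoders` array, and each transcoder file appends itself with `GIGO.transcoders << self` as it is loaded. The default order is therefore the order of the `require` lines in `lib/gigo.rb`, which is what the new `.transcoders` test asserts. A stripped-down sketch of that self-registration, with hypothetical names (`Registry`, `First`, `Second`) and a plain class-level accessor standing in for ActiveSupport's `mattr_accessor`:

```ruby
module Registry
  # Plain-Ruby stand-in for `mattr_accessor :transcoders`.
  class << self
    attr_accessor :transcoders
  end
  self.transcoders = []
end

# Each module registers itself as soon as its body is evaluated,
# so registration order follows definition (or require) order.
module First
  Registry.transcoders << self

  def self.transcode(data)
    data
  end
end

module Second
  Registry.transcoders << self

  def self.transcode(data)
    data
  end
end

p Registry.transcoders # => [First, Second]
```

Because registration happens at load time, an application could reorder or replace the defaults by assigning a new array to the accessor after the files are required.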
@@ -0,0 +1,13 @@
1
+ module GIGO
2
+ module Transcoders
3
+ module ActiveSupport
4
+
5
+ GIGO.transcoders << self
6
+
7
+ def self.transcode(data)
8
+ ::ActiveSupport::Multibyte.proxy_class.new(data).tidy_bytes.to_s
9
+ end
10
+
11
+ end
12
+ end
13
+ end
@@ -0,0 +1,13 @@
1
+ module GIGO
2
+ module Transcoders
3
+ module Blind
4
+
5
+ GIGO.transcoders << self
6
+
7
+ def self.transcode(data)
8
+ data.encode GIGO.encoding, :undef => :replace, :invalid => :replace
9
+ end
10
+
11
+ end
12
+ end
13
+ end
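`GIGO::Transcoders::Blind` calls `String#encode` toward `GIGO.encoding` with `:undef => :replace` and `:invalid => :replace`, so characters the target cannot represent, and bytes that are invalid in the source encoding, are substituted instead of raising. Ruby's default substitute is U+FFFD for Unicode targets and `?` otherwise. The US-ASCII and UTF-16LE targets below are chosen only to make both cases visible.

```ruby
# Undefined conversion: "€" has no US-ASCII equivalent, so it becomes "?".
ascii = "€5".encode(Encoding::US_ASCII, undef: :replace, invalid: :replace)
puts ascii # => ?5

# Invalid input: "\xFF" is not valid UTF-8, so it becomes U+FFFD when the
# target is a Unicode encoding. Converting back to UTF-8 makes it printable.
bad = "a\xFFb"
utf16 = bad.encode(Encoding::UTF_16LE, undef: :replace, invalid: :replace)
puts utf16.encode(Encoding::UTF_8) # => a�b
```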
@@ -0,0 +1,22 @@
1
+ require 'gigo/rchardet'
2
+
3
+ module GIGO
4
+ module Transcoders
5
+ module CharDet
6
+
7
+ GIGO.transcoders << self
8
+
9
+ def self.transcode(data)
10
+ source_encoding = detect_encoding(data) || data.encoding || Encoding.default_internal || Encoding::UTF_8
11
+ data.force_encoding(source_encoding).encode GIGO.encoding, :undef => :replace, :invalid => :replace
12
+ end
13
+
14
+ private
15
+
16
+ def self.detect_encoding(data)
17
+ CharDet.detect(data.dup)['encoding']
18
+ end
19
+
20
+ end
21
+ end
22
+ end
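In the CharDet transcoder, the bare `private` keyword only affects instance methods defined after it; it has no effect on `def self.detect_encoding`, so that helper remains publicly callable. The same holds for `protected` above `def self.transcode` in `lib/gigo.rb`. If hiding a singleton helper is the intent, `private_class_method` is the usual tool. A minimal illustration with hypothetical module names:

```ruby
module Leaky
  private

  # `private` above does not apply to singleton methods, so this is public.
  def self.helper
    :reachable
  end
end

module Sealed
  def self.helper
    :hidden
  end
  private_class_method :helper

  # Public entry point; it may still call the private singleton method
  # because the call has no explicit receiver.
  def self.run
    helper
  end
end

p Leaky.helper # => :reachable
p Sealed.run   # => :hidden

begin
  Sealed.helper
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```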
@@ -1,3 +1,3 @@
1
1
  module GIGO
2
- VERSION = "1.2.0"
2
+ VERSION = "1.3.0"
3
3
  end
@@ -4,8 +4,6 @@ require 'test_helper'
4
4
  module GIGO
5
5
  class BaseTest < TestCase
6
6
 
7
- include ERB::Util
8
-
9
7
  let(:data_utf8_emoji) { "💖" }
10
8
  let(:data_utf8) { "€20 – “Woohoo”" }
11
9
  let(:data_bad_readin) { "�20 � �Woohoo�" }
@@ -14,6 +12,19 @@ module GIGO
14
12
  let(:data_really_bad) { "ed.Ã\u0083Ã\u0083\xC3" }
15
13
 
16
14
 
15
+ describe '.encoding' do
16
+
17
+ it 'defaults to UTF-8 encoding' do
18
+ GIGO.encoding.must_equal Encoding::UTF_8
19
+ end
20
+
21
+ it 'can be set to any encoding' do
22
+ GIGO.encoding = Encoding::CP1252
23
+ GIGO.encoding.must_equal Encoding::CP1252
24
+ end
25
+
26
+ end
27
+
17
28
  describe '.load' do
18
29
 
19
30
  it 'ignores if string is not present' do
@@ -61,7 +72,17 @@ module GIGO
61
72
 
62
73
  end
63
74
 
64
-
75
+ describe '.transcoders' do
76
+
77
+ it 'is an array of default transcoders' do
78
+ GIGO.transcoders.must_equal [
79
+ GIGO::Transcoders::ActiveSupport,
80
+ GIGO::Transcoders::CharDet,
81
+ GIGO::Transcoders::Blind
82
+ ]
83
+ end
84
+
85
+ end
65
86
 
66
87
  end
67
88
  end
@@ -1,14 +1,26 @@
1
- require 'bundler'
2
- require 'minitest/autorun'
3
- Bundler.require :development, :test
1
+ require 'bundler' ; Bundler.require :development, :test
4
2
  require 'gigo'
5
- require 'support/minitest'
3
+ require 'minitest/autorun'
6
4
  require 'erb'
7
5
 
8
6
  module GIGO
9
7
  class TestCase < MiniTest::Spec
10
8
 
9
+ include ERB::Util
10
+
11
+ before { setup_gigo }
12
+ after { teardown_gigo }
13
+
14
+
15
+ private
16
+
17
+ def setup_gigo
18
+ @_default_gigo_encoding = GIGO.encoding
19
+ end
11
20
 
21
+ def teardown_gigo
22
+ GIGO.encoding = @_default_gigo_encoding
23
+ end
12
24
 
13
25
  end
14
26
  end
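The new `TestCase` snapshots `GIGO.encoding` in a `before` hook and restores it in `after`, so the `'can be set to any encoding'` test cannot leak `Encoding::CP1252` into later tests. Outside a spec framework, the same guarantee is often written as a block helper with `ensure`. This sketch uses a hypothetical `Config` module and `with_encoding` helper rather than GIGO itself.

```ruby
module Config
  class << self
    attr_accessor :encoding
  end
  # Mirrors GIGO's default: the internal encoding if set, else UTF-8.
  self.encoding = Encoding.default_internal || Encoding::UTF_8
end

# Temporarily override Config.encoding for the duration of the block.
# `ensure` restores the previous value even if the block raises.
def with_encoding(enc)
  previous = Config.encoding
  Config.encoding = enc
  yield
ensure
  Config.encoding = previous
end

original = Config.encoding
with_encoding(Encoding::CP1252) do
  puts Config.encoding # => Windows-1252
end
puts Config.encoding == original # => true

begin
  with_encoding(Encoding::US_ASCII) { raise 'boom' }
rescue RuntimeError
  # The error is expected; the encoding has already been restored.
end
puts Config.encoding == original # => true
```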
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: gigo
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.2.0
4
+ version: 1.3.0
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,7 +9,7 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2013-04-11 00:00:00.000000000 Z
12
+ date: 2013-05-19 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: activesupport
@@ -60,7 +60,7 @@ dependencies:
60
60
  - !ruby/object:Gem::Version
61
61
  version: '0'
62
62
  - !ruby/object:Gem::Dependency
63
- name: rake
63
+ name: i18n
64
64
  requirement: !ruby/object:Gem::Requirement
65
65
  none: false
66
66
  requirements:
@@ -76,7 +76,7 @@ dependencies:
76
76
  - !ruby/object:Gem::Version
77
77
  version: '0'
78
78
  - !ruby/object:Gem::Dependency
79
- name: minitest
79
+ name: rake
80
80
  requirement: !ruby/object:Gem::Requirement
81
81
  none: false
82
82
  requirements:
@@ -92,21 +92,21 @@ dependencies:
92
92
  - !ruby/object:Gem::Version
93
93
  version: '0'
94
94
  - !ruby/object:Gem::Dependency
95
- name: minitest-emoji
95
+ name: minitest
96
96
  requirement: !ruby/object:Gem::Requirement
97
97
  none: false
98
98
  requirements:
99
- - - ! '>='
99
+ - - ~>
100
100
  - !ruby/object:Gem::Version
101
- version: '0'
101
+ version: '5.0'
102
102
  type: :development
103
103
  prerelease: false
104
104
  version_requirements: !ruby/object:Gem::Requirement
105
105
  none: false
106
106
  requirements:
107
- - - ! '>='
107
+ - - ~>
108
108
  - !ruby/object:Gem::Version
109
- version: '0'
109
+ version: '5.0'
110
110
  description: Garbage in, garbage out. Fix ruby encoded strings at all costs.
111
111
  email:
112
112
  - kcollins@customink.com
@@ -165,9 +165,12 @@ files:
165
165
  - lib/gigo/rchardet/sjisprober.rb
166
166
  - lib/gigo/rchardet/universaldetector.rb
167
167
  - lib/gigo/rchardet/utf8prober.rb
168
+ - lib/gigo/transcoders.rb
169
+ - lib/gigo/transcoders/active_support.rb
170
+ - lib/gigo/transcoders/blind.rb
171
+ - lib/gigo/transcoders/rchardet.rb
168
172
  - lib/gigo/version.rb
169
173
  - test/cases/gigo_test.rb
170
- - test/support/minitest.rb
171
174
  - test/test_helper.rb
172
175
  homepage: http://github.com/customink/gigo
173
176
  licenses: []
@@ -183,7 +186,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
183
186
  version: '0'
184
187
  segments:
185
188
  - 0
186
- hash: 4476278558512394603
189
+ hash: -1915512776307670961
187
190
  required_rubygems_version: !ruby/object:Gem::Requirement
188
191
  none: false
189
192
  requirements:
@@ -192,7 +195,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
192
195
  version: '0'
193
196
  segments:
194
197
  - 0
195
- hash: 4476278558512394603
198
+ hash: -1915512776307670961
196
199
  requirements: []
197
200
  rubyforge_project:
198
201
  rubygems_version: 1.8.25
@@ -203,5 +206,4 @@ summary: The gigo gem aims to solve bad data, likely from a legacy database. It
203
206
  put in and take out of your data stores.
204
207
  test_files:
205
208
  - test/cases/gigo_test.rb
206
- - test/support/minitest.rb
207
209
  - test/test_helper.rb
@@ -1,7 +0,0 @@
1
- require 'minitest/emoji'
2
-
3
- if ENV['CI']
4
- MiniTest::Emoji::DEFAULT.merge! '.' => ".", 'F' => "F", 'E' => "E", 'S' => "S"
5
- else
6
- MiniTest::Emoji::DEFAULT.merge! '.' => "\u{1f49A} ", 'F' => "\u{1f494} ", 'E' => "\u{1f480} ", 'S' => "\u{1f49B} "
7
- end