rchardet19 1.3.5 → 1.3.6
- checksums.yaml +7 -0
- data/README.markdown +33 -30
- data/Rakefile +2 -29
- data/lib/rchardet/jpcntx.rb +7 -7
- data/rchardet.gemspec +2 -2
- metadata +40 -44
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA1:
|
3
|
+
metadata.gz: 437c6ce14d2dcccc100d652abf7efd8ad2f76123
|
4
|
+
data.tar.gz: 59ab35c63ab22ce50d67eefeef71830500829654
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: 55592006d5309c094cc335d23ad0d668397ef3cceed5d4b95721baadc4f43048b767dc6fdddb241f4de3aab843969e4827087b9793b81ac92599991cad4c5461
|
7
|
+
data.tar.gz: 19e54f6731b6cd4ecd10e5902dc0f8f9500fc0a98658c73c0d6d1a40a81c6791fc6383fdc0344c195a7ffdd62fce44eeb13611987b8e39ed5ccb7ee1bf008c11
|
data/README.markdown
CHANGED
@@ -1,63 +1,66 @@
|
|
1
1
|
# rCharDet*19*
|
2
2
|
|
3
|
-
|
3
|
+
## [Project Page](http://rubyforge.org/projects/rchardet) | [1.9 Author](https://github.com/edouard/rchardet) | [Original Author](https://github.com/jmhodges/rchardet)
|
4
4
|
|
5
|
-
rCharDet is a character encoding detection library for ruby and the implementation is based
|
6
|
-
on Mozilla Charset Detectors.
|
5
|
+
*rCharDet* is a character encoding detection library for ruby and the implementation is based on Mozilla Charset Detectors.
|
7
6
|
|
8
|
-
This is a forked project in a effort to make it Ruby 1.9 compatible
|
7
|
+
This is a forked project in a effort to make it Ruby 1.9 compatible.
|
9
8
|
|
10
|
-
|
11
|
-
|
12
|
-
require 'rubygems'
|
13
|
-
require 'rchardet19'
|
14
|
-
|
15
|
-
>> cd = CharDet.detect("some data")
|
16
|
-
=> #<struct #<Class:0x102216198> encoding="ascii", confidence=1.0>
|
9
|
+
Follow me on [Twitter](http://twitter.com/linusoleander) or [Github](https://github.com/oleander) for more info and updates.
|
17
10
|
|
18
|
-
### How to use
|
11
|
+
### How to use
|
19
12
|
|
20
|
-
`detect` takes the variable `data` that contains an unknown encoding.
|
13
|
+
`CharDet.detect` takes the variable `data` that contains an unknown encoding.
|
21
14
|
|
22
15
|
We then try to change the encoding to UTF-8, but only if we are at least ~ 60% sure that we found the right encoding.
|
23
16
|
|
24
|
-
|
25
|
-
|
26
|
-
|
27
|
-
|
28
|
-
|
17
|
+
```` ruby
|
18
|
+
data = "Some unknown data"
|
19
|
+
cd = CharDet.detect(data)
|
20
|
+
data = cd.confidence > 0.6 ? Iconv.conv(cd.encoding, "UTF-8", data) : data
|
21
|
+
````
|
22
|
+
|
23
|
+
## What do I've to work with?
|
29
24
|
|
30
|
-
A struct is being returned from the `detect` method
|
25
|
+
A struct is being returned from the `detect` method, it has the following accessors.
|
31
26
|
|
32
27
|
- **encoding** (String) Encoding of the ingoing string, `UTF-8` for example.
|
33
28
|
- **confidence** (Float) The confidence level of the *encoding*, from 0.0 to 1.0, where 1.0 is the best.
|
34
29
|
|
35
|
-
|
30
|
+
## Make it silent
|
36
31
|
|
37
32
|
The `detect` takes two arguments, the string to guess the encoding on and an option hash.
|
38
33
|
|
39
34
|
You can use the option hash de decide if you want the `detect` method to raise an exception or not if the ingoing string is `nil`.
|
40
35
|
|
41
|
-
|
42
|
-
|
43
|
-
|
44
|
-
|
45
|
-
|
46
|
-
|
36
|
+
```` ruby
|
37
|
+
CharDet.detect("some data", :silent => true) # Won't raise an exception
|
38
|
+
CharDet.detect(nil, :silent => true) # Won't raise an exception
|
39
|
+
CharDet.detect(nil) # Will raise an exception
|
40
|
+
CharDet.detect(nil, :silent => false) # Will raise an exception
|
41
|
+
````
|
42
|
+
|
43
|
+
## How do install
|
47
44
|
|
48
45
|
[sudo] gem install rchardet19
|
49
46
|
|
50
|
-
|
47
|
+
## How to use it in a rails 3 project
|
51
48
|
|
52
49
|
Add `gem 'rchardet19'` to your Gemfile and run `bundle`.
|
53
50
|
|
54
|
-
|
51
|
+
## How to help
|
55
52
|
|
56
53
|
- Start by copying the project or make your own branch.
|
57
54
|
- Navigate to the root path of the project and run `bundle`.
|
58
55
|
- Start by running all tests using rspec, `rspec spec/rchardet19_spec.rb`.
|
59
56
|
- Implement your own code, write some tests, commit and do a pull request.
|
60
57
|
|
61
|
-
|
58
|
+
## Requirements
|
59
|
+
|
60
|
+
*rCharDet19* is tested in
|
61
|
+
- OS X 10.6.6 using Ruby 1.8.7 and 1.9.2.
|
62
|
+
- Ubuntu 12.10 using Ruby 1.9.3 and 2.0.0-p0
|
63
|
+
|
64
|
+
## License
|
62
65
|
|
63
|
-
rCharDet19 is
|
66
|
+
*rCharDet19* is released under the *MIT license*.
|
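The README's conversion step (convert to UTF-8 only when the confidence is above roughly 0.6) can also be sketched without Iconv. Two points motivate this. `Iconv.conv` takes its arguments in the order `(to, from, str)`, so the README example as written passes the detected encoding as the target. And Iconv was removed from the standard library in Ruby 2.0, which the README lists as a tested version. The helper name `to_utf8_if_confident` below is hypothetical and not part of rchardet19. The sketch assumes Ruby 1.9 or later and an encoding name that Ruby's `Encoding` recognizes.

```ruby
# Hypothetical helper, not part of rchardet19: converts +data+ to UTF-8
# only when the detection confidence exceeds +threshold+.
def to_utf8_if_confident(data, encoding, confidence, threshold = 0.6)
  return data unless confidence > threshold

  # Tag the raw bytes with the detected encoding, then transcode to UTF-8.
  data.dup.force_encoding(encoding).encode("UTF-8")
end

# The bytes of "caf\xE9" in ISO-8859-1, built from raw byte values.
latin1 = [0x63, 0x61, 0x66, 0xE9].pack("C*")

converted = to_utf8_if_confident(latin1, "ISO-8859-1", 0.9)
unchanged = to_utf8_if_confident(latin1, "ISO-8859-1", 0.3)
```

With the gem loaded, the two middle arguments would presumably come from the struct returned by `CharDet.detect(data)`, that is `cd.encoding` and `cd.confidence`.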
data/Rakefile
CHANGED
@@ -1,29 +1,2 @@
|
|
1
|
-
require
|
2
|
-
|
3
|
-
require 'rake/gempackagetask'
|
4
|
-
begin
|
5
|
-
require 'lib/rchardet'
|
6
|
-
rescue LoadError
|
7
|
-
module CharDet; VERSION = '0.0.0'; end
|
8
|
-
puts "Problem loading rfeedparser; try rake setup"
|
9
|
-
end
|
10
|
-
|
11
|
-
spec = Gem::Specification.new do |s|
|
12
|
-
s.name = "rchardet"
|
13
|
-
s.version = CharDet::VERSION
|
14
|
-
s.author = "Jeff Hodges"
|
15
|
-
s.email = "jeff at somethingsimilar dot com"
|
16
|
-
s.homepage = "http://github.com/jmhodges/rchardet/tree/master"
|
17
|
-
s.platform = Gem::Platform::RUBY
|
18
|
-
s.summary = "Character encoding auto-detection in Ruby. As smart as your browser. Open source."
|
19
|
-
s.files = FileList["lib/**/*"]
|
20
|
-
s.require_path = "lib"
|
21
|
-
# s.autorequire = "feedparser" # tHe 3vil according to Why.
|
22
|
-
s.has_rdoc = false # TODO: fix
|
23
|
-
s.extra_rdoc_files = ['README', 'COPYING']
|
24
|
-
s.rubyforge_project = 'rchardet'
|
25
|
-
|
26
|
-
end
|
27
|
-
|
28
|
-
Rake::GemPackageTask.new(spec) do
|
29
|
-
end
|
1
|
+
require "bundler"
|
2
|
+
Bundler::GemHelper.install_tasks
|
data/lib/rchardet/jpcntx.rb
CHANGED
@@ -14,12 +14,12 @@
|
|
14
14
|
# modify it under the terms of the GNU Lesser General Public
|
15
15
|
# License as published by the Free Software Foundation; either
|
16
16
|
# version 2.1 of the License, or (at your option) any later version.
|
17
|
-
#
|
17
|
+
#
|
18
18
|
# This library is distributed in the hope that it will be useful,
|
19
19
|
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
20
20
|
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
21
21
|
# Lesser General Public License for more details.
|
22
|
-
#
|
22
|
+
#
|
23
23
|
# You should have received a copy of the GNU Lesser General Public
|
24
24
|
# License along with this library; if not, write to the Free Software
|
25
25
|
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
|
@@ -137,9 +137,9 @@ module CharDet
|
|
137
137
|
return if @_mDone
|
138
138
|
|
139
139
|
# The buffer we got is byte oriented, and a character may span in more than one
|
140
|
-
# buffers. In case the last one or two byte in last buffer is not complete, we
|
140
|
+
# buffers. In case the last one or two byte in last buffer is not complete, we
|
141
141
|
# record how many byte needed to complete that character and skip these bytes here.
|
142
|
-
# We can choose to record those bytes as well and analyse the character once it
|
142
|
+
# We can choose to record those bytes as well and analyse the character once it
|
143
143
|
# is complete, but since a character will not make much difference, by simply skipping
|
144
144
|
# this character will simply our logic and improve performance.
|
145
145
|
i = @_mNeedToSkipCharNum
|
@@ -195,10 +195,10 @@ module CharDet
|
|
195
195
|
# return its order if it is hiragana
|
196
196
|
if aStr.length > 1
|
197
197
|
if (aStr[0..0] == "\202") and (aStr[1..1] >= "\x9F") and (aStr[1..1] <= "\xF1")
|
198
|
-
return aStr[1] - 0x9F, charLen
|
198
|
+
return aStr[1].ord - 0x9F, charLen
|
199
199
|
end
|
200
200
|
end
|
201
|
-
|
201
|
+
|
202
202
|
return -1, charLen
|
203
203
|
end
|
204
204
|
end
|
@@ -219,7 +219,7 @@ module CharDet
|
|
219
219
|
# return its order if it is hiragana
|
220
220
|
if aStr.length > 1
|
221
221
|
if (aStr[0..0] == "\xA4") and (aStr[1..1] >= "\xA1") and (aStr[1..1] <= "\xF3")
|
222
|
-
return aStr[1] - 0xA1, charLen
|
222
|
+
return aStr[1].ord - 0xA1, charLen
|
223
223
|
end
|
224
224
|
end
|
225
225
|
|
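The `.ord` additions in the two hunks above reflect a change in `String#[]`: in Ruby 1.8, `aStr[1]` returned an Integer byte value, while in Ruby 1.9 and later it returns a one-character String, so subtracting `0x9F` from it directly no longer works. The sketch below is a standalone illustration of the same range check, not the library's code. The method name `hiragana_order` is hypothetical, and it uses `String#getbyte` rather than `.ord` so that the result does not depend on the string's encoding tag.

```ruby
# Hypothetical standalone version of the hiragana range check.
# Returns the zero-based order of the trail byte, or -1 if the
# two leading bytes are not in the expected range.
def hiragana_order(str, lead_byte, first, last)
  return -1 if str.bytesize < 2

  lead  = str.getbyte(0) # Integer byte value on Ruby 1.9+
  trail = str.getbyte(1)
  return trail - first if lead == lead_byte && trail >= first && trail <= last

  -1
end

# Byte pairs matching the ranges tested in the diff.
sjis_pair  = [0x82, 0xA0].pack("C*") # lead "\202" (octal) is 0x82
eucjp_pair = [0xA4, 0xA2].pack("C*")

sjis_order  = hiragana_order(sjis_pair, 0x82, 0x9F, 0xF1)
eucjp_order = hiragana_order(eucjp_pair, 0xA4, 0xA1, 0xF3)
ascii_order = hiragana_order("ab", 0x82, 0x9F, 0xF1)

# On a binary string, indexing plus .ord yields the same byte value.
ord_value = sjis_pair[1].ord
```

On a binary (ASCII-8BIT) string, `str[1].ord` and `str.getbyte(1)` agree. On a string tagged with a multibyte encoding, `.ord` can raise for invalid byte sequences, which is why the sketch prefers `getbyte`.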
data/rchardet.gemspec
CHANGED
@@ -3,12 +3,12 @@ $:.push File.expand_path("../lib", __FILE__)
|
|
3
3
|
|
4
4
|
Gem::Specification.new do |s|
|
5
5
|
s.name = "rchardet19"
|
6
|
-
s.version = "1.3.
|
6
|
+
s.version = "1.3.6"
|
7
7
|
s.authors = ["Jeff Hodges", "Édouard Brière", "Linus Oleander"]
|
8
8
|
s.email = "linus@oleander.nu"
|
9
9
|
s.homepage = "https://github.com/oleander/rchardet"
|
10
10
|
s.platform = Gem::Platform::RUBY
|
11
|
-
s.summary = "
|
11
|
+
s.summary = "Ruby 1.9 compatible character encoding auto-detection library"
|
12
12
|
s.description = "Character encoding auto-detection in Ruby. This library is a port of the auto-detection code in Mozilla. It means taking a sequence of bytes in an unknown character encoding, and attempting to determine the encoding so you can read the text. It’s like cracking a code when you don’t have the decryption key."
|
13
13
|
s.files = `git ls-files`.split("\n")
|
14
14
|
s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
|
metadata
CHANGED
@@ -1,43 +1,44 @@
|
|
1
|
-
--- !ruby/object:Gem::Specification
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
2
|
name: rchardet19
|
3
|
-
version: !ruby/object:Gem::Version
|
4
|
-
|
5
|
-
version: 1.3.5
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 1.3.6
|
6
5
|
platform: ruby
|
7
|
-
authors:
|
6
|
+
authors:
|
8
7
|
- Jeff Hodges
|
9
|
-
- "
|
8
|
+
- "Édouard Brière"
|
10
9
|
- Linus Oleander
|
11
10
|
autorequire:
|
12
11
|
bindir: bin
|
13
12
|
cert_chain: []
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
dependencies:
|
18
|
-
- !ruby/object:Gem::Dependency
|
13
|
+
date: 2014-04-18 00:00:00.000000000 Z
|
14
|
+
dependencies:
|
15
|
+
- !ruby/object:Gem::Dependency
|
19
16
|
name: rspec
|
20
|
-
|
21
|
-
|
22
|
-
none: false
|
23
|
-
requirements:
|
17
|
+
requirement: !ruby/object:Gem::Requirement
|
18
|
+
requirements:
|
24
19
|
- - ">="
|
25
|
-
- !ruby/object:Gem::Version
|
26
|
-
version:
|
20
|
+
- !ruby/object:Gem::Version
|
21
|
+
version: '0'
|
27
22
|
type: :development
|
28
|
-
|
29
|
-
|
23
|
+
prerelease: false
|
24
|
+
version_requirements: !ruby/object:Gem::Requirement
|
25
|
+
requirements:
|
26
|
+
- - ">="
|
27
|
+
- !ruby/object:Gem::Version
|
28
|
+
version: '0'
|
29
|
+
description: Character encoding auto-detection in Ruby. This library is a port of
|
30
|
+
the auto-detection code in Mozilla. It means taking a sequence of bytes in an unknown
|
31
|
+
character encoding, and attempting to determine the encoding so you can read the
|
32
|
+
text. It’s like cracking a code when you don’t have the decryption key.
|
30
33
|
email: linus@oleander.nu
|
31
34
|
executables: []
|
32
|
-
|
33
35
|
extensions: []
|
34
|
-
|
35
|
-
extra_rdoc_files:
|
36
|
+
extra_rdoc_files:
|
36
37
|
- README.markdown
|
37
38
|
- COPYING
|
38
|
-
files:
|
39
|
-
- .gitignore
|
40
|
-
- .rspec
|
39
|
+
files:
|
40
|
+
- ".gitignore"
|
41
|
+
- ".rspec"
|
41
42
|
- COPYING
|
42
43
|
- Gemfile
|
43
44
|
- Gemfile.lock
|
@@ -81,34 +82,29 @@ files:
|
|
81
82
|
- rchardet.gemspec
|
82
83
|
- spec/rchardet19_spec.rb
|
83
84
|
- spec/spec_helper.rb
|
84
|
-
has_rdoc: true
|
85
85
|
homepage: https://github.com/oleander/rchardet
|
86
86
|
licenses: []
|
87
|
-
|
87
|
+
metadata: {}
|
88
88
|
post_install_message:
|
89
89
|
rdoc_options: []
|
90
|
-
|
91
|
-
require_paths:
|
90
|
+
require_paths:
|
92
91
|
- lib
|
93
|
-
required_ruby_version: !ruby/object:Gem::Requirement
|
94
|
-
|
95
|
-
requirements:
|
92
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
93
|
+
requirements:
|
96
94
|
- - ">="
|
97
|
-
- !ruby/object:Gem::Version
|
98
|
-
version:
|
99
|
-
required_rubygems_version: !ruby/object:Gem::Requirement
|
100
|
-
|
101
|
-
requirements:
|
95
|
+
- !ruby/object:Gem::Version
|
96
|
+
version: '0'
|
97
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
98
|
+
requirements:
|
102
99
|
- - ">="
|
103
|
-
- !ruby/object:Gem::Version
|
104
|
-
version:
|
100
|
+
- !ruby/object:Gem::Version
|
101
|
+
version: '0'
|
105
102
|
requirements: []
|
106
|
-
|
107
103
|
rubyforge_project:
|
108
|
-
rubygems_version: 1.
|
104
|
+
rubygems_version: 2.1.8
|
109
105
|
signing_key:
|
110
|
-
specification_version:
|
111
|
-
summary:
|
112
|
-
test_files:
|
106
|
+
specification_version: 4
|
107
|
+
summary: Ruby 1.9 compatible character encoding auto-detection library
|
108
|
+
test_files:
|
113
109
|
- spec/rchardet19_spec.rb
|
114
110
|
- spec/spec_helper.rb
|