rchardet19 1.3.5 → 1.3.6
- checksums.yaml +7 -0
- data/README.markdown +33 -30
- data/Rakefile +2 -29
- data/lib/rchardet/jpcntx.rb +7 -7
- data/rchardet.gemspec +2 -2
- metadata +40 -44
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 437c6ce14d2dcccc100d652abf7efd8ad2f76123
+  data.tar.gz: 59ab35c63ab22ce50d67eefeef71830500829654
+SHA512:
+  metadata.gz: 55592006d5309c094cc335d23ad0d668397ef3cceed5d4b95721baadc4f43048b767dc6fdddb241f4de3aab843969e4827087b9793b81ac92599991cad4c5461
+  data.tar.gz: 19e54f6731b6cd4ecd10e5902dc0f8f9500fc0a98658c73c0d6d1a40a81c6791fc6383fdc0344c195a7ffdd62fce44eeb13611987b8e39ed5ccb7ee1bf008c11
data/README.markdown
CHANGED
@@ -1,63 +1,66 @@
 # rCharDet*19*
 
-
+## [Project Page](http://rubyforge.org/projects/rchardet) | [1.9 Author](https://github.com/edouard/rchardet) | [Original Author](https://github.com/jmhodges/rchardet)
 
-rCharDet is a character encoding detection library for ruby and the implementation is based
-on Mozilla Charset Detectors.
+*rCharDet* is a character encoding detection library for ruby and the implementation is based on Mozilla Charset Detectors.
 
-This is a forked project in a effort to make it Ruby 1.9 compatible
+This is a forked project in a effort to make it Ruby 1.9 compatible.
 
-
-
-require 'rubygems'
-require 'rchardet19'
-
->> cd = CharDet.detect("some data")
-=> #<struct #<Class:0x102216198> encoding="ascii", confidence=1.0>
+Follow me on [Twitter](http://twitter.com/linusoleander) or [Github](https://github.com/oleander) for more info and updates.
 
-### How to use
+### How to use
 
-`detect` takes the variable `data` that contains an unknown encoding.
+`CharDet.detect` takes the variable `data` that contains an unknown encoding.
 
 We then try to change the encoding to UTF-8, but only if we are at least ~ 60% sure that we found the right encoding.
 
-
-
-
-
-
+```` ruby
+data = "Some unknown data"
+cd = CharDet.detect(data)
+data = cd.confidence > 0.6 ? Iconv.conv(cd.encoding, "UTF-8", data) : data
+````
+
+## What do I've to work with?
 
-A struct is being returned from the `detect` method
+A struct is being returned from the `detect` method, it has the following accessors.
 
 - **encoding** (String) Encoding of the ingoing string, `UTF-8` for example.
 - **confidence** (Float) The confidence level of the *encoding*, from 0.0 to 1.0, where 1.0 is the best.
 
-
+## Make it silent
 
 The `detect` takes two arguments, the string to guess the encoding on and an option hash.
 
 You can use the option hash de decide if you want the `detect` method to raise an exception or not if the ingoing string is `nil`.
 
-
-
-
-
-
-
+```` ruby
+CharDet.detect("some data", :silent => true) # Won't raise an exception
+CharDet.detect(nil, :silent => true) # Won't raise an exception
+CharDet.detect(nil) # Will raise an exception
+CharDet.detect(nil, :silent => false) # Will raise an exception
+````
+
+## How do install
 
 [sudo] gem install rchardet19
 
-
+## How to use it in a rails 3 project
 
 Add `gem 'rchardet19'` to your Gemfile and run `bundle`.
 
-
+## How to help
 
 - Start by copying the project or make your own branch.
 - Navigate to the root path of the project and run `bundle`.
 - Start by running all tests using rspec, `rspec spec/rchardet19_spec.rb`.
 - Implement your own code, write some tests, commit and do a pull request.
 
-
+## Requirements
+
+*rCharDet19* is tested in
+- OS X 10.6.6 using Ruby 1.8.7 and 1.9.2.
+- Ubuntu 12.10 using Ruby 1.9.3 and 2.0.0-p0
+
+## License
 
-rCharDet19 is
+*rCharDet19* is released under the *MIT license*.
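Review note on the README's conversion example: `Iconv` was removed from the standard library in Ruby 2.0 (one of the versions listed under Requirements), and `Iconv.conv` takes the target encoding first, i.e. `Iconv.conv(to, from, str)`. The following is a minimal sketch of the same "at least ~ 60% sure" rule using `String#encode` instead. `Detection` and `to_utf8` are hypothetical names introduced for illustration only; the struct merely stands in for the value returned by `CharDet.detect`, which this sketch does not call.

```ruby
# Hypothetical stand-in for the struct returned by CharDet.detect;
# the README documents the accessors `encoding` and `confidence`.
Detection = Struct.new(:encoding, :confidence)

# Transcode `data` to UTF-8 only when the detection is confident enough.
# `threshold` defaults to the README's "~ 60%" rule (an assumption here).
def to_utf8(data, detection, threshold = 0.6)
  return data unless detection.confidence > threshold
  # String#encode(dst, src) interprets the bytes as `src` and transcodes to `dst`.
  data.encode("UTF-8", detection.encoding)
end

latin1    = "caf\xE9".b # "café" as ISO-8859-1 bytes
confident = to_utf8(latin1, Detection.new("ISO-8859-1", 0.9))
unsure    = to_utf8(latin1, Detection.new("ISO-8859-1", 0.3))
```

With a confident detection the bytes are transcoded; below the threshold the input is returned untouched. `String#encode` may raise if Ruby does not recognise the detected encoding name, so real code would likely want a rescue around it.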
data/Rakefile
CHANGED
@@ -1,29 +1,2 @@
-require
-
-require 'rake/gempackagetask'
-begin
-  require 'lib/rchardet'
-rescue LoadError
-  module CharDet; VERSION = '0.0.0'; end
-  puts "Problem loading rfeedparser; try rake setup"
-end
-
-spec = Gem::Specification.new do |s|
-  s.name = "rchardet"
-  s.version = CharDet::VERSION
-  s.author = "Jeff Hodges"
-  s.email = "jeff at somethingsimilar dot com"
-  s.homepage = "http://github.com/jmhodges/rchardet/tree/master"
-  s.platform = Gem::Platform::RUBY
-  s.summary = "Character encoding auto-detection in Ruby. As smart as your browser. Open source."
-  s.files = FileList["lib/**/*"]
-  s.require_path = "lib"
-  # s.autorequire = "feedparser" # tHe 3vil according to Why.
-  s.has_rdoc = false # TODO: fix
-  s.extra_rdoc_files = ['README', 'COPYING']
-  s.rubyforge_project = 'rchardet'
-
-end
-
-Rake::GemPackageTask.new(spec) do
-end
+require "bundler"
+Bundler::GemHelper.install_tasks
data/lib/rchardet/jpcntx.rb
CHANGED
@@ -14,12 +14,12 @@
 # modify it under the terms of the GNU Lesser General Public
 # License as published by the Free Software Foundation; either
 # version 2.1 of the License, or (at your option) any later version.
-#
+#
 # This library is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 # Lesser General Public License for more details.
-#
+#
 # You should have received a copy of the GNU Lesser General Public
 # License along with this library; if not, write to the Free Software
 # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
@@ -137,9 +137,9 @@ module CharDet
       return if @_mDone
 
       # The buffer we got is byte oriented, and a character may span in more than one
-      # buffers. In case the last one or two byte in last buffer is not complete, we
+      # buffers. In case the last one or two byte in last buffer is not complete, we
       # record how many byte needed to complete that character and skip these bytes here.
-      # We can choose to record those bytes as well and analyse the character once it
+      # We can choose to record those bytes as well and analyse the character once it
       # is complete, but since a character will not make much difference, by simply skipping
       # this character will simply our logic and improve performance.
       i = @_mNeedToSkipCharNum
@@ -195,10 +195,10 @@ module CharDet
       # return its order if it is hiragana
       if aStr.length > 1
         if (aStr[0..0] == "\202") and (aStr[1..1] >= "\x9F") and (aStr[1..1] <= "\xF1")
-          return aStr[1] - 0x9F, charLen
+          return aStr[1].ord - 0x9F, charLen
         end
       end
-
+
       return -1, charLen
     end
   end
@@ -219,7 +219,7 @@ module CharDet
       # return its order if it is hiragana
       if aStr.length > 1
         if (aStr[0..0] == "\xA4") and (aStr[1..1] >= "\xA1") and (aStr[1..1] <= "\xF3")
-          return aStr[1] - 0xA1, charLen
+          return aStr[1].ord - 0xA1, charLen
         end
       end
 
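Review note on the `.ord` changes in `jpcntx.rb`: under Ruby 1.8, indexing a String with an Integer (`aStr[1]`) returned the byte value as a Fixnum, while from Ruby 1.9 on it returns a one-character String, so the subtraction needs an explicit conversion to an integer. The following is a minimal sketch of the arithmetic in the last hunk. `hiragana_order` is a hypothetical helper name, not the method defined in `jpcntx.rb`, and the `-1` fallback simply mirrors the `return -1, charLen` seen in the diff.

```ruby
# Hypothetical helper mirroring the two-byte range check from the last hunk.
# Returns the zero-based order of the character, or -1 if it is out of range.
def hiragana_order(a_str)
  bytes = a_str.b # compare raw bytes, independent of the string's encoding tag
  return -1 unless bytes.length > 1

  if bytes[0] == "\xA4".b && bytes[1] >= "\xA1".b && bytes[1] <= "\xF3".b
    # bytes[1] is a one-character String on Ruby 1.9+, so convert with #ord
    # before subtracting the base value.
    return bytes[1].ord - 0xA1
  end

  -1
end

order    = hiragana_order("\xA4\xA2")
not_kana = hiragana_order("ab")
```

Without `.ord`, the expression would attempt `String - Integer` and raise `NoMethodError` on Ruby 1.9 and later.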
data/rchardet.gemspec
CHANGED
@@ -3,12 +3,12 @@ $:.push File.expand_path("../lib", __FILE__)
 
 Gem::Specification.new do |s|
   s.name = "rchardet19"
-  s.version = "1.3.
+  s.version = "1.3.6"
   s.authors = ["Jeff Hodges", "Édouard Brière", "Linus Oleander"]
   s.email = "linus@oleander.nu"
   s.homepage = "https://github.com/oleander/rchardet"
   s.platform = Gem::Platform::RUBY
-  s.summary = "
+  s.summary = "Ruby 1.9 compatible character encoding auto-detection library"
   s.description = "Character encoding auto-detection in Ruby. This library is a port of the auto-detection code in Mozilla. It means taking a sequence of bytes in an unknown character encoding, and attempting to determine the encoding so you can read the text. It’s like cracking a code when you don’t have the decryption key."
   s.files = `git ls-files`.split("\n")
   s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
metadata
CHANGED
@@ -1,43 +1,44 @@
---- !ruby/object:Gem::Specification
+--- !ruby/object:Gem::Specification
 name: rchardet19
-version: !ruby/object:Gem::Version
-
-  version: 1.3.5
+version: !ruby/object:Gem::Version
+  version: 1.3.6
 platform: ruby
-authors:
+authors:
 - Jeff Hodges
-- "
+- "Édouard Brière"
 - Linus Oleander
 autorequire:
 bindir: bin
 cert_chain: []
-
-
-
-dependencies:
-- !ruby/object:Gem::Dependency
+date: 2014-04-18 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
   name: rspec
-
-
-    none: false
-    requirements:
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
     - - ">="
-      - !ruby/object:Gem::Version
-        version:
+      - !ruby/object:Gem::Version
+        version: '0'
   type: :development
-
-
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description: Character encoding auto-detection in Ruby. This library is a port of
+  the auto-detection code in Mozilla. It means taking a sequence of bytes in an unknown
+  character encoding, and attempting to determine the encoding so you can read the
+  text. It’s like cracking a code when you don’t have the decryption key.
 email: linus@oleander.nu
 executables: []
-
 extensions: []
-
-extra_rdoc_files:
+extra_rdoc_files:
 - README.markdown
 - COPYING
-files:
-- .gitignore
-- .rspec
+files:
+- ".gitignore"
+- ".rspec"
 - COPYING
 - Gemfile
 - Gemfile.lock
@@ -81,34 +82,29 @@ files:
 - rchardet.gemspec
 - spec/rchardet19_spec.rb
 - spec/spec_helper.rb
-has_rdoc: true
 homepage: https://github.com/oleander/rchardet
 licenses: []
-
+metadata: {}
 post_install_message:
 rdoc_options: []
-
-require_paths:
+require_paths:
 - lib
-required_ruby_version: !ruby/object:Gem::Requirement
-
-  requirements:
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
   - - ">="
-    - !ruby/object:Gem::Version
-      version:
-required_rubygems_version: !ruby/object:Gem::Requirement
-
-  requirements:
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
   - - ">="
-    - !ruby/object:Gem::Version
-      version:
+    - !ruby/object:Gem::Version
+      version: '0'
 requirements: []
-
 rubyforge_project:
-rubygems_version: 1.
+rubygems_version: 2.1.8
 signing_key:
-specification_version:
-summary:
-test_files:
+specification_version: 4
+summary: Ruby 1.9 compatible character encoding auto-detection library
+test_files:
 - spec/rchardet19_spec.rb
 - spec/spec_helper.rb