rtesseract 2.2.0 → 3.0.0

Files changed (55)
  1. checksums.yaml +5 -5
  2. data/.document +1 -2
  3. data/.gitignore +12 -0
  4. data/.rspec +2 -0
  5. data/.travis.yml +13 -10
  6. data/CODE_OF_CONDUCT.md +74 -0
  7. data/Gemfile +4 -17
  8. data/Gemfile.lock +40 -85
  9. data/LICENSE.txt +18 -17
  10. data/README.md +137 -0
  11. data/Rakefile +4 -48
  12. data/bin/console +14 -0
  13. data/bin/setup +8 -0
  14. data/lib/rtesseract.rb +22 -220
  15. data/lib/rtesseract/box.rb +15 -60
  16. data/lib/rtesseract/check.rb +14 -0
  17. data/lib/rtesseract/command.rb +41 -0
  18. data/lib/rtesseract/configuration.rb +15 -64
  19. data/lib/rtesseract/pdf.rb +18 -0
  20. data/lib/rtesseract/text.rb +9 -0
  21. data/lib/rtesseract/tsv.rb +18 -0
  22. data/lib/rtesseract/version.rb +3 -0
  23. data/rtesseract.gemspec +27 -98
  24. metadata +36 -85
  25. data/README.rdoc +0 -156
  26. data/VERSION +0 -1
  27. data/lib/processors/mini_magick.rb +0 -43
  28. data/lib/processors/none.rb +0 -34
  29. data/lib/processors/rmagick.rb +0 -46
  30. data/lib/rtesseract/blob.rb +0 -34
  31. data/lib/rtesseract/box_char.rb +0 -31
  32. data/lib/rtesseract/errors.rb +0 -21
  33. data/lib/rtesseract/mixed.rb +0 -54
  34. data/lib/rtesseract/processor.rb +0 -19
  35. data/lib/rtesseract/utils.rb +0 -44
  36. data/lib/rtesseract/uzn.rb +0 -47
  37. data/spec/configs/eng.user-words.txt +0 -13
  38. data/spec/images/README.pdf +0 -0
  39. data/spec/images/blank.tif +0 -0
  40. data/spec/images/mixed.tif +0 -0
  41. data/spec/images/orientation_reverse.png +0 -0
  42. data/spec/images/test with spaces.tif +0 -0
  43. data/spec/images/test-pdf.png +0 -0
  44. data/spec/images/test.bmp +0 -0
  45. data/spec/images/test.jpg +0 -0
  46. data/spec/images/test.png +0 -0
  47. data/spec/images/test.tif +0 -0
  48. data/spec/images/test1.tif +0 -0
  49. data/spec/images/test_words.png +0 -0
  50. data/spec/rtesseract_box_char_spec.rb +0 -82
  51. data/spec/rtesseract_box_spec.rb +0 -36
  52. data/spec/rtesseract_mixed_spec.rb +0 -49
  53. data/spec/rtesseract_spec.rb +0 -282
  54. data/spec/rtesseract_uzn_spec.rb +0 -56
  55. data/spec/spec_helper.rb +0 -21
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: cb2f55f98e4b22827068fd2284a893788d72a751
4
- data.tar.gz: f6b3ff2bfff6d4e250c75b239354485670b3d732
2
+ SHA256:
3
+ metadata.gz: 00e35fcd674159d47c7252be586ba214c42f374d1e671cab2ae8a47672237d92
4
+ data.tar.gz: e6601b67346092513af4399c49bb161b99afccb84f277395fe8f27417d2f61da
5
5
  SHA512:
6
- metadata.gz: ba2dcb1878a1c98013a6c4c0ad4a583a8b830faf01fede4df8b39a1d6e492ef79cc3610dee60ab856937e7ac7a3c7ae8789ae74fd32f80dbc910cd488a5f0651
7
- data.tar.gz: aacbcfe446dd8050a6d45b78dab5d38468cb6110c4c6216c1877bfa2bbee08bdc6763e9b0c5c0ec7fe40aac8fc43a553da533b33449c60f40d21f6e6c8034faa
6
+ metadata.gz: 92713fe105ad0b96ca28fc61f530c4dcfb8a6905aeb9d7206dfb951afab7f482e10b1ffc397f5d2e8cf86f9bd0a69be65e8da791ba0dd32655c29341bf0a67af
7
+ data.tar.gz: 4601e51dc96489292cfd0b54653f149bfcde9a830b3c3071ed2bc55778ab393405c267172ea06aa1e2ad3f64c615b845ba0aaf9a5317288555ee50e913bc196f
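The hunk above replaces the SHA1 entries with SHA256 and keeps SHA512. As a minimal sketch of how such entries could be checked, assuming `metadata.gz` or `data.tar.gz` has already been extracted from the `.gem` archive and the checksums document is available as YAML text, the helper below recomputes the digests with Ruby's standard `Digest` library. The method name `verify_checksums` and its parameters are hypothetical, not part of RubyGems or rtesseract.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: compares a file's digests with the entries of a
# checksums.yaml-style document.
#   checksums_yaml - the YAML text (algorithm => { entry name => hex digest })
#   name           - the entry to check, e.g. "data.tar.gz"
#   path           - path of the extracted file on disk
# Returns a hash of algorithm label => true/false for each algorithm that
# lists the entry.
def verify_checksums(checksums_yaml, name, path)
  algorithms = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }
  expected = YAML.safe_load(checksums_yaml)

  algorithms.each_with_object({}) do |(label, digest), result|
    next unless expected.key?(label) && expected[label].key?(name)

    result[label] = digest.file(path).hexdigest == expected[label][name]
  end
end
```

A call such as `verify_checksums(File.read("checksums.yaml"), "data.tar.gz", "data.tar.gz")` would return something like `{"SHA256"=>true, "SHA512"=>true}` when both digests match; the file names in that call are placeholders.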
data/.document CHANGED
@@ -1,5 +1,4 @@
1
1
  lib/**/*.rb
2
2
  bin/*
3
- -
4
- features/**/*.feature
3
+ -
5
4
  LICENSE.txt
data/.gitignore ADDED
@@ -0,0 +1,12 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /coverage.data
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+
11
+ # rspec failure tracking
12
+ .rspec_status
data/.rspec CHANGED
@@ -1,2 +1,4 @@
1
+ --format documentation
1
2
  --color
2
3
  --order rand
4
+ --require spec_helper
data/.travis.yml CHANGED
@@ -1,13 +1,16 @@
1
- sudo: required
2
- dist: trusty
1
+ ---
2
+
3
+ sudo: false
4
+ dist: bionic
3
5
  language: ruby
4
- addons:
5
- apt:
6
- packages:
7
- - tesseract-ocr
6
+ cache: bundler
7
+
8
+ before_install:
9
+ - sudo add-apt-repository ppa:alex-p/tesseract-ocr -y
10
+ - sudo apt-get update -q
11
+ - sudo apt-get install tesseract-ocr tesseract-ocr-eng ghostscript -y
12
+ - gem install bundler -v 1.17.1
8
13
 
9
14
  rvm:
10
- - 2.1.10
11
- - 2.2.6
12
- - 2.3.3
13
- - 2.4.0
15
+ - 2.4.5
16
+ - 2.5.3
data/CODE_OF_CONDUCT.md ADDED
@@ -0,0 +1,74 @@
1
+ # Contributor Covenant Code of Conduct
2
+
3
+ ## Our Pledge
4
+
5
+ In the interest of fostering an open and welcoming environment, we as
6
+ contributors and maintainers pledge to making participation in our project and
7
+ our community a harassment-free experience for everyone, regardless of age, body
8
+ size, disability, ethnicity, gender identity and expression, level of experience,
9
+ nationality, personal appearance, race, religion, or sexual identity and
10
+ orientation.
11
+
12
+ ## Our Standards
13
+
14
+ Examples of behavior that contributes to creating a positive environment
15
+ include:
16
+
17
+ * Using welcoming and inclusive language
18
+ * Being respectful of differing viewpoints and experiences
19
+ * Gracefully accepting constructive criticism
20
+ * Focusing on what is best for the community
21
+ * Showing empathy towards other community members
22
+
23
+ Examples of unacceptable behavior by participants include:
24
+
25
+ * The use of sexualized language or imagery and unwelcome sexual attention or
26
+ advances
27
+ * Trolling, insulting/derogatory comments, and personal or political attacks
28
+ * Public or private harassment
29
+ * Publishing others' private information, such as a physical or electronic
30
+ address, without explicit permission
31
+ * Other conduct which could reasonably be considered inappropriate in a
32
+ professional setting
33
+
34
+ ## Our Responsibilities
35
+
36
+ Project maintainers are responsible for clarifying the standards of acceptable
37
+ behavior and are expected to take appropriate and fair corrective action in
38
+ response to any instances of unacceptable behavior.
39
+
40
+ Project maintainers have the right and responsibility to remove, edit, or
41
+ reject comments, commits, code, wiki edits, issues, and other contributions
42
+ that are not aligned to this Code of Conduct, or to ban temporarily or
43
+ permanently any contributor for other behaviors that they deem inappropriate,
44
+ threatening, offensive, or harmful.
45
+
46
+ ## Scope
47
+
48
+ This Code of Conduct applies both within project spaces and in public spaces
49
+ when an individual is representing the project or its community. Examples of
50
+ representing a project or community include using an official project e-mail
51
+ address, posting via an official social media account, or acting as an appointed
52
+ representative at an online or offline event. Representation of a project may be
53
+ further defined and clarified by project maintainers.
54
+
55
+ ## Enforcement
56
+
57
+ Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
+ reported by contacting the project team at daniloj.dasilva@gmail.com. All
59
+ complaints will be reviewed and investigated and will result in a response that
60
+ is deemed necessary and appropriate to the circumstances. The project team is
61
+ obligated to maintain confidentiality with regard to the reporter of an incident.
62
+ Further details of specific enforcement policies may be posted separately.
63
+
64
+ Project maintainers who do not follow or enforce the Code of Conduct in good
65
+ faith may face temporary or permanent repercussions as determined by other
66
+ members of the project's leadership.
67
+
68
+ ## Attribution
69
+
70
+ This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
+ available at [http://contributor-covenant.org/version/1/4][version]
72
+
73
+ [homepage]: http://contributor-covenant.org
74
+ [version]: http://contributor-covenant.org/version/1/4/
data/Gemfile CHANGED
@@ -1,19 +1,6 @@
1
- source 'http://rubygems.org'
2
- # Add dependencies to develop your gem here.
3
- # Include everything needed to run rake, tests, features, etc.
4
- gem 'nokogiri'
1
+ source "https://rubygems.org"
5
2
 
6
- group :development do
7
- gem 'rspec'
8
- gem 'rdoc'
9
- gem 'bundler'
10
- gem 'jeweler'
11
- gem 'simplecov'
12
- gem 'json'
13
- gem 'coveralls', require: false
14
- end
3
+ git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
15
4
 
16
- group :test do
17
- gem 'rmagick'
18
- gem 'mini_magick'
19
- end
5
+ # Specify your gem's dependencies in rtesseract.gemspec
6
+ gemspec
data/Gemfile.lock CHANGED
@@ -1,103 +1,58 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ rtesseract (3.0.0)
5
+ nokogiri
6
+
1
7
  GEM
2
- remote: http://rubygems.org/
8
+ remote: https://rubygems.org/
3
9
  specs:
4
- addressable (2.4.0)
5
- builder (3.2.3)
6
- coveralls (0.7.2)
7
- multi_json (~> 1.3)
8
- rest-client (= 1.6.7)
9
- simplecov (>= 0.7)
10
- term-ansicolor (= 1.2.2)
11
- thor (= 0.18.1)
12
- descendants_tracker (0.0.4)
13
- thread_safe (~> 0.3, >= 0.3.1)
10
+ coveralls (0.8.22)
11
+ json (>= 1.8, < 3)
12
+ simplecov (~> 0.16.1)
13
+ term-ansicolor (~> 1.3)
14
+ thor (~> 0.19.4)
15
+ tins (~> 1.6)
14
16
  diff-lcs (1.3)
15
- docile (1.1.5)
16
- faraday (0.9.2)
17
- multipart-post (>= 1.2, < 3)
18
- git (1.3.0)
19
- github_api (0.16.0)
20
- addressable (~> 2.4.0)
21
- descendants_tracker (~> 0.0.4)
22
- faraday (~> 0.8, < 0.10)
23
- hashie (>= 3.4)
24
- mime-types (>= 1.16, < 3.0)
25
- oauth2 (~> 1.0)
26
- hashie (3.5.7)
27
- highline (1.7.10)
28
- jeweler (2.3.7)
29
- builder
30
- bundler (>= 1)
31
- git (>= 1.2.5)
32
- github_api (~> 0.16.0)
33
- highline (>= 1.6.15)
34
- nokogiri (>= 1.5.10)
35
- psych (~> 2.2)
36
- rake
37
- rdoc
38
- semver2
17
+ docile (1.3.1)
39
18
  json (2.1.0)
40
- jwt (1.5.6)
41
- mime-types (2.99.3)
42
- mini_magick (4.8.0)
43
- mini_portile2 (2.3.0)
44
- multi_json (1.12.2)
45
- multi_xml (0.6.0)
46
- multipart-post (2.0.0)
47
- nokogiri (1.8.1)
48
- mini_portile2 (~> 2.3.0)
49
- oauth2 (1.4.0)
50
- faraday (>= 0.8, < 0.13)
51
- jwt (~> 1.0)
52
- multi_json (~> 1.3)
53
- multi_xml (~> 0.5)
54
- rack (>= 1.2, < 3)
55
- psych (2.2.4)
56
- rack (2.0.3)
57
- rake (12.3.0)
58
- rdoc (6.0.1)
59
- rest-client (1.6.7)
60
- mime-types (>= 1.16)
61
- rmagick (2.16.0)
62
- rspec (3.7.0)
63
- rspec-core (~> 3.7.0)
64
- rspec-expectations (~> 3.7.0)
65
- rspec-mocks (~> 3.7.0)
66
- rspec-core (3.7.1)
67
- rspec-support (~> 3.7.0)
68
- rspec-expectations (3.7.0)
19
+ mini_portile2 (2.4.0)
20
+ nokogiri (1.9.1)
21
+ mini_portile2 (~> 2.4.0)
22
+ rake (10.5.0)
23
+ rspec (3.8.0)
24
+ rspec-core (~> 3.8.0)
25
+ rspec-expectations (~> 3.8.0)
26
+ rspec-mocks (~> 3.8.0)
27
+ rspec-core (3.8.0)
28
+ rspec-support (~> 3.8.0)
29
+ rspec-expectations (3.8.2)
69
30
  diff-lcs (>= 1.2.0, < 2.0)
70
- rspec-support (~> 3.7.0)
71
- rspec-mocks (3.7.0)
31
+ rspec-support (~> 3.8.0)
32
+ rspec-mocks (3.8.0)
72
33
  diff-lcs (>= 1.2.0, < 2.0)
73
- rspec-support (~> 3.7.0)
74
- rspec-support (3.7.0)
75
- semver2 (3.4.2)
76
- simplecov (0.15.1)
77
- docile (~> 1.1.0)
34
+ rspec-support (~> 3.8.0)
35
+ rspec-support (3.8.0)
36
+ simplecov (0.16.1)
37
+ docile (~> 1.1)
78
38
  json (>= 1.8, < 3)
79
39
  simplecov-html (~> 0.10.0)
80
40
  simplecov-html (0.10.2)
81
- term-ansicolor (1.2.2)
82
- tins (~> 0.8)
83
- thor (0.18.1)
84
- thread_safe (0.3.6)
85
- tins (0.13.2)
41
+ term-ansicolor (1.7.0)
42
+ tins (~> 1.0)
43
+ thor (0.19.4)
44
+ tins (1.20.2)
86
45
 
87
46
  PLATFORMS
88
47
  ruby
89
48
 
90
49
  DEPENDENCIES
91
- bundler
50
+ bundler (~> 1.17)
92
51
  coveralls
93
- jeweler
94
- json
95
- mini_magick
96
- nokogiri
97
- rdoc
98
- rmagick
99
- rspec
52
+ rake (~> 10.0)
53
+ rspec (~> 3.0)
54
+ rtesseract!
100
55
  simplecov
101
56
 
102
57
  BUNDLED WITH
103
- 1.16.1
58
+ 1.17.2
data/LICENSE.txt CHANGED
@@ -1,20 +1,21 @@
1
- Copyright (c) 2014 Danilo Jeremias da Silva
1
+ The MIT License (MIT)
2
2
 
3
- Permission is hereby granted, free of charge, to any person obtaining
4
- a copy of this software and associated documentation files (the
5
- "Software"), to deal in the Software without restriction, including
6
- without limitation the rights to use, copy, modify, merge, publish,
7
- distribute, sublicense, and/or sell copies of the Software, and to
8
- permit persons to whom the Software is furnished to do so, subject to
9
- the following conditions:
3
+ Copyright (c) 2018 Danilo Jeremias da Silva
10
4
 
11
- The above copyright notice and this permission notice shall be
12
- included in all copies or substantial portions of the Software.
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
13
11
 
14
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
- NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
- LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
- OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,137 @@
1
+ # RTesseract
2
+
3
+ <a href='http://badge.fury.io/rb/rtesseract'>
4
+ <img src="https://badge.fury.io/rb/rtesseract.png" alt="Gem Version" />
5
+ </a>
6
+ <a href='https://travis-ci.org/dannnylo/rtesseract'>
7
+ <img src="https://travis-ci.org/dannnylo/rtesseract.png?branch=master" alt="Build Status" />
8
+ </a>
9
+ <a href='https://coveralls.io/r/dannnylo/rtesseract?branch=master'>
10
+ <img src="https://coveralls.io/repos/dannnylo/rtesseract/badge.png?branch=master" alt="Coverage Status" />
11
+ </a>
12
+ <a href='https://codeclimate.com/github/dannnylo/rtesseract'>
13
+ <img src="https://codeclimate.com/github/dannnylo/rtesseract.png" />
14
+ </a>
15
+
16
+ Ruby library for working with the Tesseract OCR.
17
+
18
+ ## Installation
19
+
20
+ Check that the Tesseract OCR program is installed:
21
+
22
+ $ tesseract --version
23
+
24
+ Add this line to your application's Gemfile:
25
+
26
+ ```ruby
27
+ gem 'rtesseract'
28
+ ```
29
+
30
+ And then execute:
31
+
32
+ $ bundle
33
+
34
+ Or install it yourself as:
35
+
36
+ $ gem install rtesseract
37
+
38
+ ## Usage
39
+
40
+ It's very simple to use rtesseract.
41
+
42
+ ### Convert image to string
43
+
44
+ ```ruby
45
+ image = RTesseract.new("my_image.jpg")
46
+ image.to_s # Getting the value
47
+ ```
48
+
49
+ ### Convert image to searchable PDF
50
+
51
+ ```ruby
52
+ image = RTesseract.new("my_image.jpg")
53
+ image.to_pdf # Getting open file of pdf
54
+ ```
55
+
56
+ ### Convert image to TSV
57
+
58
+ ```ruby
59
+ image = RTesseract.new("my_image.jpg")
60
+ image.to_tsv # Getting open file of tsv
61
+ ```
62
+
63
+ Converting to PDF (`to_pdf`) preserves the image colors, pictures and structure in the generated PDF.
64
+
65
+ ## Options
66
+
67
+ ### Language
68
+
69
+ ```ruby
70
+ RTesseract.new('test.jpg', lang: 'deu')
71
+ ```
72
+
73
+ * eng - English
74
+ * deu - German
75
+ * deu-f - German fraktur
76
+ * fra - French
77
+ * ita - Italian
78
+ * nld - Dutch
79
+ * por - Portuguese
80
+ * spa - Spanish
81
+ * vie - Vietnamese
82
+ * or any other supported by tesseract.
83
+
84
+ Note: Make sure the language data you request is installed for tesseract.
85
+
86
+ ### Other options
87
+
88
+ ```ruby
89
+ RTesseract.new('test.jpg', config_file: :digits) # Only digit recognition
90
+ ```
91
+
92
+ OR
93
+
94
+ ```ruby
95
+ RTesseract.new('test.jpg', config_file: 'digits quiet')
96
+ ```
97
+
98
+ ### BOUNDING BOX: TO GET WORDS WITH THEIR POSITIONS
99
+ ```ruby
100
+ RTesseract.new('test_words.png').to_box
101
+ ```
102
+
103
+ # => [
104
+ # {:word => 'If', :x_start=>52, :y_start=>13, :x_end=>63, :y_end=>27},
105
+ # {:word => 'you', :x_start=>69, :y_start=>17, :x_end=>100, :y_end=>31},
106
+ # {:word => 'are', :x_start=>108, :y_start=>17, :x_end=>136, :y_end=>27},
107
+ # {:word => 'a', :x_start=>143, :y_start=>17, :x_end=>151, :y_end=>27},
108
+ # {:word => 'friend,', :x_start=>158, :y_start=>13, :x_end=>214, :y_end=>29},
109
+ # {:word => 'you', :x_start=>51, :y_start=>39, :x_end=>82, :y_end=>53},
110
+ # {:word => 'speak', :x_start=>90, :y_start=>35, :x_end=>140, :y_end=>53},
111
+ # {:word => 'the', :x_start=>146, :y_start=>35, :x_end=>174, :y_end=>49},
112
+ # {:word => 'password,', :x_start=>182, :y_start=>35, :x_end=>267, :y_end=>53},
113
+ # {:word => 'and', :x_start=>51, :y_start=>57, :x_end=>81, :y_end=>71},
114
+ # {:word => 'the', :x_start=>89, :y_start=>57, :x_end=>117, :y_end=>71},
115
+ # {:word => 'doors', :x_start=>124, :y_start=>57, :x_end=>172, :y_end=>71},
116
+ # {:word => 'will', :x_start=>180, :y_start=>57, :x_end=>208, :y_end=>71},
117
+ # {:word => 'open.', :x_start=>216, :y_start=>61, :x_end=>263, :y_end=>75}
118
+ # ]
119
+
120
+
121
+ ## Development
122
+
123
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
124
+
125
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
126
+
127
+ ## Contributing
128
+
129
+ Bug reports and pull requests are welcome on GitHub at https://github.com/dannnylo/rtesseract. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
130
+
131
+ ## License
132
+
133
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
134
+
135
+ ## Code of Conduct
136
+
137
+ Everyone interacting in the Rtesseract project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/dannnylo/rtesseract/blob/master/CODE_OF_CONDUCT.md).
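The bounding-box section of the README added above shows `to_box` returning an array of hashes with `:word`, `:x_start`, `:y_start`, `:x_end` and `:y_end` keys. As a minimal sketch of what can be done with that structure, the example below groups the words back into text lines by vertical position. It works on a hard-coded copy of the README sample, so it needs neither tesseract nor the gem; the `group_into_lines` helper, the `SAMPLE_WORDS` constant and the `tolerance` parameter are illustrative assumptions, not part of rtesseract.

```ruby
# Sample copied from the README's `to_box` output.
SAMPLE_WORDS = [
  { word: "If",        x_start: 52,  y_start: 13, x_end: 63,  y_end: 27 },
  { word: "you",       x_start: 69,  y_start: 17, x_end: 100, y_end: 31 },
  { word: "are",       x_start: 108, y_start: 17, x_end: 136, y_end: 27 },
  { word: "a",         x_start: 143, y_start: 17, x_end: 151, y_end: 27 },
  { word: "friend,",   x_start: 158, y_start: 13, x_end: 214, y_end: 29 },
  { word: "you",       x_start: 51,  y_start: 39, x_end: 82,  y_end: 53 },
  { word: "speak",     x_start: 90,  y_start: 35, x_end: 140, y_end: 53 },
  { word: "the",       x_start: 146, y_start: 35, x_end: 174, y_end: 49 },
  { word: "password,", x_start: 182, y_start: 35, x_end: 267, y_end: 53 },
  { word: "and",       x_start: 51,  y_start: 57, x_end: 81,  y_end: 71 },
  { word: "the",       x_start: 89,  y_start: 57, x_end: 117, y_end: 71 },
  { word: "doors",     x_start: 124, y_start: 57, x_end: 172, y_end: 71 },
  { word: "will",      x_start: 180, y_start: 57, x_end: 208, y_end: 71 },
  { word: "open.",     x_start: 216, y_start: 61, x_end: 263, y_end: 75 }
].freeze

# Hypothetical helper: groups word boxes into text lines. A word joins an
# existing line when its vertical midpoint lies within `tolerance` pixels of
# the midpoint of that line's first word; otherwise it starts a new line.
# Returns one string per line, with words ordered left to right.
def group_into_lines(words, tolerance: 8)
  lines = []
  words.sort_by { |w| [w[:y_start], w[:x_start]] }.each do |w|
    mid = (w[:y_start] + w[:y_end]) / 2.0
    line = lines.find { |l| (l[:mid] - mid).abs <= tolerance }
    if line
      line[:words] << w
    else
      lines << { mid: mid, words: [w] }
    end
  end
  lines.map { |l| l[:words].sort_by { |w| w[:x_start] }.map { |w| w[:word] }.join(" ") }
end

puts group_into_lines(SAMPLE_WORDS)
# If you are a friend,
# you speak the password,
# and the doors will open.
```

With the gem and tesseract installed, the same helper could presumably be fed `RTesseract.new('test_words.png').to_box` directly, provided the returned hashes use the keys shown in the README. The fixed pixel tolerance suits this sample; images with other font sizes would need a different value.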