tesseract-ocr 0.1.5 → 0.1.6

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 1cb827519f64b7aba71ee22666e49e952a92ec41
4
+ data.tar.gz: 2687f882e6e704ebd53026b522e471d7aa57c914
5
+ SHA512:
6
+ metadata.gz: 356b6ed6de748cfaf3dbb017346400a802e4a0c060759f29cc2ddb254448ab4584d655f4bc794ba198cbd43bda35d8cb802665ff2f74ae60ca1f305c89d59f88
7
+ data.tar.gz: 2340f05a490d53fb809a5bdcefed75a547674bbcc02b3200cf505335547e073f9fd2a2a04409e31de55865ae3afdc08d5121480246d06bce0432e4e4b17d794f
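
The hunk above records SHA1 and SHA512 digests for `metadata.gz` and `data.tar.gz`. As a minimal sketch, assuming the two archives have already been extracted from the `.gem` file, a digest can be checked with Ruby's standard `digest` library. The helper name `checksum_matches?` is hypothetical and is not part of the gem.

```ruby
require 'digest'

# Hypothetical helper (not part of tesseract-ocr): compares the SHA512 hex
# digest of the file at +path+ with the expected hex string.
def checksum_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex.strip.downcase
end
```

Calling it with the path to `data.tar.gz` and the SHA512 value listed above should return `true` for an untampered archive.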
data/README.md CHANGED
@@ -1,17 +1,25 @@
1
1
  ruby-tesseract - Ruby bindings and wrapper
2
2
  ==========================================
3
- This wrapper binds the TessBaseAPI object through ffi-inline (which means it will work on JRuby too)
4
- and then proceeds to wrap said API in a more ruby-esque Engine class.
3
+ This wrapper binds the TessBaseAPI object through ffi-inline (which means it
4
+ will work on JRuby too) and then proceeds to wrap said API in a more ruby-esque
5
+ Engine class.
5
6
 
6
7
  Making it work
7
8
  --------------
8
- To make this library work you need tesseract-ocr and leptonica libraries and headers and a C++ compiler.
9
+ To make this library work you need tesseract-ocr and leptonica libraries and
10
+ headers and a C++ compiler.
9
11
 
10
12
  The gem is called `tesseract-ocr`.
11
13
 
14
+ If you're on a distribution that separates the libraries from headers, remember
15
+ to install the *-dev* package.
16
+
17
+ On Debian you will need to install `libleptonica-dev` and `libtesseract-dev`.
18
+
12
19
  Examples
13
20
  --------
14
- Following are some examples that show the functionalities provided by tesseract-ocr.
21
+ Following are some examples that show the functionalities provided by
22
+ tesseract-ocr.
15
23
 
16
24
  ### Basic functionality of tesseract
17
25
 
@@ -26,21 +34,21 @@ e = Tesseract::Engine.new {|e|
26
34
  e.text_for('test/first.png').strip # => 'ABC'
27
35
  ```
28
36
 
29
- You can pass to `#text_for` either a path, an IO object, a string containing the image or
30
- an object that responds to `#to_blob` (for example Magick::Image), keep in mind that
31
- the format has to be supported by leptonica.
37
+ You can pass to `#text_for` either a path, an IO object, a string containing
38
+ the image or an object that responds to `#to_blob` (for example
39
+ Magick::Image), keep in mind that the format has to be supported by leptonica.
32
40
 
33
41
  ### Accessing advanced features
34
42
 
35
- With advanced features you get access to blocks, paragraphs, lines, words and symbols.
43
+ With advanced features you get access to blocks, paragraphs, lines, words and
44
+ symbols.
36
45
 
37
- There are lot of way to access those levels, the methods are the following (replace level
38
- with one of the accessible features, so `each_level` can be `each_block` or `each_paragraph`
39
- etc.)
46
+ Replace **level** in method names with either `block`, `paragraph`, `line`,
47
+ `word` or `symbol`.
40
48
 
41
- The following kind of accessors need a block to be passed and they pass to the block each
42
- `Element` object. The Element object has various getters to access certain features, I'll
43
- talk about them later.
49
+ The following kind of accessors need a block to be passed and they pass to the
50
+ block each `Element` object. The Element object has various getters to access
51
+ certain features, I'll talk about them later.
44
52
 
45
53
  The methods are:
46
54
 
@@ -48,9 +56,10 @@ The methods are:
48
56
  * `each_level_for`
49
57
  * `each_level_at`
50
58
 
51
- The following accessors instead return an `Array` of `Element`s with cached getters, the getters
52
- are cached beacause the values accessible in the `Element` are linked to the state of the internal
53
- API, and that state changes if you access something else.
59
+ The following accessors instead return an `Array` of `Element`s with cached
60
+ getters, the getters are cached because the values accessible in the `Element`
61
+ are linked to the state of the internal API, and that state changes if you
62
+ access something else.
54
63
 
55
64
  The methods are:
56
65
 
@@ -65,16 +74,19 @@ Each `Element` object has the following getters:
65
74
  * `bounding_box`, this will return the box where the element is confined into
66
75
  * `binary_image`, this will return the bichromatic image of the element
67
76
  * `image`, this will return the image of the element
68
- * `baseline`, this will return the line where the text is with a pair of coordinates
77
+ * `baseline`, this will return the line where the text is with a pair of
78
+ coordinates
69
79
  * `orientation`, this will return the orientation of the element
70
80
  * `text`, this will return the text of the element
71
81
  * `confidence`, this will return the confidence of correctness for the element
72
82
 
73
83
  `Block` elements also have `type` accessors that specify the type of the block.
74
84
 
75
- `Word` elements also have `font_attributes`, `from_dictionary?` and `numeric?` getters.
85
+ `Word` elements also have `font_attributes`, `from_dictionary?` and `numeric?`
86
+ getters.
76
87
 
77
- `Symbol` elements also have `superscript?`, `subscript?` and `dropcap?` getters.
88
+ `Symbol` elements also have `superscript?`, `subscript?` and `dropcap?`
89
+ getters.
78
90
 
79
91
  Using the binary
80
92
  ----------------
@@ -97,3 +109,7 @@ ABC
97
109
  > tesseract.rb -c test/first.png
98
110
  86
99
111
  ```
112
+
113
+ License
114
+ -------
115
+ The license is BSD one clause.
@@ -22,7 +22,7 @@ OptionParser.new do |o|
22
22
  end
23
23
 
24
24
  o.on '-p', '--psm MODE', 'page segmentation mode to use' do |value|
25
- options[:psm] = value
25
+ options[:psm] = value.to_i
26
26
  end
27
27
 
28
28
  o.on '-u', '--unlv', 'output in UNLV format' do
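
OptionParser hands option arguments to the block as strings, which is presumably why the change above converts the value before storing it. A minimal, self-contained sketch of that behaviour, using a hypothetical `parse_psm` helper rather than the gem's actual script:

```ruby
require 'optparse'

# Hypothetical helper: parses argv and returns the options hash.
def parse_psm(argv)
  options = {}

  OptionParser.new do |o|
    o.on '-p', '--psm MODE', 'page segmentation mode to use' do |value|
      # value arrives as a String (e.g. "6"); store it as an Integer
      options[:psm] = value.to_i
    end
  end.parse(argv)

  options
end
```

Note that `String#to_i` returns 0 for non-numeric input. OptionParser can also coerce the argument itself when `Integer` is passed to `on`, in which case invalid values are rejected instead.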
@@ -23,11 +23,11 @@
23
23
  #++
24
24
 
25
25
  module Tesseract
26
- def prefix
26
+ def self.prefix
27
27
  ENV['TESSDATA_PREFIX']
28
28
  end
29
29
 
30
- def prefix= (path)
30
+ def self.prefix=(path)
31
31
  ENV['TESSDATA_PREFIX'] = path
32
32
  end
33
33
  end
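
Without `self.`, methods defined inside a `module` body are instance methods and cannot be called as `Tesseract.prefix`. The sketch below reproduces the fixed definitions under a hypothetical module name, `Demo`, so that it runs without the gem installed.

```ruby
module Demo
  # Module-level (singleton) methods, callable as Demo.prefix
  def self.prefix
    ENV['TESSDATA_PREFIX']
  end

  def self.prefix=(path)
    ENV['TESSDATA_PREFIX'] = path
  end
end

Demo.prefix = '/usr/share' # hypothetical path, for illustration only
puts Demo.prefix
```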
@@ -155,12 +155,15 @@ class API
155
155
 
156
156
  def get_text
157
157
  pointer = C::BaseAPI.get_utf8_text(to_ffi)
158
- result = pointer.read_string
158
+
159
+ return if pointer.null?
160
+
161
+ result = pointer.read_string
159
162
  result.force_encoding 'UTF-8'
160
163
 
161
164
  result
162
165
  ensure
163
- C.free_array_of_char(pointer)
166
+ C.free_array_of_char(pointer) unless pointer.null?
164
167
  end
165
168
 
166
169
  def get_box (page = 0)
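
The guard on the `ensure` line matters because Ruby runs `ensure` even when the method exits through an early `return`. A minimal sketch of that control flow, using a hypothetical `FakePointer` struct in place of a real FFI pointer and a plain array in place of the C-level free:

```ruby
# Hypothetical stand-in for an FFI pointer; not part of the gem.
FakePointer = Struct.new(:string) do
  def null?
    string.nil?
  end

  def read_string
    string
  end
end

# Records which pointers would have been released.
FREED = []

def read_text(pointer)
  return if pointer.null? # nothing to read, but ensure still runs

  result = pointer.read_string.dup
  result.force_encoding('UTF-8')

  result
ensure
  # Only release memory that was actually allocated
  FREED << pointer unless pointer.null?
end
```

With a null pointer the method returns `nil` and nothing is released; with a non-null pointer the text is returned and the pointer is released exactly once.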
@@ -25,7 +25,7 @@
25
25
  module Tesseract; class API
26
26
 
27
27
  class Image
28
- def self.new (image)
28
+ def self.new (image, x = 0, y = 0)
29
29
  image = if image.is_a?(String) && (File.exists?(File.expand_path(image)) rescue nil)
30
30
  C::Leptonica.pix_read(File.expand_path(image))
31
31
  elsif image.is_a?(String)
@@ -36,11 +36,13 @@ class Image
36
36
  image = image.to_blob
37
37
 
38
38
  C::Leptonica.pix_read_mem(image, image.bytesize)
39
+ else
40
+ image
39
41
  end
40
42
 
41
43
  raise ArgumentError, 'invalid image' if image.nil? || image.null?
42
44
 
43
- super(image)
45
+ super(image, x, y)
44
46
  end
45
47
 
46
48
  attr_accessor :x, :y
@@ -68,10 +70,8 @@ class Image
68
70
  size = FFI::MemoryPointer.new(:size_t)
69
71
 
70
72
  C::Leptonica.pix_write_mem(to_ffi, data, size, C.for_enum(format))
71
-
72
73
  result = data.typecast(:pointer).read_string(size.typecast(:size_t))
73
-
74
- data.typecast(:pointer).free
74
+ C.free(data.typecast(:pointer))
75
75
 
76
76
  result
77
77
  end
@@ -62,7 +62,7 @@ class Iterator
62
62
  def get_image (level = :word, padding = 0)
63
63
  image = C::Iterator.get_image(to_ffi, C.for_enum(level), padding)
64
64
 
65
- Image.new(image.pix, image.x, image.y)
65
+ Image.new(image[:pix], image[:x], image[:y])
66
66
  end
67
67
 
68
68
  def baseline (level = :word)
@@ -75,12 +75,15 @@ class Iterator
75
75
 
76
76
  def get_text (level = :word)
77
77
  pointer = C::Iterator.get_utf8_text(to_ffi, C.for_enum(level))
78
- result = pointer.read_string
78
+
79
+ return if pointer.null?
80
+
81
+ result = pointer.read_string
79
82
  result.force_encoding 'UTF-8'
80
83
 
81
84
  result
82
85
  ensure
83
- C.free_array_of_char(pointer)
86
+ C.free_array_of_char(pointer) unless pointer.null?
84
87
  end
85
88
 
86
89
  def confidence (level = :word)
@@ -35,6 +35,12 @@ module C
35
35
  cpp.include 'tesseract/strngs.h'
36
36
  cpp.libraries 'tesseract'
37
37
 
38
+ cpp.function %{
39
+ void free (void* pointer) {
40
+ free(pointer);
41
+ }
42
+ }
43
+
38
44
  cpp.function %{
39
45
  void free_array_of_char (char* pointer) {
40
46
  delete [] pointer;
@@ -48,7 +48,7 @@ module Leptonica
48
48
  }, blocking: true
49
49
 
50
50
  cpp.function %{
51
- Pix* pix_read_fd (int fd) {
51
+ Pix* pix_read_stream (int fd) {
52
52
  return pixReadStream(fdopen(fd, "rb"), 0);
53
53
  }
54
54
  }, blocking: true
@@ -24,6 +24,6 @@
24
24
 
25
25
  module Tesseract
26
26
  def self.version
27
- '0.1.5'
27
+ '0.1.6'
28
28
  end
29
29
  end
@@ -1,13 +1,14 @@
1
1
  Kernel.load 'lib/tesseract/version.rb'
2
2
 
3
3
  Gem::Specification.new {|s|
4
- s.name = 'tesseract-ocr'
5
- s.version = Tesseract.version
6
- s.author = 'meh.'
7
- s.email = 'meh@paranoici.org'
8
- s.homepage = 'http://github.com/meh/ruby-tesseract-ocr'
9
- s.platform = Gem::Platform::RUBY
10
- s.summary = 'A wrapper library to the tesseract-ocr API.'
4
+ s.name = 'tesseract-ocr'
5
+ s.version = Tesseract.version
6
+ s.author = 'meh.'
7
+ s.email = 'meh@schizofreni.co'
8
+ s.homepage = 'http://github.com/meh/ruby-tesseract-ocr'
9
+ s.platform = Gem::Platform::RUBY
10
+ s.summary = 'A wrapper library to the tesseract-ocr API.'
11
+ s.license = 'BSD'
11
12
 
12
13
  s.files = `git ls-files`.split("\n")
13
14
  s.executables = `git ls-files -- bin/*`.split("\n").map { |f| File.basename(f) }
metadata CHANGED
@@ -1,82 +1,73 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: tesseract-ocr
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.5
5
- prerelease:
4
+ version: 0.1.6
6
5
  platform: ruby
7
6
  authors:
8
7
  - meh.
9
8
  autorequire:
10
9
  bindir: bin
11
10
  cert_chain: []
12
- date: 2012-04-29 00:00:00.000000000 Z
11
+ date: 2014-02-24 00:00:00.000000000 Z
13
12
  dependencies:
14
13
  - !ruby/object:Gem::Dependency
15
14
  name: call-me
16
15
  requirement: !ruby/object:Gem::Requirement
17
- none: false
18
16
  requirements:
19
- - - ! '>='
17
+ - - ">="
20
18
  - !ruby/object:Gem::Version
21
19
  version: '0'
22
20
  type: :runtime
23
21
  prerelease: false
24
22
  version_requirements: !ruby/object:Gem::Requirement
25
- none: false
26
23
  requirements:
27
- - - ! '>='
24
+ - - ">="
28
25
  - !ruby/object:Gem::Version
29
26
  version: '0'
30
27
  - !ruby/object:Gem::Dependency
31
28
  name: iso-639
32
29
  requirement: !ruby/object:Gem::Requirement
33
- none: false
34
30
  requirements:
35
- - - ! '>='
31
+ - - ">="
36
32
  - !ruby/object:Gem::Version
37
33
  version: '0'
38
34
  type: :runtime
39
35
  prerelease: false
40
36
  version_requirements: !ruby/object:Gem::Requirement
41
- none: false
42
37
  requirements:
43
- - - ! '>='
38
+ - - ">="
44
39
  - !ruby/object:Gem::Version
45
40
  version: '0'
46
41
  - !ruby/object:Gem::Dependency
47
42
  name: ffi-extra
48
43
  requirement: !ruby/object:Gem::Requirement
49
- none: false
50
44
  requirements:
51
- - - ! '>='
45
+ - - ">="
52
46
  - !ruby/object:Gem::Version
53
47
  version: '0'
54
48
  type: :runtime
55
49
  prerelease: false
56
50
  version_requirements: !ruby/object:Gem::Requirement
57
- none: false
58
51
  requirements:
59
- - - ! '>='
52
+ - - ">="
60
53
  - !ruby/object:Gem::Version
61
54
  version: '0'
62
55
  - !ruby/object:Gem::Dependency
63
56
  name: ffi-inline
64
57
  requirement: !ruby/object:Gem::Requirement
65
- none: false
66
58
  requirements:
67
- - - ! '>='
59
+ - - ">="
68
60
  - !ruby/object:Gem::Version
69
61
  version: '0'
70
62
  type: :runtime
71
63
  prerelease: false
72
64
  version_requirements: !ruby/object:Gem::Requirement
73
- none: false
74
65
  requirements:
75
- - - ! '>='
66
+ - - ">="
76
67
  - !ruby/object:Gem::Version
77
68
  version: '0'
78
69
  description:
79
- email: meh@paranoici.org
70
+ email: meh@schizofreni.co
80
71
  executables:
81
72
  - tesseract-train.rb
82
73
  - tesseract.rb
@@ -221,28 +212,28 @@ files:
221
212
  - test/test-european.jpg
222
213
  - test/test.png
223
214
  homepage: http://github.com/meh/ruby-tesseract-ocr
224
- licenses: []
215
+ licenses:
216
+ - BSD
217
+ metadata: {}
225
218
  post_install_message:
226
219
  rdoc_options: []
227
220
  require_paths:
228
221
  - lib
229
222
  required_ruby_version: !ruby/object:Gem::Requirement
230
- none: false
231
223
  requirements:
232
- - - ! '>='
224
+ - - ">="
233
225
  - !ruby/object:Gem::Version
234
226
  version: '0'
235
227
  required_rubygems_version: !ruby/object:Gem::Requirement
236
- none: false
237
228
  requirements:
238
- - - ! '>='
229
+ - - ">="
239
230
  - !ruby/object:Gem::Version
240
231
  version: '0'
241
232
  requirements: []
242
233
  rubyforge_project:
243
- rubygems_version: 1.8.23
234
+ rubygems_version: 2.2.0
244
235
  signing_key:
245
- specification_version: 3
236
+ specification_version: 4
246
237
  summary: A wrapper library to the tesseract-ocr API.
247
238
  test_files:
248
239
  - test/first.png
@@ -252,3 +243,4 @@ test_files:
252
243
  - test/tesseract_spec.rb
253
244
  - test/test-european.jpg
254
245
  - test/test.png
246
+ has_rdoc: