ruby_marks 0.1.5 → 0.2.0

data/README.md CHANGED
@@ -65,13 +65,20 @@ require 'ruby_marks'
65
65
  How it Works
66
66
  ------------
67
67
 
68
- The gem will scan a document column search for this small full-filled black rectangles **(clock marks)**.
69
- For each clock mark found, it will perform a line scan in each group looking for a marked position.
70
- In the end, returns a hash with each correspondent mark found in the group and the clock.
68
+ Using a template document, you should specify the expected area where each group is. By applying an edge detection algorithm,
69
+ the gem will discover where the groups are and check whether they are near the expected position.
70
+ After the groups are found, the gem will scan each group in order to recognize its marks.
71
+ In the end, it returns a hash with each corresponding mark found in each group.
71
72
 
72
- The gem will not perform deskew in your documents. If the document have skew, then you should apply your own
73
+ The gem will not deskew your documents. If a document has a large skew, you should apply your own
73
74
  deskew method to the file beforehand.
74
75
 
76
+ ```
77
+ NOTE:
78
+ We changed the way the gem recognizes marks. It is no longer based on clocks. If you are updating the gem
79
+ from version 0.1.4, you should refactor your code to remove the clock parameters and adjust
80
+ some new configuration options.
81
+ ```
75
82
 
76
83
  Usage
77
84
  -----
@@ -83,105 +90,136 @@ That said, lets describe it's basic structure. The example will assume a directo
83
90
 
84
91
  [![Document Example](https://raw.github.com/andrerpbts/ruby_marks/master/assets/sheet_demo2.png)](https://github.com/andrerpbts/ruby_marks/blob/master/assets/sheet_demo2.png)
85
92
 
86
- Then, we write a basic code to scan it and print result on console:
93
+
94
+ First, using one document as a template, we need to get the pixel coordinates of the areas
95
+ where the groups are expected. This image shows where to pick each position:
96
+
97
+ [![Document Example](https://raw.github.com/andrerpbts/ruby_marks/master/assets/sheet_demo2_group_coords.png)](https://github.com/andrerpbts/ruby_marks/blob/master/assets/sheet_demo2_group_coords.png)
98
+
99
+
100
+ The threshold level should be adjusted too, so that the marks come out neither too bright nor too polluted. See:
101
+
102
+ [![Document Example](https://raw.github.com/andrerpbts/ruby_marks/master/assets/threshold_examples.png)](https://github.com/andrerpbts/ruby_marks/blob/master/assets/threshold_examples.png)
103
+
104
+
105
+ Then, we write some basic code to scan it and print the result to the console (each available option is described below):
87
106
 
88
107
  ```ruby
108
+ # Instantiate the Recognizer
89
109
  recognizer = RubyMarks::Recognizer.new
90
- recognizer.configure do |config|
91
110
 
92
- config.clock_marks_scan_x = 20
93
- config.clock_width = 29
94
- config.clock_height = 12
111
+ # Configuring the document aspects
112
+ recognizer.configure do |config|
113
+ config.threshold_level = 90
114
+ config.default_expected_lines = 5
95
115
 
96
- config.define_group :one do |group|
97
- group.clocks_range = 1..5
98
- group.x_distance_from_clock = 89
116
+ config.define_group :first do |group|
117
+ group.expected_coordinates = {x1: 34, y1: 6, x2: 160, y2: 134}
99
118
  end
100
119
 
101
- config.define_group :two do |group|
102
- group.clocks_range = 1..5
103
- group.x_distance_from_clock = 315
120
+ config.define_group :second do |group|
121
+ group.expected_coordinates = {x1: 258, y1: 6, x2: 388, y2: 134}
104
122
  end
105
123
 
106
- config.define_group :three do |group|
107
- group.clocks_range = 1..5
108
- group.x_distance_from_clock = 542
124
+ config.define_group :third do |group|
125
+ group.expected_coordinates = {x1: 486, y1: 6, x2: 614, y2: 134}
109
126
  end
110
127
 
111
- config.define_group :four do |group|
112
- group.clocks_range = 1..5
113
- group.x_distance_from_clock = 769
128
+ config.define_group :fourth do |group|
129
+ group.expected_coordinates = {x1: 714, y1: 6, x2: 844, y2: 134}
114
130
  end
115
131
 
116
- config.define_group :five do |group|
117
- group.clocks_range = 1..5
118
- group.x_distance_from_clock = 996
132
+ config.define_group :fifth do |group|
133
+ group.expected_coordinates = {x1: 942, y1: 6, x2: 1068, y2: 134}
119
134
  end
120
135
  end
136
+ ```
137
+
138
+
139
+ Then we need to adjust the edge level to make sure the groups are highlighted enough to be recognized.
140
+ You can see the image after the edge algorithm is applied by writing the file out after submitting it to the Recognizer, like this:
141
+
142
+ ```ruby
143
+ recognizer.file = 'example.png'
144
+ file = recognizer.file
145
+ filename = "temp_image.png"
146
+ file.write(filename)
147
+ ```
148
+
149
+ The resulting image should look like this one (note that all the groups are separated from the rest of the document by these white blocks):
150
+
151
+ [![Document Example](https://raw.github.com/andrerpbts/ruby_marks/master/assets/sheet_demo2_edge.png)](https://github.com/andrerpbts/ruby_marks/blob/master/assets/sheet_demo2_edge.png)
152
+
153
+
154
+ There's a method you can call to help you identify how the document is being recognized. This method returns the image
155
+ with markings showing where the expected group coordinates are, where the actual group coordinates are, and where the marks
156
+ are being recognized in each group.
121
157
 
158
+ Example:
159
+
160
+ ```ruby
161
+ flagged_document = recognizer.flag_all_marks
162
+ flagged_document.write(temp_filename)
163
+ ```
164
+
165
+ Will return the image below with recognized clock marks in green, the clock_marks_scan_x line in blue and
166
+ each mark position in a red cross:
167
+
168
+ [![Flagged Document Example](https://raw.github.com/andrerpbts/ruby_marks/master/assets/sheet_demo2_flagged.png)](https://github.com/andrerpbts/ruby_marks/blob/master/assets/sheet_demo2_flagged.png)
169
+
170
+
171
+ With all this configured, we can submit our images to a scan:
172
+
173
+ ```ruby
174
+ # Read all documents in the directory that are in PNG format
122
175
  Dir["./*.png"].each do |file|
123
176
  recognizer.file = file
124
177
  puts recognizer.scan
125
178
  end
126
179
  ```
127
180
 
128
- This should puts each scan in a hash, like this:
181
+ This should print each scan as a hash, like this:
129
182
 
130
183
  ```
131
184
  {
132
- :clock_1 => {
133
- :group_one => ['A'],
134
- :group_two => ['E'],
135
- :group_three => ['B'],
136
- :group_four => ['B'],
137
- :group_five => ['B']
185
+ first: {
186
+ 1 => ['A'],
187
+ 2 => ['C'],
188
+ 3 => ['B'],
189
+ 4 => ['B'],
190
+ 5 => ['D']
138
191
  },
139
- :clock_2 => {
140
- :group_one => ['C'],
141
- :group_two => ['A'],
142
- :group_three => ['B'],
143
- :group_four => ['E'],
144
- :group_five => ['A']
192
+ second: {
193
+ 1 => ['E'],
194
+ 2 => ['A'],
195
+ 3 => ['B'],
196
+ 4 => ['A'],
197
+ 5 => ['B']
145
198
  },
146
- :clock_3 => {
147
- :group_one => ['B'],
148
- :group_two => ['B'],
149
- :group_three => ['D'],
150
- :group_four => ['A'],
151
- :group_five => ['A']
199
+ third: {
200
+ 1 => ['B'],
201
+ 2 => ['B'],
202
+ 3 => ['D'],
203
+ 4 => ['B'],
204
+ 5 => ['B']
152
205
  },
153
- :clock_4 => {
154
- :group_one => ['B'],
155
- :group_two => ['A'],
156
- :group_three => ['B'],
157
- :group_four => ['C'],
158
- :group_five => ['C']
206
+ fourth: {
207
+ 1 => ['B'],
208
+ 2 => ['E'],
209
+ 3 => ['A'],
210
+ 4 => ['C'],
211
+ 5 => ['D']
159
212
  },
160
- :clock_5 => {
161
- :group_one => ['D'],
162
- :group_two => ['B'],
163
- :group_three => ['B'],
164
- :group_four => ['D'],
165
- :group_five => ['D']
213
+ fifth: {
214
+ 1 => ['B'],
215
+ 2 => ['A'],
216
+ 3 => ['A'],
217
+ 4 => ['C'],
218
+ 5 => ['D']
166
219
  }
167
220
  }
168
221
  ```
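
As a rough illustration of consuming that result, the sketch below walks a hash shaped like the output shown above and reduces each line to a single answer. The `result` literal is a hypothetical sample, not real scan output, and treating an unmarked line as an empty array is an assumption about the gem's behaviour.

```ruby
# Hypothetical sample, shaped like the scan output shown above.
# Assumption: a line with no recognized mark comes back as an empty array.
result = {
  first:  { 1 => ['A'], 2 => ['C'] },
  second: { 1 => ['E'], 2 => [] }
}

# Flatten the nested hash into [group, line, answer] triples.
# A line counts as answered only when exactly one mark was found.
answers = result.flat_map do |group, lines|
  lines.map do |line, marks|
    answer = marks.length == 1 ? marks.first : nil
    [group, line, answer]
  end
end

answers.each do |group, line, answer|
  puts "#{group} line #{line}: #{answer || 'no single answer'}"
end
```

In real code, `result` would come from `recognizer.scan` instead of a literal.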
169
222
 
170
- There's a method you can call to will help you to configure the positions. This method return the image
171
- with the markups of encountered clock marks, where the marks is being recognized and where the clock_marks_scan_x
172
- config is making the column search.
173
-
174
- Example:
175
-
176
- ```ruby
177
- flagged_document = recognizer.flag_all_marks
178
- flagged_document.write(temp_filename)
179
- ```
180
-
181
- Will return the image below with recognized clock marks in green, the clock_marks_scan_x line in blue and
182
- each mark position in a red cross:
183
-
184
- [![Flagged Document Example](https://raw.github.com/andrerpbts/ruby_marks/master/assets/sheet_demo2_flagged.png)](https://github.com/andrerpbts/ruby_marks/blob/master/assets/sheet_demo2_flagged.png)
185
223
 
186
224
 
187
225
  General Configuration Options
@@ -190,66 +228,38 @@ General Configuration Options
190
228
  As you may see, it's necessary to configure some document aspects to make this work properly. So, let's describe
191
229
  each general configuration option available:
192
230
 
193
- ### Threshold level
194
-
195
- ```ruby
196
- # Applies the given percentual in the image in order to get it back with only black and white pixels.
197
- # Low percentuals will result in a bright image, as High percentuals will result in a more darken image.
198
- # The default value is 60
199
-
200
- config.threshold_level = 60
201
- ```
202
-
203
- ### Distance in axis X from margin to scan the clock marks
231
+ ### Edge level
204
232
 
205
233
  ```ruby
206
- # Defines the X distance from the left margin (in pixels) to look for the valids (black) pixels
207
- # of the clock marks in this column. This configuration is very important because each type of document may
208
- # have the clock marks in a specific and different column, and this configuration that will indicate
209
- # a X pixel column that cross all the clocks.
210
- # The default value is 62 but only for tests purposes. You SHOULD calculate this value and set
211
- # a new one.
212
-
213
- config.clock_marks_scan_x = 62
234
+ # The size of the edge to apply in the edge detection algorithm.
235
+ # The default value is 4, but it is very important that you verify the algorithm's result and adjust it as needed.
236
+ config.edge_level = 4
214
237
  ```
215
238
 
216
- ### Clock sizes
217
-
218
- ```ruby
219
- # Defines the expected width and height of clock marks (in pixels). With the tolerance, if the first
220
- # recognized clock exceeds or stricts those values, it will be ignored...
221
- # The default values is 26 to width and 12 to height. Since the clock marks can be different, you SHOULD
222
- # calculate those sizes for your documents.
223
-
224
- config.clock_width = 26
225
- config.clock_height = 12
226
- ```
227
-
228
- ### Tolerance on the size of clock mark
239
+ ### Threshold level
229
240
 
230
241
  ```ruby
231
- # Indicates the actual tolerance (in pixels) for the clock mark found. That means the clock can be smaller or
232
- # larger than expected, by the number of pixels set in this option.
233
- # The default value is 2
242
+ # Applies the given percentage to the image in order to get it back with only black and white pixels.
243
+ # Low percentages will result in a brighter image, while high percentages will result in a darker image.
244
+ # The default value is 60, but it is very important that you verify the algorithm's result and adjust it as needed.
234
245
 
235
- config.clock_mark_size_tolerance = 2
246
+ config.threshold_level = 60
236
247
  ```
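
To make the percentage concrete, here is a small sketch of the arithmetic that `Config#calculated_threshold_level` performs in this release (`Magick::QuantumRange * (@threshold_level.to_f / 100)`). It does not load RMagick: `QUANTUM_RANGE` is a stand-in constant, and its value of 65535 assumes a 16-bit ImageMagick build; other builds use a different range.

```ruby
# Assumption: 16-bit ImageMagick build, where Magick::QuantumRange is 65535.
QUANTUM_RANGE = 65_535

# Same arithmetic as Config#calculated_threshold_level, with the quantum
# range passed in so the sketch runs without RMagick installed.
def calculated_threshold_level(threshold_level, quantum_range = QUANTUM_RANGE)
  quantum_range * (threshold_level.to_f / 100)
end

# Compare the default level (60) with the one used in the usage example (90).
[60, 90].each do |level|
  puts "threshold_level #{level} -> #{calculated_threshold_level(level).round}"
end
```

The percentage is scaled linearly onto the quantum range before being used as the cut-off between black and white.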
237
248
 
238
- ### Expected clocks count
249
+ ### Expected lines
239
250
 
240
251
  ```ruby
241
- # If this value is defined (above 0), the scan will perform a check if the clocks found on document
242
- # is identical with this expected number. If different, the scan will be stopped.
243
- # This config is mandatory if you want to raise the Clock Mark Difference Watcher.
244
- # The default value is 0
252
+ # The scan will raise the incorrect group watcher if one or more groups don't have the expected number of lines.
253
+ # Set here, this configuration applies to all groups.
254
+ # The default value is 20.
245
255
 
246
- config.expected_clocks_count = 0
256
+ config.default_expected_lines = 20
247
257
  ```
248
258
 
249
259
  ### Default mark sizes
250
260
 
251
261
  ```ruby
252
- # Defines the expected width and height of the marks (in pixels). With the tolerance, if the first recognized
262
+ # Defines the expected width and height of the marks (in pixels). Taking the tolerance into account, if the recognized
253
263
  # mark is larger or smaller than those values, it will be ignored.
254
264
  # The default values are 20 for width and 20 for height. Since the marks can be different, you SHOULD
255
265
  # calculate those sizes for your documents.
@@ -258,6 +268,17 @@ config.default_mark_width = 20
258
268
  config.default_mark_height = 20
259
269
  ```
260
270
 
271
+ ### Default mark sizes tolerances
272
+
273
+ ```ruby
274
+ # Defines the tolerance for the width and height of the marks (in pixels). Relative to the mark size, if the recognized
275
+ # mark is larger or smaller than expected by more than this tolerance, it will be ignored.
276
+ # The default value is 4 for both width and height.
277
+
278
+ config.default_mark_width_tolerance = 4
279
+ config.default_mark_height_tolerance = 4
280
+ ```
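
As a sketch of how size and tolerance combine, the snippet below rebuilds the range that `Group#mark_width_tolerance_range` computes in this release (`mark_width - tolerance .. mark_width + tolerance`) using the documented defaults. The candidate widths and the accepted/ignored labels are illustrative only; exactly how the gem applies the range during recognition is an assumption here.

```ruby
# Documented defaults for mark width and its tolerance (in pixels).
mark_width = 20
mark_width_tolerance = 4

# Same arithmetic as Group#mark_width_tolerance_range in this release.
width_range = (mark_width - mark_width_tolerance)..(mark_width + mark_width_tolerance)

# Illustrative candidate widths: values outside the range would be ignored.
[15, 16, 20, 24, 25].each do |candidate_width|
  status = width_range.include?(candidate_width) ? 'accepted' : 'ignored'
  puts "#{candidate_width}px: #{status}"
end
```

The same calculation applies to height, using `default_mark_height` and `default_mark_height_tolerance`.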
281
+
261
282
  ### Intensity percentual
262
283
 
263
284
  ```ruby
@@ -285,13 +306,22 @@ config.default_marks_options = %w{A B C D E}
285
306
 
286
307
  ```ruby
287
308
  # Defines the distance (in pixel) between the middle of a mark and the middle of the next mark in the same group.
288
- # The scan will begin in the first mark, by the value in pixels it have from the right corner of the clock.
289
- # After it, each mark option in the group will be checked based in this distance.
309
+ # This option is used to try to infer marks that were not found.
290
310
  # The default value is 25
291
311
 
292
312
  config.default_distance_between_marks = 25
293
313
  ```
294
314
 
315
+ ### Adjust inconsistent bubbles
316
+
317
+ ```ruby
318
+ # If true, it will analyze each group in order to see if there are more or fewer bubbles than expected,
319
+ # and will try to remove or add these inconsistent marks.
320
+ # The default value is true
321
+
322
+ config.adjust_inconsistent_bubbles = true
323
+ ```
324
+
295
325
 
296
326
  Group Configuration Options
297
327
  ---------------------------
@@ -299,6 +329,14 @@ Group Configuration Options
299
329
  The General Configuration Options apply to the entire document. However, a group can have its own particularities,
300
330
  which you can set when defining it:
301
331
 
332
+ ### Expected coordinates
333
+
334
+ ```ruby
335
+ # This configuration defines the coordinates of the area where the group is expected to be.
336
+
337
+ group.expected_coordinates = {x1: 145, y1: 780, x2: 270, y2: 1290}
338
+ ```
339
+
302
340
  ### Mark sizes
303
341
 
304
342
  ```ruby
@@ -316,15 +354,6 @@ group.mark_height = RubyMarks.default_mark_height
316
354
  group.marks_options = RubyMarks.default_marks_options
317
355
  ```
318
356
 
319
- ### Distance in axis X from clock
320
-
321
- ```ruby
322
- # Defines the distance from the right corner of the clock mark to the middle of the first mark in the group
323
- # It don't have a default value, you MUST set this value for each group in your document
324
-
325
- group.x_distance_from_clock = 89
326
- ```
327
-
328
357
  ### Distance Between Marks
329
358
 
330
359
  ```ruby
@@ -333,26 +362,25 @@ group.x_distance_from_clock = 89
333
362
  group.distance_between_marks = RubyMarks.default_distance_between_marks
334
363
  ```
335
364
 
336
- ### Clocks range
365
+ ### Expected lines
337
366
 
338
367
  ```ruby
339
- # Defines the clock ranges this group belongs to. This range that will consider what clock mark
340
- # should be returned in the result of the scan.
368
+ # Overrides the default_expected_lines value for the group being configured.
341
369
 
342
- group.clocks_range = 1..5
370
+ group.expected_lines = @recognizer.config.default_expected_lines
343
371
  ```
344
372
 
345
373
 
346
374
  Watchers
347
375
  --------
348
376
 
349
- Sometimes, due some image flaws, the scan can't recognize some clock mark, or a mark, or even recognize
377
+ Sometimes, due to image flaws, the scan can't recognize a group or a mark, or it even recognizes
350
378
  more than one mark in a row of the same group when that is not expected. In that case, you can place some
351
- watchers, that will perform some custom code made by yourself in those cases. The available watchers are:
352
- In the watchers you can, for example, apply a deskew in image and re-run the scan. But, be advised, if you
353
- call the scan method again inside the watcher, you should make sure that you have a way to leave the watcher
354
- to avoid a endless loop. You always can check how many times the watcher got raised by checking in
355
- `recognizer.raised_watchers[:watcher_name]` hash.
379
+ watchers that will run your own custom code in those cases, such as applying a deskew
380
+ to the image and re-running the scan.
381
+ But be advised: if you call the scan method again inside the watcher, you should make sure that you
382
+ have a way to leave the watcher, to avoid an endless loop. You can always check how many times the watcher
383
+ was raised by checking the `recognizer.raised_watchers[:watcher_name]` hash.
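
One possible way to guard against that endless loop is sketched below. So that it runs without the gem or an image, it uses `StubRecognizer`, a hypothetical stand-in that only imitates the `add_watcher`, `scan` and `raised_watchers` names used in this README. Its internals, the `MAX_RETRIES` constant and the `scan_calls` counter are invented for illustration and say nothing about how the real `RubyMarks::Recognizer` is implemented.

```ruby
MAX_RETRIES = 3 # hypothetical limit chosen for this sketch

# Hypothetical stand-in for RubyMarks::Recognizer (not the real class).
class StubRecognizer
  attr_reader :raised_watchers, :scan_calls

  def initialize
    @raised_watchers = Hash.new(0)
    @watchers = {}
    @scan_calls = 0
  end

  def add_watcher(name, &block)
    @watchers[name] = block
  end

  # Pretends every scan finds two marks on one line, so the watcher always fires.
  def scan
    @scan_calls += 1
    result = { first: { 1 => %w[A B] } }
    @raised_watchers[:scan_multiple_marked_watcher] += 1
    @watchers[:scan_multiple_marked_watcher]&.call(self, result)
    result
  end
end

recognizer = StubRecognizer.new

recognizer.add_watcher :scan_multiple_marked_watcher do |rec, _result|
  # Leave the watcher once it has been raised too many times.
  next if rec.raised_watchers[:scan_multiple_marked_watcher] >= MAX_RETRIES

  # A real watcher would deskew or otherwise repair the image here.
  rec.scan
end

recognizer.scan
puts "scans performed: #{recognizer.scan_calls}"
```

With the real gem, the same guard would sit at the top of your watcher block, before any call to `scan`.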
356
384
 
357
385
 
358
386
  ### Scan Mark Watcher
@@ -390,14 +418,15 @@ recognizer.add_watcher :scan_multiple_marked_watcher do |recognizer, result|
390
418
  end
391
419
  ```
392
420
 
393
- ### Clock Mark Difference Watcher
421
+ ### Incorrect Group Watcher
394
422
 
395
423
  ```ruby
396
- # Will execute your custom code if didn't recognizes your expected clock marks count.
397
- # In order to raise this watcher you must define the `config.expected_clocks_count`.
398
- # It returns the recognizer object.
424
+ # Will execute your custom code if a group isn't found, or if it has a line count different than expected,
425
+ # or if in one or more lines the mark options found differ from those specified in marks options.
426
+ # It returns the recognizer object, a boolean value for an incorrect expected lines count, a boolean value
427
+ # for an incorrect bubble line found, and a boolean value for whether the bubbles were adjusted.
399
428
 
400
- recognizer.add_watcher :clock_mark_difference_watcher do |recognizer|
429
+ recognizer.add_watcher :clock_mark_difference_watcher do |recognizer, incorrect_expected_lines, incorrect_bubble_line_found, bubbles_adjusted|
401
430
  # place your custom code
402
431
  end
403
432
  ```
@@ -3,58 +3,37 @@ module RubyMarks
3
3
 
4
4
  class Config
5
5
 
6
- attr_accessor :clock_marks_scan_x, :expected_clocks_count, :intensity_percentual, :recognition_colors,
7
- :default_marks_options, :default_distance_between_marks,
8
- :clock_width, :clock_height, :threshold_level, :clock_mark_size_tolerance,
9
- :default_mark_width, :default_mark_height
6
+ attr_accessor :intensity_percentual, :edge_level, :default_marks_options, :threshold_level,
7
+ :default_mark_width, :default_mark_height,
8
+ :default_mark_width_tolerance, :default_mark_height_tolerance,
9
+ :default_distance_between_marks, :adjust_inconsistent_bubbles,
10
+ :default_expected_lines
11
+
10
12
 
11
13
  def initialize(recognizer)
12
14
  @recognizer = recognizer
13
15
  @threshold_level = RubyMarks.threshold_level
16
+ @edge_level = RubyMarks.edge_level
14
17
 
15
- @intensity_percentual = RubyMarks.intensity_percentual
16
- @recognition_colors = RubyMarks.recognition_colors
18
+ @adjust_inconsistent_bubbles = RubyMarks.adjust_inconsistent_bubbles
17
19
 
18
- @expected_clocks_count = RubyMarks.expected_clocks_count
19
- @clock_marks_scan_x = RubyMarks.clock_marks_scan_x
20
- @clock_width = RubyMarks.clock_width
21
- @clock_height = RubyMarks.clock_height
22
- @clock_mark_size_tolerance = RubyMarks.clock_mark_size_tolerance
20
+ @intensity_percentual = RubyMarks.intensity_percentual
23
21
 
24
22
  @default_mark_width = RubyMarks.default_mark_width
25
23
  @default_mark_height = RubyMarks.default_mark_height
24
+
25
+ @default_mark_width_tolerance = RubyMarks.default_mark_width_tolerance
26
+ @default_mark_height_tolerance = RubyMarks.default_mark_height_tolerance
27
+
26
28
  @default_marks_options = RubyMarks.default_marks_options
27
29
  @default_distance_between_marks = RubyMarks.default_distance_between_marks
30
+ @default_expected_lines = RubyMarks.default_expected_lines
28
31
  end
29
32
 
30
33
  def calculated_threshold_level
31
34
  Magick::QuantumRange * (@threshold_level.to_f / 100)
32
35
  end
33
36
 
34
- def clock_width_with_down_tolerance
35
- @clock_width - @clock_mark_size_tolerance
36
- end
37
-
38
- def clock_width_with_up_tolerance
39
- @clock_width + @clock_mark_size_tolerance
40
- end
41
-
42
- def clock_height_with_down_tolerance
43
- @clock_height - @clock_mark_size_tolerance
44
- end
45
-
46
- def clock_height_with_up_tolerance
47
- @clock_height + @clock_mark_size_tolerance
48
- end
49
-
50
- def clock_width_tolerance_range
51
- clock_width_with_down_tolerance..clock_width_with_up_tolerance
52
- end
53
-
54
- def clock_height_tolerance_range
55
- clock_height_with_down_tolerance..clock_height_with_up_tolerance
56
- end
57
-
58
37
  def define_group(group_label, &block)
59
38
  group = RubyMarks::Group.new(group_label, @recognizer, &block)
60
39
  @recognizer.add_group(group)
@@ -2,33 +2,63 @@
2
2
  module RubyMarks
3
3
 
4
4
  class Group
5
- attr_reader :label, :recognizer, :clocks_range
5
+ attr_reader :label, :recognizer
6
+ attr_accessor :mark_width, :mark_height, :marks_options, :coordinates, :expected_coordinates,
7
+ :mark_width_tolerance, :mark_height_tolerance, :marks, :distance_between_marks
6
8
 
7
- attr_accessor :mark_width, :mark_height, :marks_options, :x_distance_from_clock,
8
- :distance_between_marks
9
9
 
10
10
  def initialize(label, recognizer)
11
11
  @label = label
12
12
  @recognizer = recognizer
13
- @mark_width = @recognizer.config.default_mark_width
13
+
14
+ @mark_width = @recognizer.config.default_mark_width
14
15
  @mark_height = @recognizer.config.default_mark_height
16
+
17
+ @mark_width_tolerance = @recognizer.config.default_mark_width_tolerance
18
+ @mark_height_tolerance = @recognizer.config.default_mark_height_tolerance
19
+
15
20
  @marks_options = @recognizer.config.default_marks_options
16
21
  @distance_between_marks = @recognizer.config.default_distance_between_marks
17
- @x_distance_from_clock = 0
18
- @clocks_range = 0..0
22
+
23
+ @expected_lines = @recognizer.config.default_expected_lines
24
+ @expected_coordinates = {}
19
25
  yield self if block_given?
20
26
  end
21
27
 
22
- def clocks_range=(value)
23
- value = value..value if value.is_a?(Fixnum)
24
- @clocks_range = value if value.is_a?(Range)
28
+
29
+ def incorrect_expected_lines
30
+ @expected_lines != marks.count
31
+ end
32
+
33
+ def mark_width_with_down_tolerance
34
+ @mark_width - @mark_width_tolerance
35
+ end
36
+
37
+
38
+ def mark_width_with_up_tolerance
39
+ @mark_width + @mark_width_tolerance
40
+ end
41
+
42
+
43
+ def mark_height_with_down_tolerance
44
+ @mark_height - @mark_height_tolerance
45
+ end
46
+
47
+
48
+ def mark_height_with_up_tolerance
49
+ @mark_height + @mark_height_tolerance
50
+ end
51
+
52
+
53
+ def mark_width_tolerance_range
54
+ mark_width_with_down_tolerance..mark_width_with_up_tolerance
25
55
  end
26
56
 
27
- def belongs_to_clock?(clock)
28
- if @clocks_range.is_a?(Range)
29
- return @clocks_range.include? clock
30
- end
57
+
58
+ def mark_height_tolerance_range
59
+ mark_height_with_down_tolerance..mark_height_with_up_tolerance
31
60
  end
61
+
32
62
  end
33
63
 
34
64
  end