ruby_marks 0.1.5 → 0.2.0

data/README.md CHANGED
@@ -65,13 +65,20 @@ require 'ruby_marks'
65
65
  How it Works
66
66
  ------------
67
67
 
68
- The gem will scan a document column search for this small full-filled black rectangles **(clock marks)**.
69
- For each clock mark found, it will perform a line scan in each group looking for a marked position.
70
- In the end, returns a hash with each correspondent mark found in the group and the clock.
68
+ Using a template document, you should specify the expected area where each group is. By applying an edge detection algorithm,
69
+ the gem discovers where the groups are and checks whether they are near the expected position.
70
+ Once the groups are found, the gem scans each group to recognize its marks.
71
+ Finally, it returns a hash with the marks found in each group.
71
72
 
72
- The gem will not perform deskew in your documents. If the document have skew, then you should apply your own
73
+ The gem will not deskew your documents. If a document has a large skew, you should apply your own
73
74
  deskew method to the file beforehand.
74
75
 
76
+ ```
77
+ NOTE:
78
+ We changed the way it recognizes the marks. It's not based on clocks anymore. If you are updating the gem
79
+ from version 0.1.4, you should refactor your code to remove the clock parameters and adjust
80
+ the new configuration options.
81
+ ```
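
The "near the expected position" check described above can be pictured with a small standalone sketch. It does not use the gem: `near_expected?` and its `tolerance` argument are hypothetical names invented for illustration, and the detected coordinates are made up.

```ruby
# Hypothetical helper (not part of ruby_marks): true when every detected
# coordinate lies within `tolerance` pixels of the expected one.
# The tolerance value is an assumption made for this sketch.
def near_expected?(expected, actual, tolerance: 10)
  %i[x1 y1 x2 y2].all? do |key|
    (expected.fetch(key) - actual.fetch(key)).abs <= tolerance
  end
end

expected = { x1: 34, y1: 6, x2: 160, y2: 134 } # taken from the template
detected = { x1: 37, y1: 9, x2: 158, y2: 131 } # made-up detection result

puts near_expected?(expected, detected)
puts near_expected?(expected, detected.merge(x1: 90))
```

How the gem itself measures "near" is not shown in this diff, so treat the per-coordinate comparison as one plausible reading only.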
75
82
 
76
83
  Usage
77
84
  -----
@@ -83,105 +90,136 @@ That said, lets describe it's basic structure. The example will assume a directo
83
90
 
84
91
  [![Document Example](https://raw.github.com/andrerpbts/ruby_marks/master/assets/sheet_demo2.png)](https://github.com/andrerpbts/ruby_marks/blob/master/assets/sheet_demo2.png)
85
92
 
86
- Then, we write a basic code to scan it and print result on console:
93
+
94
+ First, using one document as a template, we need to get the pixel coordinates of the areas
95
+ where the groups are expected. This image shows where to pick each position:
96
+
97
+ [![Document Example](https://raw.github.com/andrerpbts/ruby_marks/master/assets/sheet_demo2_group_coords.png)](https://github.com/andrerpbts/ruby_marks/blob/master/assets/sheet_demo2_group_coords.png)
98
+
99
+
100
+ The threshold level should be adjusted too, so that the marks come out neither too bright nor too polluted. See:
101
+
102
+ [![Document Example](https://raw.github.com/andrerpbts/ruby_marks/master/assets/threshold_examples.png)](https://github.com/andrerpbts/ruby_marks/blob/master/assets/threshold_examples.png)
103
+
104
+
105
+ Then, we write some basic code to scan it and print the result to the console (each available option is described below):
87
106
 
88
107
  ```ruby
108
+ # Instantiate the Recognizer
89
109
  recognizer = RubyMarks::Recognizer.new
90
- recognizer.configure do |config|
91
110
 
92
- config.clock_marks_scan_x = 20
93
- config.clock_width = 29
94
- config.clock_height = 12
111
+ # Configuring the document aspects
112
+ recognizer.configure do |config|
113
+ config.threshold_level = 90
114
+ config.default_expected_lines = 5
95
115
 
96
- config.define_group :one do |group|
97
- group.clocks_range = 1..5
98
- group.x_distance_from_clock = 89
116
+ config.define_group :first do |group|
117
+ group.expected_coordinates = {x1: 34, y1: 6, x2: 160, y2: 134}
99
118
  end
100
119
 
101
- config.define_group :two do |group|
102
- group.clocks_range = 1..5
103
- group.x_distance_from_clock = 315
120
+ config.define_group :second do |group|
121
+ group.expected_coordinates = {x1: 258, y1: 6, x2: 388, y2: 134}
104
122
  end
105
123
 
106
- config.define_group :three do |group|
107
- group.clocks_range = 1..5
108
- group.x_distance_from_clock = 542
124
+ config.define_group :third do |group|
125
+ group.expected_coordinates = {x1: 486, y1: 6, x2: 614, y2: 134}
109
126
  end
110
127
 
111
- config.define_group :four do |group|
112
- group.clocks_range = 1..5
113
- group.x_distance_from_clock = 769
128
+ config.define_group :fourth do |group|
129
+ group.expected_coordinates = {x1: 714, y1: 6, x2: 844, y2: 134}
114
130
  end
115
131
 
116
- config.define_group :five do |group|
117
- group.clocks_range = 1..5
118
- group.x_distance_from_clock = 996
132
+ config.define_group :fifth do |group|
133
+ group.expected_coordinates = {x1: 942, y1: 6, x2: 1068, y2: 134}
119
134
  end
120
135
  end
136
+ ```
137
+
138
+
139
+ Then we need to adjust the edge level to make sure the groups are highlighted enough to be recognized.
140
+ You can see the image after the edge algorithm is applied by writing the file out after submitting it to the Recognizer, like this:
141
+
142
+ ```ruby
143
+ recognizer.file = 'example.png'
144
+ file = recognizer.file
145
+ filename = "temp_image.png"
146
+ file.write(filename)
147
+ ```
148
+
149
+ The resulting image should look like this one (note that all the groups are separated from the rest of the document by these white blocks):
150
+
151
+ [![Document Example](https://raw.github.com/andrerpbts/ruby_marks/master/assets/sheet_demo2_edge.png)](https://github.com/andrerpbts/ruby_marks/blob/master/assets/sheet_demo2_edge.png)
152
+
153
+
154
+ There's a method you can call to help you identify how the document is being recognized. This method returns the image
155
+ marked up to show where the expected group coordinates are, where the actual group coordinates are, and where the marks
156
+ are being recognized in each group.
121
157
 
158
+ Example:
159
+
160
+ ```ruby
161
+ flagged_document = recognizer.flag_all_marks
162
+ flagged_document.write(temp_filename)
163
+ ```
164
+
165
+ Will return the image below with recognized clock marks in green, the clock_marks_scan_x line in blue and
166
+ each mark position in a red cross:
167
+
168
+ [![Flagged Document Example](https://raw.github.com/andrerpbts/ruby_marks/master/assets/sheet_demo2_flagged.png)](https://github.com/andrerpbts/ruby_marks/blob/master/assets/sheet_demo2_flagged.png)
169
+
170
+
171
+ With all this configured, we can submit our images to a scan:
172
+
173
+ ```ruby
174
+ # Read all documents in the directory that are in PNG format
122
175
  Dir["./*.png"].each do |file|
123
176
  recognizer.file = file
124
177
  puts recognizer.scan
125
178
  end
126
179
  ```
127
180
 
128
- This should puts each scan in a hash, like this:
181
+ This should print each scan as a hash, like this:
129
182
 
130
183
  ```
131
184
  {
132
- :clock_1 => {
133
- :group_one => ['A'],
134
- :group_two => ['E'],
135
- :group_three => ['B'],
136
- :group_four => ['B'],
137
- :group_five => ['B']
185
+ first: {
186
+ 1 => ['A'],
187
+ 2 => ['C'],
188
+ 3 => ['B'],
189
+ 4 => ['B'],
190
+ 5 => ['D']
138
191
  },
139
- :clock_2 => {
140
- :group_one => ['C'],
141
- :group_two => ['A'],
142
- :group_three => ['B'],
143
- :group_four => ['E'],
144
- :group_five => ['A']
192
+ second: {
193
+ 1 => ['E'],
194
+ 2 => ['A'],
195
+ 3 => ['B'],
196
+ 4 => ['A'],
197
+ 5 => ['B']
145
198
  },
146
- :clock_3 => {
147
- :group_one => ['B'],
148
- :group_two => ['B'],
149
- :group_three => ['D'],
150
- :group_four => ['A'],
151
- :group_five => ['A']
199
+ third: {
200
+ 1 => ['B'],
201
+ 2 => ['B'],
202
+ 3 => ['D'],
203
+ 4 => ['B'],
204
+ 5 => ['B']
152
205
  },
153
- :clock_4 => {
154
- :group_one => ['B'],
155
- :group_two => ['A'],
156
- :group_three => ['B'],
157
- :group_four => ['C'],
158
- :group_five => ['C']
206
+ fourth: {
207
+ 1 => ['B'],
208
+ 2 => ['E'],
209
+ 3 => ['A'],
210
+ 4 => ['C'],
211
+ 5 => ['D']
159
212
  },
160
- :clock_5 => {
161
- :group_one => ['D'],
162
- :group_two => ['B'],
163
- :group_three => ['B'],
164
- :group_four => ['D'],
165
- :group_five => ['D']
213
+ fifth: {
214
+ 1 => ['B'],
215
+ 2 => ['A'],
216
+ 3 => ['A'],
217
+ 4 => ['C'],
218
+ 5 => ['D']
166
219
  }
167
220
  }
168
221
  ```
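
As a rough illustration of consuming a result shaped like the hash above, the sketch below classifies every line by how many options were marked. The sample data is shortened, and the idea that an unmarked line comes back as an empty array is an assumption, not something this diff confirms. `classify` and `summarize` are hypothetical helper names.

```ruby
# Sample data shaped like the README output (shortened). The empty array
# and the two-element array are assumed shapes for unmarked and
# multiply-marked lines.
result = {
  first:  { 1 => ['A'], 2 => ['C'], 3 => ['B'] },
  second: { 1 => ['E'], 2 => [],    3 => ['A', 'B'] }
}

# Classify one line by the number of marked options.
def classify(marks)
  case marks.size
  when 0 then :unmarked
  when 1 then :ok
  else :multiple
  end
end

# Build { group => { line => status } } from a scan result.
def summarize(result)
  result.each_with_object({}) do |(group, lines), summary|
    summary[group] = lines.transform_values { |marks| classify(marks) }
  end
end

summarize(result).each do |group, lines|
  lines.each { |line, status| puts "#{group} ##{line}: #{status}" }
end
```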
169
222
 
170
- There's a method you can call to will help you to configure the positions. This method return the image
171
- with the markups of encountered clock marks, where the marks is being recognized and where the clock_marks_scan_x
172
- config is making the column search.
173
-
174
- Example:
175
-
176
- ```ruby
177
- flagged_document = recognizer.flag_all_marks
178
- flagged_document.write(temp_filename)
179
- ```
180
-
181
- Will return the image below with recognized clock marks in green, the clock_marks_scan_x line in blue and
182
- each mark position in a red cross:
183
-
184
- [![Flagged Document Example](https://raw.github.com/andrerpbts/ruby_marks/master/assets/sheet_demo2_flagged.png)](https://github.com/andrerpbts/ruby_marks/blob/master/assets/sheet_demo2_flagged.png)
185
223
 
186
224
 
187
225
  General Configuration Options
@@ -190,66 +228,38 @@ General Configuration Options
190
228
  As you can see, it's necessary to configure some document aspects to make this work properly. So, let's describe
191
229
  each general configuration option available:
192
230
 
193
- ### Threshold level
194
-
195
- ```ruby
196
- # Applies the given percentual in the image in order to get it back with only black and white pixels.
197
- # Low percentuals will result in a bright image, as High percentuals will result in a more darken image.
198
- # The default value is 60
199
-
200
- config.threshold_level = 60
201
- ```
202
-
203
- ### Distance in axis X from margin to scan the clock marks
231
+ ### Edge level
204
232
 
205
233
  ```ruby
206
- # Defines the X distance from the left margin (in pixels) to look for the valids (black) pixels
207
- # of the clock marks in this column. This configuration is very important because each type of document may
208
- # have the clock marks in a specific and different column, and this configuration that will indicate
209
- # a X pixel column that cross all the clocks.
210
- # The default value is 62 but only for tests purposes. You SHOULD calculate this value and set
211
- # a new one.
212
-
213
- config.clock_marks_scan_x = 62
234
+ # The size of the edge to apply in the edge detect algorithm.
235
+ # The default value is 4, but it is very important that you verify the algorithm's result and adjust it as needed.
236
+ config.edge_level = 4
214
237
  ```
215
238
 
216
- ### Clock sizes
217
-
218
- ```ruby
219
- # Defines the expected width and height of clock marks (in pixels). With the tolerance, if the first
220
- # recognized clock exceeds or stricts those values, it will be ignored...
221
- # The default values is 26 to width and 12 to height. Since the clock marks can be different, you SHOULD
222
- # calculate those sizes for your documents.
223
-
224
- config.clock_width = 26
225
- config.clock_height = 12
226
- ```
227
-
228
- ### Tolerance on the size of clock mark
239
+ ### Threshold level
229
240
 
230
241
  ```ruby
231
- # Indicates the actual tolerance (in pixels) for the clock mark found. That means the clock can be smaller or
232
- # larger than expected, by the number of pixels set in this option.
233
- # The default value is 2
242
+ # Applies the given percentage to the image in order to get it back with only black and white pixels.
243
+ # Low percentages will result in a brighter image, while high percentages will result in a darker image.
244
+ # The default value is 60, but it is very important that you verify the algorithm's result and adjust it as needed.
234
245
 
235
- config.clock_mark_size_tolerance = 2
246
+ config.threshold_level = 60
236
247
  ```
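
To make the percentage concrete: `Config#calculated_threshold_level` (shown later in this diff) scales the configured level by `Magick::QuantumRange`. The sketch below repeats that arithmetic without RMagick. The constant `QUANTUM_RANGE = 65_535` is an assumption that holds for 16-bit (Q16) ImageMagick builds and differs on other builds.

```ruby
# Stand-in for Magick::QuantumRange. 65_535 is the value on a Q16
# ImageMagick build; this is an assumption, check your own build.
QUANTUM_RANGE = 65_535

# Same arithmetic as Config#calculated_threshold_level, as a plain function.
def calculated_threshold_level(threshold_level, quantum_range = QUANTUM_RANGE)
  quantum_range * (threshold_level.to_f / 100)
end

[60, 90].each do |level|
  puts "threshold_level #{level} -> #{calculated_threshold_level(level)}"
end
```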
237
248
 
238
- ### Expected clocks count
249
+ ### Expected lines
239
250
 
240
251
  ```ruby
241
- # If this value is defined (above 0), the scan will perform a check if the clocks found on document
242
- # is identical with this expected number. If different, the scan will be stopped.
243
- # This config is mandatory if you want to raise the Clock Mark Difference Watcher.
244
- # The default value is 0
252
+ # The scan will raise the incorrect group watcher if one or more groups don't have the expected number of lines.
253
+ # Set here, this configuration applies to all groups.
254
+ # The default value is 20, but is very
245
255
 
246
- config.expected_clocks_count = 0
256
+ config.default_expected_lines = 20
247
257
  ```
248
258
 
249
259
  ### Default mark sizes
250
260
 
251
261
  ```ruby
252
- # Defines the expected width and height of the marks (in pixels). With the tolerance, if the first recognized
262
+ # Defines the expected width and height of the marks (in pixels). With the tolerance, if the recognized
253
263
  # mark is larger or smaller than those values allow, it will be ignored.
254
264
  # The default values are 20 for width and 20 for height. Since the marks can differ, you SHOULD
255
265
  # calculate those sizes for your documents.
@@ -258,6 +268,17 @@ config.default_mark_width = 20
258
268
  config.default_mark_height = 20
259
269
  ```
260
270
 
271
+ ### Default mark sizes tolerances
272
+
273
+ ```ruby
274
+ # Defines the tolerance in width and height of the marks (in pixels). Together with the mark size, if the recognized
275
+ # mark falls outside the resulting range, it will be ignored.
276
+ # The default value is 4 for both width and height.
277
+
278
+ config.default_mark_width_tolerance = 4
279
+ config.default_mark_height_tolerance = 4
280
+ ```
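
The size and tolerance options combine into a range, as the `mark_width_tolerance_range` and `mark_height_tolerance_range` helpers added to `RubyMarks::Group` in this diff show. The sketch below rebuilds that range as a standalone function. Testing a candidate with `include?` on both ranges is an assumption about how the ranges are consumed, and the candidate sizes are invented.

```ruby
# Mirrors the range helpers added to RubyMarks::Group in this diff,
# written as a standalone function for illustration.
def tolerance_range(size, tolerance)
  (size - tolerance)..(size + tolerance)
end

width_range  = tolerance_range(20, 4) # default_mark_width, default tolerance
height_range = tolerance_range(20, 4) # default_mark_height, default tolerance

# Hypothetical candidate sizes as [width, height] in pixels.
candidates = [[19, 22], [24, 16], [25, 20], [12, 20]]
candidates.each do |width, height|
  accepted = width_range.include?(width) && height_range.include?(height)
  puts "#{width}x#{height}: #{accepted ? 'accepted' : 'ignored'}"
end
```

With the defaults, a mark is accepted when both dimensions fall between 16 and 24 pixels inclusive.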
281
+
261
282
  ### Intensity percentual
262
283
 
263
284
  ```ruby
@@ -285,13 +306,22 @@ config.default_marks_options = %w{A B C D E}
285
306
 
286
307
  ```ruby
287
308
  # Defines the distance (in pixel) between the middle of a mark and the middle of the next mark in the same group.
288
- # The scan will begin in the first mark, by the value in pixels it have from the right corner of the clock.
289
- # After it, each mark option in the group will be checked based in this distance.
309
+ # This option is used to try to infer the positions of marks that were not found.
290
310
  # The default value is 25
291
311
 
292
312
  config.default_distance_between_marks = 25
293
313
  ```
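
One way to picture how a known distance between marks can be used to guess marks that were not found is sketched below. This is not the gem's algorithm, which this diff does not show. `fill_missing_centers` is a hypothetical helper that works on the x centers of the marks found in a single line.

```ruby
# Hypothetical helper: given the x centers of the marks found in one line,
# fill gaps that are about a multiple of `distance` wide with guessed centers.
def fill_missing_centers(found_centers, distance)
  return [] if found_centers.empty?

  sorted = found_centers.sort
  filled = sorted.each_cons(2).flat_map do |left, right|
    # Number of marks that should sit strictly between the two neighbours.
    missing = ((right - left).to_f / distance).round - 1
    [left] + (1..missing).map { |i| left + i * distance }
  end
  filled + [sorted.last]
end

# The third mark (around x = 100) was not detected in this made-up line.
p fill_missing_centers([50, 75, 125, 150], 25)
```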
294
314
 
315
+ ### Adjust inconsistent bubbles
316
+
317
+ ```ruby
318
+ # If true, it will analyze each group to see whether there are more or fewer bubbles than expected,
319
+ # and will try to remove or add these inconsistent marks.
320
+ # The default value is true
321
+
322
+ config.adjust_inconsistent_bubbles = true
323
+ ```
324
+
295
325
 
296
326
  Group Configuration Options
297
327
  ---------------------------
@@ -299,6 +329,14 @@ Group Configuration Options
299
329
  The General Configuration Options apply to the entire document, but you can override some of them
300
330
  when defining a group:
301
331
 
332
+ ### Expected coordinates
333
+
334
+ ```ruby
335
+ # Defines the coordinates of the area where the group is expected to be.
336
+
337
+ group.expected_coordinates = {x1: 145, y1: 780, x2: 270, y2: 1290}
338
+ ```
339
+
302
340
  ### Mark sizes
303
341
 
304
342
  ```ruby
@@ -316,15 +354,6 @@ group.mark_height = RubyMarks.default_mark_height
316
354
  group.marks_options = RubyMarks.default_marks_options
317
355
  ```
318
356
 
319
- ### Distance in axis X from clock
320
-
321
- ```ruby
322
- # Defines the distance from the right corner of the clock mark to the middle of the first mark in the group
323
- # It don't have a default value, you MUST set this value for each group in your document
324
-
325
- group.x_distance_from_clock = 89
326
- ```
327
-
328
357
  ### Distance Between Marks
329
358
 
330
359
  ```ruby
@@ -333,26 +362,25 @@ group.x_distance_from_clock = 89
333
362
  group.distance_between_marks = RubyMarks.default_distance_between_marks
334
363
  ```
335
364
 
336
- ### Clocks range
365
+ ### Expected lines
337
366
 
338
367
  ```ruby
339
- # Defines the clock ranges this group belongs to. This range that will consider what clock mark
340
- # should be returned in the result of the scan.
368
+ # Overrides the default_expected_lines value for the group being configured.
341
369
 
342
- group.clocks_range = 1..5
370
+ group.expected_lines = @recognizer.config.default_expected_lines
343
371
  ```
344
372
 
345
373
 
346
374
  Watchers
347
375
  --------
348
376
 
349
- Sometimes, due some image flaws, the scan can't recognize some clock mark, or a mark, or even recognize
377
+ Sometimes, due to image flaws, the scan can't recognize a group or a mark, or even recognizes
350
378
  more than one mark in a clock row in the same group when it is not expected. Then, you can place some
351
- watchers, that will perform some custom code made by yourself in those cases. The available watchers are:
352
- In the watchers you can, for example, apply a deskew in image and re-run the scan. But, be advised, if you
353
- call the scan method again inside the watcher, you should make sure that you have a way to leave the watcher
354
- to avoid a endless loop. You always can check how many times the watcher got raised by checking in
355
- `recognizer.raised_watchers[:watcher_name]` hash.
379
+ watchers that run your own custom code in those cases, such as applying a deskew
380
+ to the image and re-running the scan.
381
+ Be advised, however: if you call the scan method again inside the watcher, you should make sure that you
382
+ have a way to leave the watcher to avoid an endless loop. You can always check how many times a watcher
383
+ was raised by checking the `recognizer.raised_watchers[:watcher_name]` hash.
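
The endless-loop warning above boils down to capping re-scans with a counter. The sketch below simulates that guard without the gem: a plain hash stands in for `recognizer.raised_watchers`, the scan is assumed to fail every time, and `rescans_before_giving_up` and `max_retries` are hypothetical names.

```ruby
# Simulates a watcher that is raised on every scan and re-scans until a
# retry limit is reached. Nothing here calls ruby_marks.
def rescans_before_giving_up(max_retries, watcher = :scan_multiple_marked_watcher)
  raised_watchers = Hash.new(0) # stand-in for recognizer.raised_watchers
  rescans = 0
  loop do
    raised_watchers[watcher] += 1 # the watcher was raised again
    break if raised_watchers[watcher] > max_retries # the exit condition

    rescans += 1 # a real watcher would deskew and call scan here
  end
  rescans
end

puts rescans_before_giving_up(3)
```

In a real watcher the same comparison against `recognizer.raised_watchers[:watcher_name]` would decide whether to call `scan` again or give up.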
356
384
 
357
385
 
358
386
  ### Scan Mark Watcher
@@ -390,14 +418,15 @@ recognizer.add_watcher :scan_multiple_marked_watcher do |recognizer, result|
390
418
  end
391
419
  ```
392
420
 
393
- ### Clock Mark Difference Watcher
421
+ ### Incorrect Group Watcher
394
422
 
395
423
  ```ruby
396
- # Will execute your custom code if didn't recognizes your expected clock marks count.
397
- # In order to raise this watcher you must define the `config.expected_clocks_count`.
398
- # It returns the recognizer object.
424
+ # Executes your custom code if a group isn't found, if it has a line count different from the expected one,
425
+ # or if in one or more lines the marks found differ from those specified in the marks options.
426
+ # It yields the recognizer object, a boolean for an incorrect expected line count, a boolean
427
+ # for an incorrect bubble line found, and a boolean for whether the bubbles were adjusted.
399
428
 
400
- recognizer.add_watcher :clock_mark_difference_watcher do |recognizer|
429
+ recognizer.add_watcher :clock_mark_difference_watcher do |recognizer, incorrect_expected_lines, incorrect_bubble_line_found, bubbles_adjusted|
401
430
  # place your custom code
402
431
  end
403
432
  ```
@@ -3,58 +3,37 @@ module RubyMarks
3
3
 
4
4
  class Config
5
5
 
6
- attr_accessor :clock_marks_scan_x, :expected_clocks_count, :intensity_percentual, :recognition_colors,
7
- :default_marks_options, :default_distance_between_marks,
8
- :clock_width, :clock_height, :threshold_level, :clock_mark_size_tolerance,
9
- :default_mark_width, :default_mark_height
6
+ attr_accessor :intensity_percentual, :edge_level, :default_marks_options, :threshold_level,
7
+ :default_mark_width, :default_mark_height,
8
+ :default_mark_width_tolerance, :default_mark_height_tolerance,
9
+ :default_distance_between_marks, :adjust_inconsistent_bubbles,
10
+ :default_expected_lines
11
+
10
12
 
11
13
  def initialize(recognizer)
12
14
  @recognizer = recognizer
13
15
  @threshold_level = RubyMarks.threshold_level
16
+ @edge_level = RubyMarks.edge_level
14
17
 
15
- @intensity_percentual = RubyMarks.intensity_percentual
16
- @recognition_colors = RubyMarks.recognition_colors
18
+ @adjust_inconsistent_bubbles = RubyMarks.adjust_inconsistent_bubbles
17
19
 
18
- @expected_clocks_count = RubyMarks.expected_clocks_count
19
- @clock_marks_scan_x = RubyMarks.clock_marks_scan_x
20
- @clock_width = RubyMarks.clock_width
21
- @clock_height = RubyMarks.clock_height
22
- @clock_mark_size_tolerance = RubyMarks.clock_mark_size_tolerance
20
+ @intensity_percentual = RubyMarks.intensity_percentual
23
21
 
24
22
  @default_mark_width = RubyMarks.default_mark_width
25
23
  @default_mark_height = RubyMarks.default_mark_height
24
+
25
+ @default_mark_width_tolerance = RubyMarks.default_mark_width_tolerance
26
+ @default_mark_height_tolerance = RubyMarks.default_mark_height_tolerance
27
+
26
28
  @default_marks_options = RubyMarks.default_marks_options
27
29
  @default_distance_between_marks = RubyMarks.default_distance_between_marks
30
+ @default_expected_lines = RubyMarks.default_expected_lines
28
31
  end
29
32
 
30
33
  def calculated_threshold_level
31
34
  Magick::QuantumRange * (@threshold_level.to_f / 100)
32
35
  end
33
36
 
34
- def clock_width_with_down_tolerance
35
- @clock_width - @clock_mark_size_tolerance
36
- end
37
-
38
- def clock_width_with_up_tolerance
39
- @clock_width + @clock_mark_size_tolerance
40
- end
41
-
42
- def clock_height_with_down_tolerance
43
- @clock_height - @clock_mark_size_tolerance
44
- end
45
-
46
- def clock_height_with_up_tolerance
47
- @clock_height + @clock_mark_size_tolerance
48
- end
49
-
50
- def clock_width_tolerance_range
51
- clock_width_with_down_tolerance..clock_width_with_up_tolerance
52
- end
53
-
54
- def clock_height_tolerance_range
55
- clock_height_with_down_tolerance..clock_height_with_up_tolerance
56
- end
57
-
58
37
  def define_group(group_label, &block)
59
38
  group = RubyMarks::Group.new(group_label, @recognizer, &block)
60
39
  @recognizer.add_group(group)
@@ -2,33 +2,63 @@
2
2
  module RubyMarks
3
3
 
4
4
  class Group
5
- attr_reader :label, :recognizer, :clocks_range
5
+ attr_reader :label, :recognizer
6
+ attr_accessor :mark_width, :mark_height, :marks_options, :coordinates, :expected_coordinates,
7
+ :mark_width_tolerance, :mark_height_tolerance, :marks, :distance_between_marks
6
8
 
7
- attr_accessor :mark_width, :mark_height, :marks_options, :x_distance_from_clock,
8
- :distance_between_marks
9
9
 
10
10
  def initialize(label, recognizer)
11
11
  @label = label
12
12
  @recognizer = recognizer
13
- @mark_width = @recognizer.config.default_mark_width
13
+
14
+ @mark_width = @recognizer.config.default_mark_width
14
15
  @mark_height = @recognizer.config.default_mark_height
16
+
17
+ @mark_width_tolerance = @recognizer.config.default_mark_width_tolerance
18
+ @mark_height_tolerance = @recognizer.config.default_mark_height_tolerance
19
+
15
20
  @marks_options = @recognizer.config.default_marks_options
16
21
  @distance_between_marks = @recognizer.config.default_distance_between_marks
17
- @x_distance_from_clock = 0
18
- @clocks_range = 0..0
22
+
23
+ @expected_lines = @recognizer.config.default_expected_lines
24
+ @expected_coordinates = {}
19
25
  yield self if block_given?
20
26
  end
21
27
 
22
- def clocks_range=(value)
23
- value = value..value if value.is_a?(Fixnum)
24
- @clocks_range = value if value.is_a?(Range)
28
+
29
+ def incorrect_expected_lines
30
+ @expected_lines != marks.count
31
+ end
32
+
33
+ def mark_width_with_down_tolerance
34
+ @mark_width - @mark_width_tolerance
35
+ end
36
+
37
+
38
+ def mark_width_with_up_tolerance
39
+ @mark_width + @mark_width_tolerance
40
+ end
41
+
42
+
43
+ def mark_height_with_down_tolerance
44
+ @mark_height - @mark_height_tolerance
45
+ end
46
+
47
+
48
+ def mark_height_with_up_tolerance
49
+ @mark_height + @mark_height_tolerance
50
+ end
51
+
52
+
53
+ def mark_width_tolerance_range
54
+ mark_width_with_down_tolerance..mark_width_with_up_tolerance
25
55
  end
26
56
 
27
- def belongs_to_clock?(clock)
28
- if @clocks_range.is_a?(Range)
29
- return @clocks_range.include? clock
30
- end
57
+
58
+ def mark_height_tolerance_range
59
+ mark_height_with_down_tolerance..mark_height_with_up_tolerance
31
60
  end
61
+
32
62
  end
33
63
 
34
64
  end