dreader 0.4.1 → 0.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 2216ea348440adb3c2cff5236ae8b3a0d916e4576002b400b4d3121da650dc8e
4
- data.tar.gz: 1d22f900bf6b578ddcbf28ab30cbf835cba6121dd6904f071690939ab7b84d5d
3
+ metadata.gz: 6d616b9bad780960c105a2c754e392ecf51c0cdaeeb6076c11b4042d7c40e414
4
+ data.tar.gz: 656507d726a22a24111fc239c0f640dedf90321aab2bcda74a73dc0cf730a83d
5
5
  SHA512:
6
- metadata.gz: 15a938341dfe5075bad0255fe4c623220c2f4cbd6a3e629410df1f6da58e2554bf7e5b15f17b641a57de45b694254771b9e58911d9c324582f786c8604db35c4
7
- data.tar.gz: f02e9653e465e0eedf1d0feac0aecfbb20d7ae5177b51956a917159d4405d786a049d9bb840b8520223c6aeb7239fc1f8b954ac80caa07a322fe5d06f2aeb487
6
+ metadata.gz: 9ee4f8c9367864ef01aea8d75240a3a93a7bdbcab2411b4e2d4f92df8dc4fa5840e812308a02b8c5da4afc791c358f5eff4849017f63a4d90011ebcdca24e217
7
+ data.tar.gz: 11571140e63afb0c52b33a010e701e754964150878cba7e395878c2cf2f59cc85eb6b493d5fd05f287d229ec807e8873a45aafe850be6fc30abcbbb2d1cc47fc
data/Changelog.org CHANGED
@@ -1,3 +1,14 @@
1
+ * Version 0.4.2
2
+ ** better error messages for process and check functions
3
+ dreader now captures exceptions raised by process and check and
4
+ prints an error message to stdout if an error is found.
5
+ the exception is then propagated in the standard way.
6
+ ** new method bulk_declare
7
+ bulk_declare allows you to easily declare columns which don't need a
8
+ specific treatment
9
+ ** read will now complain if the argument passed is not a hash
10
+ ** virtualcols is now accessible (attr_reader)
11
+ ** fixed a bug with slice
1
12
  * Version 0.4.1
2
13
  ** fixed an issue with ~read~: it always required a hash as input
3
14
  ** changed syntax of ~debug~, which now accepts a hash as argument
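The changelog entry on error messages describes a rescue, report, re-raise pattern. A minimal sketch of that pattern follows; `call_with_context` is a hypothetical helper written for illustration and is not part of dreader:

```ruby
# Hypothetical helper, NOT part of dreader: runs a user-supplied callable,
# prints where it failed, then re-raises the original exception.
def call_with_context(kind, colname, row_number, callable, cell)
  callable.call(cell)
rescue => e
  puts "error: '#{kind}' specification for :#{colname} raised an exception " \
       "at row #{row_number} (value: #{cell.inspect})"
  raise e
end

to_int = ->(cell) { Integer(cell) }

# A well-formed cell is converted as usual.
puts call_with_context("process", :age, 1, to_int, "30")

# A malformed cell prints the context line, and the caller still
# receives the original exception.
begin
  call_with_context("process", :age, 2, to_int, "n/a")
rescue ArgumentError => e
  puts "caller still sees: #{e.class}"
end
```

Because the exception is re-raised rather than swallowed, callers that already rescue errors keep working; the printed line only adds the column and row context.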
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- dreader (0.4.1)
4
+ dreader (0.4.2)
5
5
  roo
6
6
 
7
7
  GEM
data/README.md CHANGED
@@ -46,7 +46,7 @@ Or install it yourself as:
46
46
 
47
47
  ## Usage
48
48
 
49
- ### Declare the file you want to read
49
+ ### Declare what file you want to read
50
50
 
51
51
  Require `dreader` and declare an instance of the `Dreader::Engine` class:
52
52
 
@@ -137,6 +137,44 @@ end
137
137
  # we are done with our declarations)
138
138
  ```
139
139
 
140
+ If there are different columns that you want to read and process in
141
+ the same way, you can use the method `bulk_declare`, which accepts a
142
+ hash as input.
143
+
144
+ For instance:
145
+
146
+ ```ruby
147
+ i.bulk_declare({a: 'A', b: 'B'})
148
+ ```
149
+
150
+ is equivalent to:
151
+
152
+ ```ruby
153
+ i.column :a do
154
+ colref 'A'
155
+ end
156
+
157
+ i.column :b do
158
+ colref 'B'
159
+ end
160
+ ```
161
+
162
+ The method also accepts a code block, which lets you define a common
163
+ `process` function for all columns. In that case, **don't forget to put
164
+ the hash in parentheses, or the Ruby parser won't be able to
165
+ distinguish the hash from the code block.** For instance:
166
+
167
+ ```ruby
168
+ i.bulk_declare({a: 'A', b: 'B'}) do
169
+ process do |cell|
170
+ ...
171
+ end
172
+ end
173
+ ```
174
+
175
+ There is an example of `bulk_declare` in the examples directory:
176
+ ([us_cities_bulk_declare.rb](examples/wikipedia_us_cities/us_cities_bulk_declare.rb)).
177
+
140
178
  **Remarks:**
141
179
 
142
180
  1. the column name can be anything ruby can use as a Hash key. You
@@ -146,7 +184,7 @@ end
146
184
  2. `colref` can be a string (e.g., `'A'`) or an integer, in which case
147
185
  the first column is one
148
186
 
149
- 3. you need to declare only the columns you want to import. For
187
+ 3. **you need to declare only the columns you want to import.** For
150
188
  instance, we could skip the declaration for column 1, if 'Date of
151
189
  Birth' is the only data we want to import
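To illustrate the `bulk_declare` call shape described in this README hunk, here is a self-contained sketch. `MiniColumn` and `MiniEngine` are hypothetical stand-ins written for this note; they only mimic the behaviour the README describes and are not the `Dreader::Engine` implementation:

```ruby
# Hypothetical stand-ins, NOT the dreader classes.
class MiniColumn
  def initialize
    @spec = {}
  end

  def colref(ref)
    @spec[:colref] = ref
  end

  def process(&block)
    @spec[:process] = block
  end

  def to_hash
    @spec
  end
end

class MiniEngine
  attr_reader :colspec

  def initialize
    @colspec = []
  end

  # hash maps a column name to its column reference; the optional block
  # is evaluated once per column, so every column gets the same settings.
  def bulk_declare(hash, &block)
    hash.each do |name, ref|
      column = MiniColumn.new
      column.colref ref
      column.instance_eval(&block) if block
      @colspec << column.to_hash.merge(name: name)
    end
  end
end

i = MiniEngine.new
# The parentheses keep the hash literal from being parsed as a block.
i.bulk_declare({a: 'A', b: 'B'}) do
  process { |cell| cell.strip }
end

i.colspec.each do |spec|
  puts "#{spec[:name]} -> #{spec[:colref]}: #{spec[:process].call('  x  ').inspect}"
end
```

Without the parentheses, Ruby reads the opening `{` after the method name as the start of a block, which is why the README insists on them.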
152
190
 
@@ -215,9 +253,6 @@ into a `@table` instance variable.
215
253
  i.read
216
254
  ```
217
255
 
218
- **Read applies all the `column` and `virtual_column` declarations and
219
- builds a hash with the data read.**
220
-
221
256
  After reading the file we can use `errors` to see whether any of the
222
257
  `check` functions failed:
223
258
 
@@ -228,6 +263,13 @@ array_of_strings ech do |error_line|
228
263
  end
229
264
  ```
230
265
 
266
+ We can then use `virtual_columns` to process data and generate the
267
+ virtual columns:
268
+
269
+ ```ruby
270
+ i.virtual_columns
271
+ ```
272
+
231
273
  Finally we can use the `process` function to execute the `mapping`
232
274
  directive to each line read from the file.
233
275
 
@@ -290,13 +332,11 @@ i.table
290
332
  age: { value: 31, row_number: 2, col_number: 2, errors: nil } } ]
291
333
  ```
292
334
 
293
- ## Simplifying the data read
335
+ ## Simplifying the hash with the data read
294
336
 
295
337
  The `Dreader::Util` class provides some functions to simplify and
296
338
  restructure the hashes built by `dreader`.
297
339
 
298
- More in details:
299
-
300
340
  `Dreader::Util.simplify hash` simplifies the hash passed as input by
301
341
  removing all information but the value and making the value
302
342
  accessible directly from the name of the column.
@@ -309,14 +349,35 @@ Dreader::Util.simplify i.table[0]
309
349
  `Dreader::Util.slice hash, keys` and `Dreader::Util.clean hash,
310
350
  keys`, where `keys` is an array of keys, are respectively used to
311
351
  select or remove some keys from `hash`.
352
+
353
+ ```ruby
354
+ i.table[0]
355
+ { name: { value: "John", row_number: 1, col_number: 1, errors: nil },
356
+ age: { value: 30, row_number: 1, col_number: 2, errors: nil }}
357
+
358
+ Dreader::Util.slice i.table[0], [:name]
359
+ {name: { value: "John", row_number: 1, col_number: 1, errors: nil }}
360
+
361
+ Dreader::Util.clean i.table[0], [:name]
362
+ {age: { value: 30, row_number: 1, col_number: 2, errors: nil }}
363
+ ```
364
+
365
+ The methods `slice` and `clean` are more useful when used in
366
+ conjunction with `simplify`:
312
367
 
313
368
  ```ruby
314
- Dreader::Util.slice i.table[0], [:age]
369
+ hash = Dreader::Util.simplify i.table[0]
370
+ {name: "John", age: 30}
371
+
372
+ Dreader::Util.slice hash, [:age]
315
373
  {age: 30}
316
374
 
317
- Dreader::Util.clean i.table[0], [:age]
375
+ Dreader::Util.clean hash, [:age]
318
376
  {name: "John"}
319
377
  ```
378
+
379
+ Notice that the output produced by `slice` and `simplify` is a hash
380
+ which can be used to create an `ActiveRecord` object.
320
381
 
321
382
  Finally, the `Dreader::Util.restructure` method helps building hashes
322
383
  to create
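The simplify, slice and clean steps described in this README hunk can be sketched in plain Ruby. `MiniUtil` is a hypothetical stand-in written for this note; the real `Dreader::Util` methods may be implemented differently:

```ruby
# Hypothetical stand-in, NOT Dreader::Util.
module MiniUtil
  # keep only the :value of each cell
  def self.simplify(row)
    row.transform_values { |cell| cell[:value] }
  end

  # keep only the listed keys (keys is an array)
  def self.slice(hash, keys)
    hash.slice(*keys)
  end

  # drop the listed keys (keys is an array)
  def self.clean(hash, keys)
    hash.reject { |key, _| keys.include?(key) }
  end
end

row = {
  name: { value: "John", row_number: 1, col_number: 1, errors: nil },
  age:  { value: 30,     row_number: 1, col_number: 2, errors: nil }
}

flat = MiniUtil.simplify(row)    # name and age mapped straight to their values
p flat
p MiniUtil.slice(flat, [:age])   # only :age is kept
p MiniUtil.clean(flat, [:age])   # :age is removed, :name is kept
```

Simplifying first gives a flat hash of attribute names to values, which is the shape a constructor taking a hash of attributes expects.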
data/examples/wikipedia_us_cities/us_cities_bulk_declare.rb ADDED
@@ -0,0 +1,85 @@
1
+ require 'dreader'
2
+
3
+ # this is the class which will contain all the data we read from the file
4
+ class City
5
+ [:city, :state, :population, :lat, :lon].each do |var|
6
+ attr_accessor var
7
+ end
8
+
9
+ def initialize(hash)
10
+ hash.each do |k, v|
11
+ self.send("#{k}=", v)
12
+ end
13
+ end
14
+ end
15
+
16
+ importer = Dreader::Engine.new
17
+
18
+ # read from us_cities.tsv, lines from 2 to 10 (included)
19
+ importer.options do
20
+ filename "us_cities.tsv"
21
+ first_row 2
22
+ last_row 10
23
+ end
24
+
25
+ # these are the columns for which we only need to specify column and name
26
+ importer.bulk_declare ({city: 2, state: 3, latlon: 11}) do
27
+ process { |val| val.strip }
28
+ end
29
+
30
+ # the population column requires more work
31
+ importer.column :population do |col|
32
+ col.colref 4
33
+
34
+ # make "3,000" into 3000 (int)
35
+ col.process do |value|
36
+ value.gsub(",", "").to_i
37
+ end
38
+
39
+ col.check do |value|
40
+ value > 0
41
+ end
42
+
43
+ end
44
+
45
+ cities = []
46
+
47
+ importer.mapping do |row|
48
+ # remove all additional information stored in each cell
49
+ r = Dreader::Util.simplify row
50
+
51
+ # make latlon into the lat, lon fields
52
+ r[:lat], r[:lon] = r[:latlon].split(" ")
53
+
54
+ # now r contains something like
55
+ # {lat: ..., lon: ..., city: ..., state: ..., population: ..., latlon: ...}
56
+
57
+ # remove fields which are not understood by the City class and
58
+ # make a new instance
59
+ cleaned = Dreader::Util.clean r, [:latlon]
60
+
61
+ # you must declare an array cities before calling importer.process
62
+ cities << City.new(cleaned)
63
+ end
64
+
65
+ # print to stdout what we told dreader to read
66
+ # (useful only for ... debugging!)
67
+ importer.debug n: 10
68
+
69
+ # check some other features of debug:
70
+ # disable processing and checks (e.g., to analyze the raw data read)
71
+ importer.debug process: false, check: false
72
+
73
+ # load and process
74
+ importer.load
75
+ cities = []
76
+ importer.process
77
+
78
+ # output everything to see whether it works
79
+ puts "First ten cities in the US (source Wikipedia)"
80
+ cities.each do |city|
81
+ [:city, :state, :population, :lat, :lon].each do |var|
82
+ puts "#{var.to_s.capitalize}: #{city.send(var)}"
83
+ end
84
+ puts ""
85
+ end
data/lib/dreader/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module Dreader
2
- VERSION = "0.4.1"
2
+ VERSION = "0.4.2"
3
3
  end
data/lib/dreader.rb CHANGED
@@ -70,8 +70,9 @@ module Dreader
70
70
  end
71
71
 
72
72
  # an alias for Hash.slice
73
- def self.slice hash, *keys
74
- hash.slice keys
73
+ # keys is an array of keys
74
+ def self.slice hash, keys
75
+ hash.slice *keys
75
76
  end
76
77
 
77
78
  # remove all `keys` from `hash`
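The `slice` fix in this hunk comes down to how `Hash#slice` treats its arguments. A short sketch using only core Ruby (2.5 or later); the variable names are illustrative:

```ruby
data = { name: "John", age: 30 }
keys = [:age]

# Passing the array itself asks for one key equal to [:age]; nothing matches,
# so the result is an empty hash.
p data.slice(keys)

# Splatting passes each element as a separate key, so :age is found.
p data.slice(*keys)
```

In the old code `keys` was collected with a splat parameter and then handed to `Hash#slice` as a single array, which corresponds to the first call above.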
@@ -102,9 +103,11 @@ module Dreader
102
103
  attr_reader :options
103
104
  # the specification of the columns to process
104
105
  attr_reader :colspec
106
+ # the specification of the virtual columns
107
+ attr_reader :virtualcols
105
108
  # the data we read
106
109
  attr_reader :table
107
-
110
+
108
111
  def initialize
109
112
  @options = {}
110
113
  @colspec = []
@@ -133,6 +136,51 @@ module Dreader
133
136
  @colspec << column.to_hash.merge({name: name})
134
137
  end
135
138
 
139
+ # bulk declare columns we intend to read
140
+ #
141
+ # - hash is a hash in the form { symbolic_name: colref }
142
+ #
143
+ # i.bulk_declare({name: 'B', age: 'C'}) is equivalent to:
144
+ #
145
+ # i.column :name do
146
+ # colref 'B'
147
+ # end
148
+ # i.column :age do
149
+ # colref 'C'
150
+ # end
151
+ #
152
+ # i.bulk_declare({name: 'B', age: 'C'}) do
153
+ # process do |cell|
154
+ # cell.strip
155
+ # end
156
+ # end
157
+ #
158
+ # is equivalent to:
159
+ #
160
+ # i.column :name do
161
+ # colref 'B'
162
+ # process do |cell|
163
+ # cell.strip
164
+ # end
165
+ # end
166
+ # i.column :age do
167
+ # colref 'C'
168
+ # process do |cell|
169
+ # cell.strip
170
+ # end
171
+ # end
172
+ def bulk_declare hash, &block
173
+ hash.keys.each do |key|
174
+ column = Column.new
175
+ column.colref hash[key]
176
+ if block
177
+ column.instance_eval(&block)
178
+ end
179
+ @colspec << column.to_hash.merge({name: key})
180
+ end
181
+ end
182
+
183
+
136
184
  # virtual columns define derived attributes
137
185
  # the code specified in the virtual column is executed after reading
138
186
  # a row and before applying the mapping function
@@ -165,7 +213,12 @@ module Dreader
165
213
  # @return the data read from filename, in the form of an array of
166
214
  # hashes
167
215
  def read args = {}
168
- hash = @options.merge(args)
216
+ if args.class == Hash
217
+ hash = @options.merge(args)
218
+ else
219
+ puts "dreader error at #{__callee__}: this function takes a Hash as input"
220
+ exit
221
+ end
169
222
 
170
223
  spreadsheet = Dreader::Engine.open_spreadsheet (hash[:filename])
171
224
  sheet = spreadsheet.sheet(hash[:sheet] || 0)
@@ -187,13 +240,23 @@ module Dreader
187
240
  r[colname][:row_number] = row_number
188
241
  r[colname][:col_number] = colspec[:colref]
189
242
 
190
- r[colname][:value] = value = colspec[:process] ? colspec[:process].call(cell) : cell
243
+ begin
244
+ r[colname][:value] = value = colspec[:process] ? colspec[:process].call(cell) : cell
245
+ rescue => e
246
+ puts "dreader error at #{__callee__}: 'process' specification for :#{colname} raised an exception at row #{row_number} (col #{index + 1}, value: #{cell})"
247
+ raise e
248
+ end
191
249
 
192
- if colspec[:check] and not colspec[:check].call(value) then
193
- r[colname][:error] = true
194
- @errors << "Error: value \"#{cell}\" for #{colname} at row #{row_number} (col #{index + 1}) does not pass the check function"
195
- else
196
- r[colname][:error] = false
250
+ begin
251
+ if colspec[:check] and not colspec[:check].call(value) then
252
+ r[colname][:error] = true
253
+ @errors << "dreader error at #{__callee__}: value \"#{cell}\" for #{colname} at row #{row_number} (col #{index + 1}) does not pass the check function"
254
+ else
255
+ r[colname][:error] = false
256
+ end
257
+ rescue => e
258
+ puts "dreader error at #{__callee__}: 'check' specification for :#{colname} raised an exception at row #{row_number} (col #{index + 1}, value: #{cell})"
259
+ raise e
197
260
  end
198
261
  end
199
262
 
@@ -205,10 +268,34 @@ module Dreader
205
268
 
206
269
  alias_method :load, :read
207
270
 
271
+ # get (processed) row number
272
+ #
273
+ # - row_number is the row to get: index starts at 1.
274
+ #
275
+ # get_row(1) gets the first line read, that is, the row specified
276
+ # by `first_row` in `options` (or in read)
277
+ #
278
+ # You need to invoke read first
279
+ def get_row row_number
280
+ if row_number > @table.size
281
+ puts "dreader error at #{__callee__}: 'row_number' is out of range (did you invoke read first?)"
282
+ exit
283
+ elsif row_number <= 0
284
+ puts "dreader error at #{__callee__}: 'row_number' is zero or negative (first row is 1)."
285
+ else
286
+ @table[row_number - 1]
287
+ end
288
+ end
289
+
208
290
  # show to stdout the first `n` records we read from the file given the current
209
291
  # configuration
210
292
  def debug args = {}
211
- hash = @options.merge(args)
293
+ if args.class == Hash
294
+ hash = @options.merge(args)
295
+ else
296
+ puts "dreader error at #{__callee__}: this function takes a Hash as input"
297
+ exit
298
+ end
212
299
 
213
300
  # apply some defaults, if not defined in the options
214
301
  hash[:process] = true if not hash.has_key? :process # shall we apply the process function?
@@ -246,13 +333,23 @@ module Dreader
246
333
  checked_str = ""
247
334
 
248
335
  if hash[:process]
249
- processed = colspec[:process] ? colspec[:process].call(cell) : cell
250
- processed_str = "processed: '#{processed}' (#{processed.class})"
336
+ begin
337
+ processed = colspec[:process] ? colspec[:process].call(cell) : cell
338
+ processed_str = "processed: '#{processed}' (#{processed.class})"
339
+ rescue => e
340
+ puts "dreader error at #{__callee__}: 'process' specification for :#{colname} raised an exception at row #{row_number} (col #{index + 1}, value: #{cell})"
341
+ raise e
342
+ end
251
343
  end
252
344
  if hash[:check]
253
- processed = colspec[:process] ? colspec[:process].call(cell) : cell
254
- check = colspec[:check] ? colspec[:check].call(processed) : "no check specified"
255
- checked_str = "checked: '#{check}'"
345
+ begin
346
+ processed = colspec[:process] ? colspec[:process].call(cell) : cell
347
+ check = colspec[:check] ? colspec[:check].call(processed) : "no check specified"
348
+ checked_str = "checked: '#{check}'"
349
+ rescue => e
350
+ puts "dreader error at #{__callee__}: 'check' specification for #{colname} at row #{row_number} raised an exception (col #{index + 1}, value: #{cell})"
351
+ raise e
352
+ end
256
353
  end
257
354
 
258
355
  puts " #{colname} => orig: '#{cell}' (#{cell.class}) #{processed_str} #{checked_str} (column: '#{colspec[:colref]}')"
@@ -268,13 +365,18 @@ module Dreader
268
365
 
269
366
  def virtual_columns
270
367
  # execute the virtual column specification
271
- @virtualcols.each do |virtualcol|
272
- @table.each do |r|
273
- # add the cell to the table
274
- r[virtualcol[:name]] = {
275
- value: virtualcol[:process].call(r),
276
- virtual: true,
277
- }
368
+ @table.each do |r|
369
+ @virtualcols.each do |virtualcol|
370
+ begin
371
+ # add the cell to the table
372
+ r[virtualcol[:name]] = {
373
+ value: virtualcol[:process].call(r),
374
+ virtual: true,
375
+ }
376
+ rescue => e
377
+ puts "dreader error at #{__callee__}: 'process' specification for :#{virtualcol[:name]} raised an exception at row #{r[r.keys.first][:row_number]}"
378
+ raise e
379
+ end
278
380
  end
279
381
  end
280
382
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: dreader
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.1
4
+ version: 0.4.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Adolfo Villafiorita
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2018-04-11 00:00:00.000000000 Z
11
+ date: 2018-05-20 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -84,6 +84,7 @@ files:
84
84
  - examples/wikipedia_big_us_cities/cities_by_state.ods
85
85
  - examples/wikipedia_us_cities/us_cities.rb
86
86
  - examples/wikipedia_us_cities/us_cities.tsv
87
+ - examples/wikipedia_us_cities/us_cities_bulk_declare.rb
87
88
  - lib/dreader.rb
88
89
  - lib/dreader/version.rb
89
90
  homepage: http://github.com/avillafiorita/dreader