dreader 0.4.1 → 0.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 2216ea348440adb3c2cff5236ae8b3a0d916e4576002b400b4d3121da650dc8e
4
- data.tar.gz: 1d22f900bf6b578ddcbf28ab30cbf835cba6121dd6904f071690939ab7b84d5d
3
+ metadata.gz: 6d616b9bad780960c105a2c754e392ecf51c0cdaeeb6076c11b4042d7c40e414
4
+ data.tar.gz: 656507d726a22a24111fc239c0f640dedf90321aab2bcda74a73dc0cf730a83d
5
5
  SHA512:
6
- metadata.gz: 15a938341dfe5075bad0255fe4c623220c2f4cbd6a3e629410df1f6da58e2554bf7e5b15f17b641a57de45b694254771b9e58911d9c324582f786c8604db35c4
7
- data.tar.gz: f02e9653e465e0eedf1d0feac0aecfbb20d7ae5177b51956a917159d4405d786a049d9bb840b8520223c6aeb7239fc1f8b954ac80caa07a322fe5d06f2aeb487
6
+ metadata.gz: 9ee4f8c9367864ef01aea8d75240a3a93a7bdbcab2411b4e2d4f92df8dc4fa5840e812308a02b8c5da4afc791c358f5eff4849017f63a4d90011ebcdca24e217
7
+ data.tar.gz: 11571140e63afb0c52b33a010e701e754964150878cba7e395878c2cf2f59cc85eb6b493d5fd05f287d229ec807e8873a45aafe850be6fc30abcbbb2d1cc47fc
data/Changelog.org CHANGED
@@ -1,3 +1,14 @@
1
+ * Version 0.4.2
2
+ ** better error messages for process and check functions
3
+ dreader now captures exceptions raised by process and check and
4
+ prints an error message to stdout if an error is found.
5
+ the exception is then propagated in the standard way.
6
+ ** new method bulk_declare
7
+ bulk_declare allows you to easily declare columns which don't need a
8
+ specific treatment
9
+ ** read now complains if the argument passed is not a hash
10
+ ** virtualcols is now accessible (attr_reader)
11
+ ** fixed a bug with slice
1
12
  * Version 0.4.1
2
13
  ** fixed an issue with ~read~: it always required a hash as input
3
14
  ** changed syntax of ~debug~, which now accepts a hash as argument
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- dreader (0.4.1)
4
+ dreader (0.4.2)
5
5
  roo
6
6
 
7
7
  GEM
data/README.md CHANGED
@@ -46,7 +46,7 @@ Or install it yourself as:
46
46
 
47
47
  ## Usage
48
48
 
49
- ### Declare the file you want to read
49
+ ### Declare what file you want to read
50
50
 
51
51
  Require `dreader` and declare an instance of the `Dreader::Engine` class:
52
52
 
@@ -137,6 +137,44 @@ end
137
137
  # we are done with our declarations)
138
138
  ```
139
139
 
140
+ If there are different columns that you want to read and process in
141
+ the same way, you can use the method `bulk_declare`, which accepts a
142
+ hash as input.
143
+
144
+ For instance:
145
+
146
+ ```ruby
147
+ i.bulk_declare({a: 'A', b: 'B'})
148
+ ```
149
+
150
+ is equivalent to:
151
+
152
+ ```ruby
153
+ i.column :a do
154
+ colref 'A'
155
+ end
156
+
157
+ i.column :b do
158
+ colref 'B'
159
+ end
160
+ ```
161
+
162
+ The method also accepts a code block, which lets you define a common
163
+ `process` function for all columns. In that case, **don't forget to put
164
+ the hash in parentheses, or the Ruby parser won't be able to
165
+ distinguish the hash from the code block.** For instance:
166
+
167
+ ```ruby
168
+ i.bulk_declare({a: 'A', b: 'B'}) do
169
+ process do |cell|
170
+ ...
171
+ end
172
+ end
173
+ ```
174
+
175
+ There is an example of `bulk_declare` in the examples directory:
176
+ ([us_cities_bulk_declare.rb](examples/wikipedia_us_cities/us_cities_bulk_declare.rb)).
177
+
140
178
  **Remarks:**
141
179
 
142
180
  1. the column name can be anything ruby can use as a Hash key. You
@@ -146,7 +184,7 @@ end
146
184
  2. `colref` can be a string (e.g., `'A'`) or an integer, in which case
147
185
  the first column is one
148
186
 
149
- 3. you need to declare only the columns you want to import. For
187
+ 3. **you need to declare only the columns you want to import.** For
150
188
  instance, we could skip the declaration for column 1, if 'Date of
151
189
  Birth' is the only data we want to import
152
190
 
@@ -215,9 +253,6 @@ into a `@table` instance variable.
215
253
  i.read
216
254
  ```
217
255
 
218
- **Read applies all the `column` and `virtual_column` declarations and
219
- builds a hash with the data read.**
220
-
221
256
  After reading the file we can use `errors` to see whether any of the
222
257
  `check` functions failed:
223
258
 
@@ -228,6 +263,13 @@ array_of_strings ech do |error_line|
228
263
  end
229
264
  ```
230
265
 
266
+ We can then use `virtual_columns` to process data and generate the
267
+ virtual columns:
268
+
269
+ ```ruby
270
+ i.virtual_columns
271
+ ```
272
+
231
273
  Finally we can use the `process` function to execute the `mapping`
232
274
  directive to each line read from the file.
233
275
 
@@ -290,13 +332,11 @@ i.table
290
332
  age: { value: 31, row_number: 2, col_number: 2, errors: nil } } ]
291
333
  ```
292
334
 
293
- ## Simplifying the data read
335
+ ## Simplifying the hash with the data read
294
336
 
295
337
  The `Dreader::Util` class provides some functions to simplify and
296
338
  restructure the hashes built by `dreader`.
297
339
 
298
- More in details:
299
-
300
340
  `Dreader::Util.simplify hash` simplifies the hash passed as input by
301
341
  removing all information but the value and making the value
302
342
  accessible directly from the name of the column.
@@ -309,14 +349,35 @@ Dreader::Util.simplify i.table[0]
309
349
  `Dreader::Util.slice hash, keys` and `Dreader::Util.clean hash,
310
350
  keys`, where `keys` is an array of keys, are respectively used to
311
351
  select or remove some keys from `hash`.
352
+
353
+ ```ruby
354
+ i.table[0]
355
+ { name: { value: "John", row_number: 1, col_number: 1, errors: nil },
356
+ age: { value: 30, row_number: 1, col_number: 2, errors: nil }}
357
+
358
+ Dreader::Util.slice i.table[0], :name
359
+ {name: { value: "John", row_number: 1, col_number: 1, errors: nil }}
360
+
361
+ Dreader::Util.clean i.table[0], :name
362
+ {age: { value: 30, row_number: 1, col_number: 2, errors: nil }}
363
+ ```
364
+
365
+ The methods `slice` and `clean` are more useful when used in
366
+ conjunction with `simplify`:
312
367
 
313
368
  ```ruby
314
- Dreader::Util.slice i.table[0], [:age]
369
+ hash = Dreader::Util.simplify i.table[0]
370
+ {name: "John", age: 30}
371
+
372
+ Dreader::Util.slice hash, [:age]
315
373
  {age: 30}
316
374
 
317
- Dreader::Util.clean i.table[0], [:age]
375
+ Dreader::Util.clean hash, [:age]
318
376
  {name: "John"}
319
377
  ```
378
+
379
+ Notice that the output produced by `slice` and `simplify` is a hash
380
+ which can be used to create an `ActiveRecord` object.
320
381
 
321
382
  Finally, the `Dreader::Util.restructure` method helps building hashes
322
383
  to create
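The parenthesization caveat for `bulk_declare` in the README hunk above comes from Ruby's grammar rather than from dreader: a `{` that directly follows a method name is parsed as the start of a block. The sketch below illustrates this with a hypothetical stand-in method, `declare`, which only mimics the shape of `bulk_declare` (a hash plus an optional block) so that it runs without the gem.

```ruby
# Hypothetical stand-in for a method taking a hash and an optional block.
# It only mimics the shape of bulk_declare; it is not dreader's code.
def declare(hash, &block)
  { columns: hash.keys, has_block: !block.nil? }
end

# Parenthesized hash literal: unambiguous.
with_parens = declare({a: 'A', b: 'B'})

# Braceless hash arguments are also unambiguous, and a do...end block can follow.
braceless = declare a: 'A', b: 'B' do
  # a shared block would go here
end

# A bare `{` right after the method name is read as a block,
# so the hash-literal form does not even parse.
parse_error = begin
  eval("declare {a: 'A', b: 'B'}")
  false
rescue SyntaxError
  true
end
```

Under these assumptions, both the parenthesized and the braceless call forms pass the hash as the first argument; only the form with bare braces is rejected by the parser.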
data/examples/wikipedia_us_cities/us_cities_bulk_declare.rb ADDED
@@ -0,0 +1,85 @@
1
+ require 'dreader'
2
+
3
+ # this is the class which will contain all the data we read from the file
4
+ class City
5
+ [:city, :state, :population, :lat, :lon].each do |var|
6
+ attr_accessor var
7
+ end
8
+
9
+ def initialize(hash)
10
+ hash.each do |k, v|
11
+ self.send("#{k}=", v)
12
+ end
13
+ end
14
+ end
15
+
16
+ importer = Dreader::Engine.new
17
+
18
+ # read from us_cities.tsv, lines from 2 to 10 (included)
19
+ importer.options do
20
+ filename "us_cities.tsv"
21
+ first_row 2
22
+ last_row 10
23
+ end
24
+
25
+ # these are the columns for which we only need to specify column and name
26
+ importer.bulk_declare ({city: 2, state: 3, latlon: 11}) do
27
+ process { |val| val.strip }
28
+ end
29
+
30
+ # the population column requires more work
31
+ importer.column :population do |col|
32
+ col.colref 4
33
+
34
+ # make "3,000" into 3000 (int)
35
+ col.process do |value|
36
+ value.gsub(",", "").to_i
37
+ end
38
+
39
+ col.check do |value|
40
+ value > 0
41
+ end
42
+
43
+ end
44
+
45
+ cities = []
46
+
47
+ importer.mapping do |row|
48
+ # remove all additional information stored in each cell
49
+ r = Dreader::Util.simplify row
50
+
51
+ # make latlon into the lat, lon fields
52
+ r[:lat], r[:lon] = r[:latlon].split(" ")
53
+
54
+ # now r contains something like
55
+ # {lat: ..., lon: ..., city: ..., state: ..., population: ..., latlon: ...}
56
+
57
+ # remove fields which are not understood by the City class and
58
+ # make a new instance
59
+ cleaned = Dreader::Util.clean r, [:latlon]
60
+
61
+ # you must declare an array cities before calling importer.process
62
+ cities << City.new(cleaned)
63
+ end
64
+
65
+ # print to stdout what we told dreader to read
66
+ # (useful only for ... debugging!)
67
+ importer.debug n: 10
68
+
69
+ # check some other features of debug:
70
+ # disable processing and debug (e.g., to analyze the raw data read)
71
+ importer.debug process: false, check: false
72
+
73
+ # load and process
74
+ importer.load
75
+ cities = []
76
+ importer.process
77
+
78
+ # output everything to see whether it works
79
+ puts "First ten cities in the US (source Wikipedia)"
80
+ cities.each do |city|
81
+ [:city, :state, :population, :lat, :lon].each do |var|
82
+ puts "#{var.to_s.capitalize}: #{city.send(var)}"
83
+ end
84
+ puts ""
85
+ end
data/lib/dreader/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module Dreader
2
- VERSION = "0.4.1"
2
+ VERSION = "0.4.2"
3
3
  end
data/lib/dreader.rb CHANGED
@@ -70,8 +70,9 @@ module Dreader
70
70
  end
71
71
 
72
72
  # an alias for Hash.slice
73
- def self.slice hash, *keys
74
- hash.slice keys
73
+ # keys is an array of keys
74
+ def self.slice hash, keys
75
+ hash.slice *keys
75
76
  end
76
77
 
77
78
  # remove all `keys` from `hash`
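The `slice` change in the hunk above hinges on the splat operator. A minimal sketch of the difference, using plain `Hash#slice` (available in Ruby 2.5 and later) on an illustrative hash rather than dreader's own data:

```ruby
# Illustrative data, not taken from dreader.
row = { name: "John", age: 30 }
keys = [:age]

# Passing the array itself looks up one key, the array [:age],
# which is not present in the hash.
unsplatted = row.slice(keys)

# Splatting passes each element of the array as its own key.
splatted = row.slice(*keys)
```

Here `unsplatted` is an empty hash, while `splatted` contains only the `:age` entry, which is the behaviour the README describes for `Dreader::Util.slice`.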
@@ -102,9 +103,11 @@ module Dreader
102
103
  attr_reader :options
103
104
  # the specification of the columns to process
104
105
  attr_reader :colspec
106
+ # the specification of the virtual columns
107
+ attr_reader :virtualcols
105
108
  # the data we read
106
109
  attr_reader :table
107
-
110
+
108
111
  def initialize
109
112
  @options = {}
110
113
  @colspec = []
@@ -133,6 +136,51 @@ module Dreader
133
136
  @colspec << column.to_hash.merge({name: name})
134
137
  end
135
138
 
139
+ # bulk declare columns we intend to read
140
+ #
141
+ # - hash is a hash in the form { symbolic_name: colref }
142
+ #
143
+ # i.bulk_declare({name: 'B', age: 'C'}) is equivalent to:
144
+ #
145
+ # i.column :name do
146
+ # colref 'B'
147
+ # end
148
+ # i.column :age do
149
+ # colref 'C'
150
+ # end
151
+ #
152
+ # i.bulk_declare({name: 'B', age: 'C'}) do
153
+ # process do |cell|
154
+ # cell.strip
155
+ # end
156
+ # end
157
+ #
158
+ # is equivalent to:
159
+ #
160
+ # i.column :name do
161
+ # colref 'B'
162
+ # process do |cell|
163
+ # cell.strip
164
+ # end
165
+ # end
166
+ # i.column :age do
167
+ # colref 'C'
168
+ # process do |cell|
169
+ # cell.strip
170
+ # end
171
+ # end
172
+ def bulk_declare hash, &block
173
+ hash.keys.each do |key|
174
+ column = Column.new
175
+ column.colref hash[key]
176
+ if block
177
+ column.instance_eval(&block)
178
+ end
179
+ @colspec << column.to_hash.merge({name: key})
180
+ end
181
+ end
182
+
183
+
136
184
  # virtual columns define derived attributes
137
185
  # the code specified in the virtual column is executed after reading
138
186
  # a row and before applying the mapping function
@@ -165,7 +213,12 @@ module Dreader
165
213
  # @return the data read from filename, in the form of an array of
166
214
  # hashes
167
215
  def read args = {}
168
- hash = @options.merge(args)
216
+ if args.class == Hash
217
+ hash = @options.merge(args)
218
+ else
219
+ puts "dreader error at #{__callee__}: this function takes a Hash as input"
220
+ exit
221
+ end
169
222
 
170
223
  spreadsheet = Dreader::Engine.open_spreadsheet (hash[:filename])
171
224
  sheet = spreadsheet.sheet(hash[:sheet] || 0)
@@ -187,13 +240,23 @@ module Dreader
187
240
  r[colname][:row_number] = row_number
188
241
  r[colname][:col_number] = colspec[:colref]
189
242
 
190
- r[colname][:value] = value = colspec[:process] ? colspec[:process].call(cell) : cell
243
+ begin
244
+ r[colname][:value] = value = colspec[:process] ? colspec[:process].call(cell) : cell
245
+ rescue => e
246
+ puts "dreader error at #{__callee__}: 'process' specification for :#{colname} raised an exception at row #{row_number} (col #{index + 1}, value: #{cell})"
247
+ raise e
248
+ end
191
249
 
192
- if colspec[:check] and not colspec[:check].call(value) then
193
- r[colname][:error] = true
194
- @errors << "Error: value \"#{cell}\" for #{colname} at row #{row_number} (col #{index + 1}) does not pass the check function"
195
- else
196
- r[colname][:error] = false
250
+ begin
251
+ if colspec[:check] and not colspec[:check].call(value) then
252
+ r[colname][:error] = true
253
+ @errors << "dreader error at #{__callee__}: value \"#{cell}\" for #{colname} at row #{row_number} (col #{index + 1}) does not pass the check function"
254
+ else
255
+ r[colname][:error] = false
256
+ end
257
+ rescue => e
258
+ puts "dreader error at #{__callee__}: 'check' specification for :#{colname} raised an exception at row #{row_number} (col #{index + 1}, value: #{cell})"
259
+ raise e
197
260
  end
198
261
  end
199
262
 
@@ -205,10 +268,34 @@ module Dreader
205
268
 
206
269
  alias_method :load, :read
207
270
 
271
+ # get (processed) row number
272
+ #
273
+ # - row_number is the row to get: index starts at 1.
274
+ #
275
+ # get_row(1) gets the first line read, that is, the row specified
276
+ # by `first_row` in `options` (or in read)
277
+ #
278
+ # You need to invoke read first
279
+ def get_row row_number
280
+ if row_number > @table.size
281
+ puts "dreader error at #{__callee__}: 'row_number' is out of range (did you invoke read first?)"
282
+ exit
283
+ elsif row_number <= 0
284
+ puts "dreader error at #{__callee__}: 'row_number' is zero or negative (first row is 1)."
285
+ else
286
+ @table[row_number - 1]
287
+ end
288
+ end
289
+
208
290
  # show to stdout the first `n` records we read from the file given the current
209
291
  # configuration
210
292
  def debug args = {}
211
- hash = @options.merge(args)
293
+ if args.class == Hash
294
+ hash = @options.merge(args)
295
+ else
296
+ puts "dreader error at #{__callee__}: this function takes a Hash as input"
297
+ exit
298
+ end
212
299
 
213
300
  # apply some defaults, if not defined in the options
214
301
  hash[:process] = true if not hash.has_key? :process # shall we apply the process function?
@@ -246,13 +333,23 @@ module Dreader
246
333
  checked_str = ""
247
334
 
248
335
  if hash[:process]
249
- processed = colspec[:process] ? colspec[:process].call(cell) : cell
250
- processed_str = "processed: '#{processed}' (#{processed.class})"
336
+ begin
337
+ processed = colspec[:process] ? colspec[:process].call(cell) : cell
338
+ processed_str = "processed: '#{processed}' (#{processed.class})"
339
+ rescue => e
340
+ puts "dreader error at #{__callee__}: 'process' specification for :#{colname} raised an exception at row #{row_number} (col #{index + 1}, value: #{cell})"
341
+ raise e
342
+ end
251
343
  end
252
344
  if hash[:check]
253
- processed = colspec[:process] ? colspec[:process].call(cell) : cell
254
- check = colspec[:check] ? colspec[:check].call(processed) : "no check specified"
255
- checked_str = "checked: '#{check}'"
345
+ begin
346
+ processed = colspec[:process] ? colspec[:process].call(cell) : cell
347
+ check = colspec[:check] ? colspec[:check].call(processed) : "no check specified"
348
+ checked_str = "checked: '#{check}'"
349
+ rescue => e
350
+ puts "dreader error at #{__callee__}: 'check' specification for #{colname} at row #{row_number} raised an exception (col #{index + 1}, value: #{cell})"
351
+ raise e
352
+ end
256
353
  end
257
354
 
258
355
  puts " #{colname} => orig: '#{cell}' (#{cell.class}) #{processed_str} #{checked_str} (column: '#{colspec[:colref]}')"
@@ -268,13 +365,18 @@ module Dreader
268
365
 
269
366
  def virtual_columns
270
367
  # execute the virtual column specification
271
- @virtualcols.each do |virtualcol|
272
- @table.each do |r|
273
- # add the cell to the table
274
- r[virtualcol[:name]] = {
275
- value: virtualcol[:process].call(r),
276
- virtual: true,
277
- }
368
+ @table.each do |r|
369
+ @virtualcols.each do |virtualcol|
370
+ begin
371
+ # add the cell to the table
372
+ r[virtualcol[:name]] = {
373
+ value: virtualcol[:process].call(r),
374
+ virtual: true,
375
+ }
376
+ rescue => e
377
+ puts "dreader error at #{__callee__}: 'process' specification for :#{virtualcol[:name]} raised an exception at row #{r[r.keys.first][:row_number]}"
378
+ raise e
379
+ end
278
380
  end
279
381
  end
280
382
  end
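The error handling added throughout this file follows one shape: run the user-supplied block, report where it failed, then let the exception propagate. The sketch below reproduces that shape in isolation. The helper name `apply_process` and the message wording are assumptions made for illustration, and it uses a bare `raise` where the diff uses `raise e`.

```ruby
# Hypothetical helper mirroring the rescue / report / re-raise shape of the diff.
def apply_process(process, cell, colname:, row_number:)
  process ? process.call(cell) : cell
rescue => e
  # `rescue => e` catches StandardError and its subclasses only.
  puts "error: 'process' for :#{colname} raised #{e.class} at row #{row_number} (value: #{cell})"
  raise # re-raises the exception currently being handled
end

# Turn "3,000" into 3000; Integer() raises ArgumentError on non-numeric input.
to_int = ->(cell) { Integer(cell.delete(",")) }

ok = apply_process(to_int, "3,000", colname: :population, row_number: 2)

propagated = begin
  apply_process(to_int, "n/a", colname: :population, row_number: 3)
  nil
rescue ArgumentError => e
  e
end
```

The caller still receives the original exception, so existing `rescue` clauses keep working; the only added effect is the diagnostic line on stdout.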
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: dreader
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.1
4
+ version: 0.4.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Adolfo Villafiorita
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2018-04-11 00:00:00.000000000 Z
11
+ date: 2018-05-20 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -84,6 +84,7 @@ files:
84
84
  - examples/wikipedia_big_us_cities/cities_by_state.ods
85
85
  - examples/wikipedia_us_cities/us_cities.rb
86
86
  - examples/wikipedia_us_cities/us_cities.tsv
87
+ - examples/wikipedia_us_cities/us_cities_bulk_declare.rb
87
88
  - lib/dreader.rb
88
89
  - lib/dreader/version.rb
89
90
  homepage: http://github.com/avillafiorita/dreader