dreader 0.3.1 → 0.4.1
- checksums.yaml +4 -4
- data/Changelog.org +9 -0
- data/Gemfile.lock +1 -1
- data/README.md +45 -10
- data/examples/age/age.rb +1 -1
- data/examples/wikipedia_us_cities/us_cities.rb +9 -1
- data/lib/dreader/version.rb +1 -1
- data/lib/dreader.rb +51 -17
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2216ea348440adb3c2cff5236ae8b3a0d916e4576002b400b4d3121da650dc8e
+  data.tar.gz: 1d22f900bf6b578ddcbf28ab30cbf835cba6121dd6904f071690939ab7b84d5d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 15a938341dfe5075bad0255fe4c623220c2f4cbd6a3e629410df1f6da58e2554bf7e5b15f17b641a57de45b694254771b9e58911d9c324582f786c8604db35c4
+  data.tar.gz: f02e9653e465e0eedf1d0feac0aecfbb20d7ae5177b51956a917159d4405d786a049d9bb840b8520223c6aeb7239fc1f8b954ac80caa07a322fe5d06f2aeb487
data/Changelog.org
ADDED
@@ -0,0 +1,9 @@
+* Version 0.4.1
+** fixed an issue with ~read~: it always required a hash as input
+** changed syntax of ~debug~, which now accepts a hash as argument
+This makes its syntax similar to ~read~.
+** improved output of ~debug~
+By default ~debug~ now prints the output of ~process~ and ~check~.
+You can disable this feature by passing ~process: false~ and/or ~check:
+false~ to the ~debug~. Notice that ~check~ implies ~process~.
+
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -72,8 +72,9 @@ end
 where:

 * (optional) `filename` is the file to read. If not specified, you
-will have to supply a filename when loading the file
-determines the file type. **Use `tsv` for
+will have to supply a filename when loading the file (see `read`,
+below). The extension determines the file type. **Use `tsv` for
+tab-separated files.**
 * (optional) `first_row` is the first line to read (use `2` if your
 file has a header)
 * (optional) `last_row` is the last line to read. If not specified, we
@@ -88,7 +89,7 @@ column reference:

 ```ruby
 # we will access column A in Ruby code using :name
-i.column :name
+i.column :name do
   colref 'A'
 end
 ```
@@ -138,14 +139,18 @@ end

 **Remarks:**

-1.
+1. the column name can be anything ruby can use as a Hash key. You
+can use symbols, strings, and even object instances, if you wish to
+do so.
+
+2. `colref` can be a string (e.g., `'A'`) or an integer, in which case
 the first column is one

-
+3. you need to declare only the columns you want to import. For
 instance, we could skip the declaration for column 1, if 'Date of
 Birth' is the only data we want to import

-
+4. If `process` and `check` are specified, then `check` will receive
 the result of invoking `process` on the cell value. This makes
 sense if process is used to make the cell value more accessible to
 ruby code (e.g., transforming a string into an integer).
@@ -198,6 +203,7 @@ Notice that the data read from each row of our input data is stored in
 a hash. The hash uses column names as the primary key and stores
 the values in the `:value` key.

+
 ### Start working with the data

 We are now all set and we can start working with the data.
@@ -209,8 +215,8 @@ into a `@table` instance variable.
 i.read
 ```

-Read applies all the `column` and `virtual_column` declarations and
-
+**Read applies all the `column` and `virtual_column` declarations and
+builds a hash with the data read.**

 After reading the file we can use `errors` to see whether any of the
 `check` functions failed:
@@ -232,6 +238,17 @@ i.process
 Look in the examples directory for further details and a couple of
 working examples.

+**Remark.** You can override some of the defaults by passing a hash as
+argument to read. For instance:
+
+```ruby
+i.read filename: another_filepath
+```
+
+will read data from `another_filepath`, rather than from the filename
+specified in the options. This might be useful, for instance, if the
+same specification has to be used for different files.
+

 ## Digging deeper

@@ -324,13 +341,30 @@ shows them to standard output:

 ```ruby
 i.debug
-i.debug 40 # read 40 lines (from first_row, if the option is declared)
-i.debug 40, filename # like above, but read from
+i.debug n: 40 # read 40 lines (from first_row, if the option is declared)
+i.debug n: 40, filename: filepath # like above, but read from filepath
 ```

 Another possibility is getting the value of the `@table` variable,
 which contains all the data read.

+By default `debug` invokes the `process` and `check` directives. Pass
+the following options, if you want to disable this behavior; this
+might be useful, for instance, if you intend to check only what data
+is read:
+
+```ruby
+i.debug process: false, check: false
+```
+
+Notice that `check` implies `process`, since `check` is invoked on the
+output of the `process` directive.
+
+
+## Changelog
+
+See [[Changelog]].
+

 ## Known Limitations

@@ -343,6 +377,7 @@ At the moment:
 correctly parsed.
 - some testing wouldn't hurt.

+
 ## Known Bugs

 No known bugs and an unknown number of unknown bugs.
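The README remark on overriding defaults through `read` can be illustrated without the gem or a spreadsheet. The sketch below is only an approximation of the option-merging step visible in `data/lib/dreader.rb`; the class `OptionsSketch`, its method `effective_options`, and the file names are hypothetical and not part of dreader.

```ruby
# Hypothetical stand-in for a dreader importer: it keeps only the
# option-merging step, so it runs without the gem or any input file.
class OptionsSketch
  def initialize(options)
    @options = options
  end

  # Per-call arguments win over the options given at declaration time.
  def effective_options(args = {})
    hash = @options.merge(args)
    hash[:first_row] ||= 1 # same fallback `read` applies in 0.4.1
    hash
  end
end

spec = OptionsSketch.new(filename: "january.tsv", first_row: 2)

# No arguments: the declared options are used as they are.
p spec.effective_options

# Same specification, different file: only :filename changes.
p spec.effective_options(filename: "february.tsv")
```

Because the merge returns a new hash, the options declared up front are left untouched, which is what makes it safe to reuse one specification across several files.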
data/examples/age/age.rb
CHANGED
data/examples/wikipedia_us_cities/us_cities.rb
CHANGED
@@ -43,6 +43,10 @@ importer.column :population do |col|
     value.gsub(",", "").to_i
   end

+  col.check do |value|
+    value > 0
+  end
+
 end

 cities = []
@@ -67,7 +71,11 @@ end

 # print to stdout what we told dreader to read
 # (useful only for ... debugging!)
-importer.debug 10
+importer.debug n: 10
+
+# check some other features of debug:
+# disable processing and checking (e.g., to analyze the raw data read)
+importer.debug process: false, check: false

 # load and process
 importer.load
data/lib/dreader/version.rb
CHANGED
data/lib/dreader.rb
CHANGED
@@ -156,19 +156,27 @@ module Dreader
     end

     # read a file and store it internally
-    #
-
-
-
+    #
+    # @param hash, a hash, possibly overriding any of the parameters
+    # set in the initial options. This allows you, for
+    # instance, to apply the same column specification to
+    # different files and different sheets
+    #
+    # @return the data read from filename, in the form of an array of
+    # hashes
+    def read args = {}
+      hash = @options.merge(args)
+
+      spreadsheet = Dreader::Engine.open_spreadsheet (hash[:filename])
+      sheet = spreadsheet.sheet(hash[:sheet] || 0)

       @table = Array.new
       @errors = Array.new

-      first_row =
-      last_row =
+      first_row = hash[:first_row] || 1
+      last_row = hash[:last_row] || sheet.last_row

       (first_row..last_row).each do |row_number|
-
         r = Hash.new
         @colspec.each_with_index do |colspec, index|
           cell = sheet.cell(row_number, colspec[:colref])
@@ -199,29 +207,55 @@ module Dreader

     # show to stdout the first `n` records we read from the file given the current
     # configuration
-    def debug
-
-
+    def debug args = {}
+      hash = @options.merge(args)
+
+      # apply some defaults, if not defined in the options
+      hash[:process] = true if not hash.has_key? :process # shall we apply the process function?
+      hash[:check] = true if not hash.has_key? :check # shall we check the data read?
+      hash[:n] = 10 if not hash[:n]
+
+      spreadsheet = Dreader::Engine.open_spreadsheet (hash[:filename])
+      sheet = spreadsheet.sheet(hash[:sheet] || 0)

       puts "Current configuration:"
       @options.each do |k, v|
         puts " #{k}: #{v}"
       end
-      if filename and @options[:filename] and filename != @options[:filename]
-        puts "Warning: you asked me to load a file different from the one specified in the options."
-      end

-
+      puts "Configuration used by debug:"
+      hash.each do |k, v|
+        puts " #{k}: #{v}"
+      end
+
+      n = hash[:n]
+      first_row = hash[:first_row] || 1
       last_row = first_row + n - 1
-
+
+      puts " Last row (according to roo): #{sheet.last_row}"
+      puts " Number of rows I will read in this session: #{n} (from #{first_row} to #{last_row})"
+
       (first_row..last_row).each do |row_number|
         puts "Row #{row_number} is:"
         r = Hash.new
         @colspec.each_with_index do |colspec, index|
-          cell = sheet.cell(row_number, colspec[:colref])
           colname = colspec[:name]
+          cell = sheet.cell(row_number, colspec[:colref])
+
+          processed_str = ""
+          checked_str = ""
+
+          if hash[:process]
+            processed = colspec[:process] ? colspec[:process].call(cell) : cell
+            processed_str = "processed: '#{processed}' (#{processed.class})"
+          end
+          if hash[:check]
+            processed = colspec[:process] ? colspec[:process].call(cell) : cell
+            check = colspec[:check] ? colspec[:check].call(processed) : "no check specified"
+            checked_str = "checked: '#{check}'"
+          end

-          puts " #{colname} => '#{cell}' (column: '#{colspec[:colref]}')"
+          puts " #{colname} => orig: '#{cell}' (#{cell.class}) #{processed_str} #{checked_str} (column: '#{colspec[:colref]}')"
         end
       end
     end
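The per-cell logic added to `debug` can be tried in isolation. What follows is a minimal sketch under stated assumptions, not the gem's code: `POPULATION` is a hypothetical stand-in for one `@colspec` entry (its `process` and `check` lambdas are borrowed from the `us_cities.rb` example), and `debug_line` is a hypothetical helper. It differs from the diff in that it computes the processed value once and joins only the non-empty parts.

```ruby
# Hypothetical stand-in for one entry of dreader's @colspec array.
POPULATION = {
  name: :population,
  colref: "B",
  process: ->(cell) { cell.gsub(",", "").to_i },
  check: ->(value) { value > 0 }
}.freeze

# Builds the text printed for one cell. `check` implies `process`:
# the check lambda always receives the processed value, even when the
# processed value itself is not shown.
def debug_line(colspec, cell, process: true, check: true)
  processed = cell
  processed = colspec[:process].call(cell) if (process || check) && colspec[:process]

  parts = ["#{colspec[:name]} => orig: '#{cell}' (#{cell.class})"]
  parts << "processed: '#{processed}' (#{processed.class})" if process
  if check
    result = colspec[:check] ? colspec[:check].call(processed) : "no check specified"
    parts << "checked: '#{result}'"
  end
  parts << "(column: '#{colspec[:colref]}')"
  parts.join(" ")
end

puts debug_line(POPULATION, "8,175,133")
puts debug_line(POPULATION, "8,175,133", process: false)
puts debug_line(POPULATION, "8,175,133", process: false, check: false)
```

The second call shows why `check` implies `process`: the check runs on the integer produced by `process`, although the processed value is left out of the printed line.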
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: dreader
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.4.1
 platform: ruby
 authors:
 - Adolfo Villafiorita
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2018-
+date: 2018-04-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -69,6 +69,7 @@ extensions: []
 extra_rdoc_files: []
 files:
 - ".gitignore"
+- Changelog.org
 - Gemfile
 - Gemfile.lock
 - LICENSE.txt