dreader 0.3.1 → 0.4.1
- checksums.yaml +4 -4
- data/Changelog.org +9 -0
- data/Gemfile.lock +1 -1
- data/README.md +45 -10
- data/examples/age/age.rb +1 -1
- data/examples/wikipedia_us_cities/us_cities.rb +9 -1
- data/lib/dreader/version.rb +1 -1
- data/lib/dreader.rb +51 -17
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 2216ea348440adb3c2cff5236ae8b3a0d916e4576002b400b4d3121da650dc8e
|
4
|
+
data.tar.gz: 1d22f900bf6b578ddcbf28ab30cbf835cba6121dd6904f071690939ab7b84d5d
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 15a938341dfe5075bad0255fe4c623220c2f4cbd6a3e629410df1f6da58e2554bf7e5b15f17b641a57de45b694254771b9e58911d9c324582f786c8604db35c4
|
7
|
+
data.tar.gz: f02e9653e465e0eedf1d0feac0aecfbb20d7ae5177b51956a917159d4405d786a049d9bb840b8520223c6aeb7239fc1f8b954ac80caa07a322fe5d06f2aeb487
|
data/Changelog.org
ADDED
@@ -0,0 +1,9 @@
|
|
1
|
+
* Version 0.4.1
|
2
|
+
** fixed an issue with ~read~: it always required a hash as input
|
3
|
+
** changed syntax of ~debug~, which now accepts a hash as argument
|
4
|
+
This makes its syntax similar to ~read~.
|
5
|
+
** improved output of ~debug~
|
6
|
+
By default ~debug~ now prints the output of ~process~ and ~check~.
|
7
|
+
You can disable this feature by passing ~process: false~ and/or ~check:
|
8
|
+
false~ to ~debug~. Notice that ~check~ implies ~process~.
|
9
|
+
|
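The changelog entries above describe how `debug` now treats its hash argument. The sketch below illustrates that defaulting logic in isolation. `debug_defaults` is a hypothetical helper written for this note, not part of dreader, and the default of 10 rows is taken from the `lib/dreader.rb` hunk further down in this diff.

```ruby
# Hypothetical helper (not part of dreader): fills in the defaults
# that `debug` applies to its hash argument.
def debug_defaults(args = {})
  hash = args.dup
  # key? rather than ||=, so that an explicit `false` is preserved
  hash[:process] = true unless hash.key?(:process)
  hash[:check] = true unless hash.key?(:check)
  # number of rows to show when the caller does not pass :n
  hash[:n] ||= 10
  hash
end

p debug_defaults()
p debug_defaults(n: 40, check: false)
```

Testing for the presence of the key matters for the two boolean flags: with `||=`, a caller passing `process: false` would silently get `true` back.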
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -72,8 +72,9 @@ end
|
|
72
72
|
where:
|
73
73
|
|
74
74
|
* (optional) `filename` is the file to read. If not specified, you
|
75
|
-
will have to supply a filename when loading the file
|
76
|
-
determines the file type. **Use `tsv` for
|
75
|
+
will have to supply a filename when loading the file (see `read`,
|
76
|
+
below). The extension determines the file type. **Use `tsv` for
|
77
|
+
tab-separated files.**
|
77
78
|
* (optional) `first_row` is the first line to read (use `2` if your
|
78
79
|
file has a header)
|
79
80
|
* (optional) `last_row` is the last line to read. If not specified, we
|
@@ -88,7 +89,7 @@ column reference:
|
|
88
89
|
|
89
90
|
```ruby
|
90
91
|
# we will access column A in Ruby code using :name
|
91
|
-
i.column :name
|
92
|
+
i.column :name do
|
92
93
|
colref 'A'
|
93
94
|
end
|
94
95
|
```
|
@@ -138,14 +139,18 @@ end
|
|
138
139
|
|
139
140
|
**Remarks:**
|
140
141
|
|
141
|
-
1.
|
142
|
+
1. the column name can be anything ruby can use as a Hash key. You
|
143
|
+
can use symbols, strings, and even object instances, if you wish to
|
144
|
+
do so.
|
145
|
+
|
146
|
+
2. `colref` can be a string (e.g., `'A'`) or an integer, in which case
|
142
147
|
the first column is one
|
143
148
|
|
144
|
-
|
149
|
+
3. you need to declare only the columns you want to import. For
|
145
150
|
instance, we could skip the declaration for column 1, if 'Date of
|
146
151
|
Birth' is the only data we want to import
|
147
152
|
|
148
|
-
|
153
|
+
4. If `process` and `check` are specified, then `check` will receive
|
149
154
|
the result of invoking `process` on the cell value. This makes
|
150
155
|
sense if process is used to make the cell value more accessible to
|
151
156
|
ruby code (e.g., transforming a string into an integer).
|
@@ -198,6 +203,7 @@ Notice that the data read from each row of our input data is stored in
|
|
198
203
|
a hash. The hash uses column names as the primary key and stores
|
199
204
|
the values in the `:value` key.
|
200
205
|
|
206
|
+
|
201
207
|
### Start working with the data
|
202
208
|
|
203
209
|
We are now all set and we can start working with the data.
|
@@ -209,8 +215,8 @@ into a `@table` instance variable.
|
|
209
215
|
i.read
|
210
216
|
```
|
211
217
|
|
212
|
-
Read applies all the `column` and `virtual_column` declarations and
|
213
|
-
|
218
|
+
**Read applies all the `column` and `virtual_column` declarations and
|
219
|
+
builds a hash with the data read.**
|
214
220
|
|
215
221
|
After reading the file we can use `errors` to see whether any of the
|
216
222
|
`check` functions failed:
|
@@ -232,6 +238,17 @@ i.process
|
|
232
238
|
Look in the examples directory for further details and a couple of
|
233
239
|
working examples.
|
234
240
|
|
241
|
+
**Remark.** You can override some of the defaults by passing a hash as
|
242
|
+
argument to read. For instance:
|
243
|
+
|
244
|
+
```ruby
|
245
|
+
i.read filename: another_filepath
|
246
|
+
```
|
247
|
+
|
248
|
+
will read data from `another_filepath`, rather than from the filename
|
249
|
+
specified in the options. This might be useful, for instance, if the
|
250
|
+
same specification has to be used for different files.
|
251
|
+
|
235
252
|
|
236
253
|
## Digging deeper
|
237
254
|
|
@@ -324,13 +341,30 @@ shows them to standard output:
|
|
324
341
|
|
325
342
|
```ruby
|
326
343
|
i.debug
|
327
|
-
i.debug 40 # read 40 lines (from first_row, if the option is declared)
|
328
|
-
i.debug 40, filename # like above, but read from
|
344
|
+
i.debug n: 40 # read 40 lines (from first_row, if the option is declared)
|
345
|
+
i.debug n: 40, filename: filepath # like above, but read from filepath
|
329
346
|
```
|
330
347
|
|
331
348
|
Another possibility is getting the value of the `@table` variable,
|
332
349
|
which contains all the data read.
|
333
350
|
|
351
|
+
By default `debug` invokes the `process` and `check` directives. Pass
|
352
|
+
the following options, if you want to disable this behavior; this
|
353
|
+
might be useful, for instance, if you intend to check only what data
|
354
|
+
is read:
|
355
|
+
|
356
|
+
```ruby
|
357
|
+
i.debug process: false, check: false
|
358
|
+
```
|
359
|
+
|
360
|
+
Notice that `check` implies `process`, since `check` is invoked on the
|
361
|
+
output of the `process` directive.
|
362
|
+
|
363
|
+
|
364
|
+
## Changelog
|
365
|
+
|
366
|
+
See [[Changelog]].
|
367
|
+
|
334
368
|
|
335
369
|
## Known Limitations
|
336
370
|
|
@@ -343,6 +377,7 @@ At the moment:
|
|
343
377
|
correctly parsed.
|
344
378
|
- some testing wouldn't hurt.
|
345
379
|
|
380
|
+
|
346
381
|
## Known Bugs
|
347
382
|
|
348
383
|
No known bugs and an unknown number of unknown bugs.
|
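Remark 4 in the README hunk above says that `check` receives the result of `process`. A minimal sketch of that ordering follows, assuming a column specification shaped like the one the `debug` method in this diff consults (`:process` and `:check` entries holding callables). `process_and_check` and the returned hash layout are illustrative only, not dreader's API. The two lambdas mirror the `:population` column in the us_cities example.

```ruby
# Hypothetical column specification, mirroring the :process and :check
# entries that the diff's `debug` method looks up in each colspec.
colspec = {
  name: :population,
  process: ->(value) { value.gsub(",", "").to_i },
  check: ->(value) { value > 0 }
}

# Apply process (if any), then run check on the processed value.
# check is nil when the column declares no check.
def process_and_check(colspec, cell)
  processed = colspec[:process] ? colspec[:process].call(cell) : cell
  checked = colspec[:check] ? colspec[:check].call(processed) : nil
  { value: processed, check: checked }
end

result = process_and_check(colspec, "8,175,133")
puts result[:value]  # 8175133
puts result[:check]  # true
```

Because `check` sees the integer rather than the raw string, the comparison `value > 0` works without any further conversion.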
data/examples/age/age.rb
CHANGED
data/examples/wikipedia_us_cities/us_cities.rb
CHANGED
@@ -43,6 +43,10 @@ importer.column :population do |col|
|
|
43
43
|
value.gsub(",", "").to_i
|
44
44
|
end
|
45
45
|
|
46
|
+
col.check do |value|
|
47
|
+
value > 0
|
48
|
+
end
|
49
|
+
|
46
50
|
end
|
47
51
|
|
48
52
|
cities = []
|
@@ -67,7 +71,11 @@ end
|
|
67
71
|
|
68
72
|
# print to stdout what we told dreader to read
|
69
73
|
# (useful only for ... debugging!)
|
70
|
-
importer.debug 10
|
74
|
+
importer.debug n: 10
|
75
|
+
|
76
|
+
# check some other features of debug:
|
77
|
+
# disable processing and checking (e.g., to analyze the raw data read)
|
78
|
+
importer.debug process: false, check: false
|
71
79
|
|
72
80
|
# load and process
|
73
81
|
importer.load
|
data/lib/dreader/version.rb
CHANGED
data/lib/dreader.rb
CHANGED
@@ -156,19 +156,27 @@ module Dreader
|
|
156
156
|
end
|
157
157
|
|
158
158
|
# read a file and store it internally
|
159
|
-
#
|
160
|
-
|
161
|
-
|
162
|
-
|
159
|
+
#
|
160
|
+
# @param hash, a hash, possibly overriding any of the parameters
|
161
|
+
# set in the initial options. This allows you, for
|
162
|
+
# instance, to apply the same column specification to
|
163
|
+
# different files and different sheets
|
164
|
+
#
|
165
|
+
# @return the data read from filename, in the form of an array of
|
166
|
+
# hashes
|
167
|
+
def read args = {}
|
168
|
+
hash = @options.merge(args)
|
169
|
+
|
170
|
+
spreadsheet = Dreader::Engine.open_spreadsheet (hash[:filename])
|
171
|
+
sheet = spreadsheet.sheet(hash[:sheet] || 0)
|
163
172
|
|
164
173
|
@table = Array.new
|
165
174
|
@errors = Array.new
|
166
175
|
|
167
|
-
first_row =
|
168
|
-
last_row =
|
176
|
+
first_row = hash[:first_row] || 1
|
177
|
+
last_row = hash[:last_row] || sheet.last_row
|
169
178
|
|
170
179
|
(first_row..last_row).each do |row_number|
|
171
|
-
|
172
180
|
r = Hash.new
|
173
181
|
@colspec.each_with_index do |colspec, index|
|
174
182
|
cell = sheet.cell(row_number, colspec[:colref])
|
@@ -199,29 +207,55 @@ module Dreader
|
|
199
207
|
|
200
208
|
# show to stdout the first `n` records we read from the file given the current
|
201
209
|
# configuration
|
202
|
-
def debug
|
203
|
-
|
204
|
-
|
210
|
+
def debug args = {}
|
211
|
+
hash = @options.merge(args)
|
212
|
+
|
213
|
+
# apply some defaults, if not defined in the options
|
214
|
+
hash[:process] = true if not hash.has_key? :process # shall we apply the process function?
|
215
|
+
hash[:check] = true if not hash.has_key? :check # shall we check the data read?
|
216
|
+
hash[:n] = 10 if not hash[:n]
|
217
|
+
|
218
|
+
spreadsheet = Dreader::Engine.open_spreadsheet (hash[:filename])
|
219
|
+
sheet = spreadsheet.sheet(hash[:sheet] || 0)
|
205
220
|
|
206
221
|
puts "Current configuration:"
|
207
222
|
@options.each do |k, v|
|
208
223
|
puts " #{k}: #{v}"
|
209
224
|
end
|
210
|
-
if filename and @options[:filename] and filename != @options[:filename]
|
211
|
-
puts "Warning: you asked me to load a file different from the one specified in the otptions."
|
212
|
-
end
|
213
225
|
|
214
|
-
|
226
|
+
puts "Configuration used by debug:"
|
227
|
+
hash.each do |k, v|
|
228
|
+
puts " #{k}: #{v}"
|
229
|
+
end
|
230
|
+
|
231
|
+
n = hash[:n]
|
232
|
+
first_row = hash[:first_row] || 1
|
215
233
|
last_row = first_row + n - 1
|
216
|
-
|
234
|
+
|
235
|
+
puts " Last row (according to roo): #{sheet.last_row}"
|
236
|
+
puts " Number of rows I will read in this session: #{n} (from #{first_row} to #{last_row})"
|
237
|
+
|
217
238
|
(first_row..last_row).each do |row_number|
|
218
239
|
puts "Row #{row_number} is:"
|
219
240
|
r = Hash.new
|
220
241
|
@colspec.each_with_index do |colspec, index|
|
221
|
-
cell = sheet.cell(row_number, colspec[:colref])
|
222
242
|
colname = colspec[:name]
|
243
|
+
cell = sheet.cell(row_number, colspec[:colref])
|
244
|
+
|
245
|
+
processed_str = ""
|
246
|
+
checked_str = ""
|
247
|
+
|
248
|
+
if hash[:process]
|
249
|
+
processed = colspec[:process] ? colspec[:process].call(cell) : cell
|
250
|
+
processed_str = "processed: '#{processed}' (#{processed.class})"
|
251
|
+
end
|
252
|
+
if hash[:check]
|
253
|
+
processed = colspec[:process] ? colspec[:process].call(cell) : cell
|
254
|
+
check = colspec[:check] ? colspec[:check].call(processed) : "no check specified"
|
255
|
+
checked_str = "checked: '#{check}'"
|
256
|
+
end
|
223
257
|
|
224
|
-
puts " #{colname} => '#{cell}' (column: '#{colspec[:colref]}')"
|
258
|
+
puts " #{colname} => orig: '#{cell}' (#{cell.class}) #{processed_str} #{checked_str} (column: '#{colspec[:colref]}')"
|
225
259
|
end
|
226
260
|
end
|
227
261
|
end
|
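Both `read` and `debug` in the hunks above start from `@options.merge(args)` and then derive the rows to visit. The sketch below reproduces those two steps outside the gem. `effective_options` and `debug_rows` are hypothetical names introduced here, and the file names are placeholders.

```ruby
# Hypothetical stand-in for the importer's @options
options = { filename: "cities.tsv", first_row: 2 }

# Hash#merge returns a new hash: per-call arguments win, and the
# stored options are left untouched for the next call.
def effective_options(options, args = {})
  options.merge(args)
end

# Rows that `debug` would visit: n rows starting at first_row
# (first_row defaults to 1 and n to 10, as in the diff).
def debug_rows(hash)
  first_row = hash[:first_row] || 1
  n = hash[:n] || 10
  (first_row..(first_row + n - 1))
end

hash = effective_options(options, filename: "other.tsv", n: 3)
puts hash[:filename]                # other.tsv
puts options[:filename]             # cities.tsv
puts debug_rows(hash).to_a.inspect  # [2, 3, 4]
```

Since the merge does not mutate the stored options, the same column specification can be applied to several files in turn, which is the use case the README remark describes.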
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: dreader
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.4.1
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Adolfo Villafiorita
|
8
8
|
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date: 2018-
|
11
|
+
date: 2018-04-11 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bundler
|
@@ -69,6 +69,7 @@ extensions: []
|
|
69
69
|
extra_rdoc_files: []
|
70
70
|
files:
|
71
71
|
- ".gitignore"
|
72
|
+
- Changelog.org
|
72
73
|
- Gemfile
|
73
74
|
- Gemfile.lock
|
74
75
|
- LICENSE.txt
|