dreader 0.4.2 → 1.0.0
- checksums.yaml +4 -4
- data/CHANGELOG.ORG +45 -0
- data/Gemfile.lock +21 -8
- data/README.org +794 -0
- data/dreader.gemspec +6 -4
- data/examples/age/age.rb +22 -6
- data/examples/age_with_multiple_checks/Birthdays.ods +0 -0
- data/examples/age_with_multiple_checks/age_with_multiple_checks.rb +62 -0
- data/examples/template/template_generation.rb +37 -0
- data/examples/wikipedia_big_us_cities/big_us_cities.rb +20 -18
- data/examples/wikipedia_us_cities/us_cities.rb +28 -27
- data/examples/wikipedia_us_cities/us_cities_bulk_declare.rb +22 -22
- data/lib/dreader/column.rb +39 -0
- data/lib/dreader/engine.rb +473 -0
- data/lib/dreader/options.rb +16 -0
- data/lib/dreader/util.rb +71 -0
- data/lib/dreader/version.rb +1 -1
- data/lib/dreader.rb +5 -411
- metadata +59 -25
- data/Changelog.org +0 -20
- data/README.md +0 -469
data/README.org
ADDED
@@ -0,0 +1,794 @@
|
|
1
|
+
#+TITLE: Dreader
|
2
|
+
#+AUTHOR: Adolfo Villafiorita
|
3
|
+
#+STARTUP: showall
|
4
|
+
|
5
|
+
Dreader is a simple DSL built on top of [[https://github.com/roo-rb/roo][Roo]] to read and process
|
6
|
+
tabular data (CSV, LibreOffice, Excel) in a simple and structured way.
|
7
|
+
|
8
|
+
Main advantages:
|
9
|
+
|
10
|
+
1. All code to parse input data has the same structure, simplifying
|
11
|
+
code management and understanding (convention over configuration).
|
12
|
+
2. It favors a declarative approach, clearly identifying from where
|
13
|
+
data has to be read and in which way.
|
14
|
+
3. It has facilities to run simulations, to debug and check code and
|
15
|
+
data.
|
16
|
+
|
17
|
+
We use Dreader for importing fairly big files (in the order of
|
18
|
+
10K-100K records) in MIP, an ERP to manage distribution of bins to the
|
19
|
+
population. The main issues we had before using Dreader were errors
|
20
|
+
and exceptional cases in the input data. We also had to manage
|
21
|
+
several small variations in the input files (coming from different
|
22
|
+
ERPs) and Dreader helped us standardize the input code.
|
23
|
+
|
24
|
+
The gem depends on =roo=, from which it leverages all data
|
25
|
+
reading/parsing facilities, keeping its size to about 250 lines of
|
26
|
+
code.
|
27
|
+
|
28
|
+
It should be relatively easy to use; /dreader/ stands for /d/ata /r/eader.
|
29
|
+
|
30
|
+
* Installation
|
31
|
+
|
32
|
+
Add this line to your application's Gemfile:
|
33
|
+
|
34
|
+
#+BEGIN_EXAMPLE ruby
|
35
|
+
gem 'dreader'
|
36
|
+
#+END_EXAMPLE
|
37
|
+
|
38
|
+
And then execute:
|
39
|
+
|
40
|
+
#+BEGIN_EXAMPLE
|
41
|
+
$ bundle
|
42
|
+
#+END_EXAMPLE
|
43
|
+
|
44
|
+
Or install it yourself as:
|
45
|
+
|
46
|
+
#+BEGIN_EXAMPLE
|
47
|
+
$ gem install dreader
|
48
|
+
#+END_EXAMPLE
|
49
|
+
|
50
|
+
|
51
|
+
* Usage
|
52
|
+
|
53
|
+
** Quick start
|
54
|
+
|
55
|
+
Print name and age of people from the following data:
|
56
|
+
|
57
|
+
| Name | Date of birth |
|
58
|
+
|------------------+-----------------|
|
59
|
+
| Forest Whitaker | July 15, 1961 |
|
60
|
+
| Daniel Day-Lewis | April 29, 1957 |
|
61
|
+
| Sean Penn | August 17, 1960 |
|
62
|
+
|
63
|
+
#+BEGIN_EXAMPLE ruby
|
64
|
+
require 'dreader'
|
65
|
+
|
66
|
+
class Reader < Dreader::Engine
|
67
|
+
options do
|
68
|
+
# we start reading from row 2
|
69
|
+
first_row 2
|
70
|
+
end
|
71
|
+
|
72
|
+
column :name do
|
73
|
+
doc "column A contains :name, a string; doc is optional"
|
74
|
+
colref 'A'
|
75
|
+
end
|
76
|
+
|
77
|
+
# column B contains :birthdate, a date. We can use a Hash and omit
|
78
|
+
# colref
|
79
|
+
column({ birthdate: 'B' }) do
|
80
|
+
process do |c|
|
81
|
+
Date.parse(c)
|
82
|
+
end
|
83
|
+
end
|
84
|
+
|
85
|
+
# add as many example lines as you want to show examples of good
|
86
|
+
# records; these example lines are added to the template generated with
|
87
|
+
# generate_template
|
88
|
+
example({ name: "John", birthdate: "27/03/2020" })
|
89
|
+
|
90
|
+
# for each line, :age is computed from :birthdate
|
91
|
+
virtual_column :age do
|
92
|
+
process do |row|
|
93
|
+
birthdate = row[:birthdate][:value]
|
94
|
+
birthday = Date.new(Date.today.year, birthdate.month, birthdate.day)
|
95
|
+
today = Date.today
|
96
|
+
[0, today.year - birthdate.year - (birthday > today ? 1 : 0)].max
|
97
|
+
end
|
98
|
+
end
|
99
|
+
|
100
|
+
# this is how we process each line of the input file
|
101
|
+
mapping do |row|
|
102
|
+
r = Dreader::Util.simplify(row)
|
103
|
+
puts "#{r[:name]} is #{r[:age]} years old (born on #{r[:birthdate]})"
|
104
|
+
end
|
105
|
+
end
|
106
|
+
|
107
|
+
reader = Reader.new
|
108
|
+
|
109
|
+
# read the file
|
110
|
+
reader.read filename: "Birthdays.ods"
|
111
|
+
# compute the virtual columns
|
112
|
+
reader.virtual_columns
|
113
|
+
# run the mapping declaration
|
114
|
+
reader.process
|
115
|
+
|
116
|
+
#
|
117
|
+
# Here we can do further processing on the data
|
118
|
+
#
|
119
|
+
File.open("ages.txt", "w") do |file|
|
120
|
+
reader.table.each do |row|
|
121
|
+
unless row[:row_errors].any?
|
122
|
+
file.puts "#{row[:name][:value]} #{row[:age][:value]}"
|
123
|
+
end
|
124
|
+
end
|
125
|
+
end
|
126
|
+
#+END_EXAMPLE
|
127
|
+
|
128
|
+
** Gentler Introduction
|
129
|
+
|
130
|
+
To write an import function with Dreader:
|
131
|
+
|
132
|
+
- Declare which is the input file and where we can find data (Sheet
|
133
|
+
and first row)
|
134
|
+
- Declare the content of columns and how to check raw data, parse data,
|
135
|
+
and check parsed data
|
136
|
+
- Add virtual columns, that is, columns computed from other values
|
137
|
+
in the row
|
138
|
+
- Specify how to process each line of data. This is where you do the actual work
|
139
|
+
(for instance, if you process a file line by line) or put together data for
|
140
|
+
processing after the file has been fully read --- see the next step.
|
141
|
+
|
142
|
+
Dreader has now collected and shaped the data according to your instructions
|
143
|
+
and collected errors in the process. We are now ready to do the actual
|
144
|
+
processing:
|
145
|
+
|
146
|
+
- Do the processing
|
147
|
+
|
148
|
+
Each step is described in more detail in the following sections.
|
149
|
+
|
150
|
+
*** Declare which is the input file and where we can find data
|
151
|
+
|
152
|
+
Require =dreader= and declare a class which inherits from =Dreader::Engine=:
|
153
|
+
|
154
|
+
|
155
|
+
#+BEGIN_EXAMPLE ruby
|
156
|
+
require 'dreader'
|
157
|
+
|
158
|
+
class Reader < Dreader::Engine
|
159
|
+
[...]
|
160
|
+
end
|
161
|
+
#+END_EXAMPLE
|
162
|
+
|
163
|
+
In the class, specify the parsing options using the following syntax:
|
164
|
+
|
165
|
+
#+BEGIN_EXAMPLE ruby
|
166
|
+
options do
|
167
|
+
filename 'example.ods'
|
168
|
+
|
169
|
+
sheet 'Sheet 1'
|
170
|
+
|
171
|
+
first_row 1
|
172
|
+
last_row 20
|
173
|
+
|
174
|
+
# optional (this allows integration with other applications already
|
175
|
+
# using a logger)
|
176
|
+
logger Logger.new($stdout)
|
177
|
+
logger_level Logger::INFO
|
178
|
+
end
|
179
|
+
#+END_EXAMPLE
|
180
|
+
|
181
|
+
where:
|
182
|
+
|
183
|
+
- (optional) =filename= is the file to read. If not specified, you will
|
184
|
+
have to supply a filename when loading the file (see =read=, below).
|
185
|
+
The extension determines the file type. *Use =.tsv= for tab-separated
|
186
|
+
files.*
|
187
|
+
- (optional) =first_row= is the first line to read (use =2= if your file
|
188
|
+
has a header)
|
189
|
+
- (optional) =last_row= is the last line to read. If not specified, we
|
190
|
+
will rely on =roo= to determine the last row. This is useful for
|
191
|
+
those files in which you only want to process some of the content or
|
192
|
+
which contain "garbage" after the records.
|
193
|
+
- (optional) =sheet= is the sheet name or number to read from. If not
|
194
|
+
specified, the first (default) sheet is used
|
195
|
+
|
196
|
+
#+BEGIN_NOTES
|
197
|
+
You can override some of the defaults by passing a hash as argument to
|
198
|
+
the =read= function. For instance:
|
199
|
+
|
200
|
+
#+BEGIN_EXAMPLE ruby
|
201
|
+
i.read filename: another_filepath
|
202
|
+
#+END_EXAMPLE
|
203
|
+
|
204
|
+
will read data from =another_filepath=, rather than from the filename
|
205
|
+
specified in the options. This might be useful, for instance, if the
|
206
|
+
same specification has to be used for different files.
|
207
|
+
#+END_NOTES
|
208
|
+
|
209
|
+
|
210
|
+
*** Declare the content of columns and how to parse them
|
211
|
+
|
212
|
+
Declare the columns you want to read by assigning them a name and a column
|
213
|
+
reference.
|
214
|
+
|
215
|
+
There are two notations:
|
216
|
+
|
217
|
+
#+BEGIN_EXAMPLE ruby
|
218
|
+
# First notation, colref is put in the block
|
219
|
+
i.column :name do
|
220
|
+
colref 'A'
|
221
|
+
end
|
222
|
+
|
223
|
+
# Second notation, a hash is passed in the name
|
224
|
+
i.column({ name: 'A' }) do
|
225
|
+
end
|
226
|
+
#+END_EXAMPLE
|
227
|
+
|
228
|
+
The reference to a column can be either a letter or a number. The first column
|
229
|
+
is ='A'= or =1=.
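
To illustrate how letter and number references line up, here is a minimal sketch that converts a spreadsheet-style letter into a 1-based column number. The helper =colref_to_index= is a hypothetical name introduced for this example; it is not part of Dreader's API, and Dreader may resolve references differently internally.

```ruby
# Hypothetical helper (not part of Dreader): map a column reference
# to a 1-based column number. Integers are returned unchanged.
def colref_to_index(colref)
  return colref if colref.is_a?(Integer)

  # Treat the letters as digits of a base-26 number in which 'A' is 1.
  colref.upcase.each_char.reduce(0) do |acc, ch|
    acc * 26 + (ch.ord - 'A'.ord + 1)
  end
end

puts colref_to_index('A')
puts colref_to_index('C')
puts colref_to_index('AA')
```

With this convention, ='C'= and =3= point at the same column, which matches the equivalence used in the examples of this README.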
|
230
|
+
|
231
|
+
The =column= declaration can contain Ruby blocks:
|
232
|
+
|
233
|
+
- one or more =check_raw= blocks check raw data as read from the input
|
234
|
+
file. They can be used, for instance, to verify presence of a value in the
|
235
|
+
input file. *Check must return true if there are no errors; any other
|
236
|
+
value (e.g. an array of messages) is considered an error.*
|
237
|
+
- =process= can be used to transform data into something closer to the input
|
238
|
+
data required for importing (e.g., it can be used to downcase or
|
239
|
+
strip a string)
|
240
|
+
- one or more =check= blocks perform a check on the data returned by =process=, to check
|
241
|
+
for errors. They can be used, for instance, to check that a model built with
|
242
|
+
=process= is valid. *Check must return true if there are no errors.*
|
243
|
+
|
244
|
+
#+begin_example
|
245
|
+
i.column({ name: 'A' }) do
|
246
|
+
check_raw do |cell|
|
247
|
+
!cell.nil?
|
248
|
+
end
|
249
|
+
end
|
250
|
+
#+end_example
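
The return convention for checks (=true= means no error, anything else is an error) can be tried outside Dreader. The sketch below writes a check as a plain lambda; the names =name_check= and =passed?= and the message strings are assumptions made for this example only.

```ruby
# A check written as a plain lambda, so it can be tried outside Dreader.
# It returns true when the cell is acceptable and an array of messages
# otherwise, following the convention described above.
name_check = lambda do |cell|
  messages = []
  messages << 'value is missing' if cell.nil?
  messages << 'value is blank' if cell.is_a?(String) && cell.strip.empty?
  messages.empty? ? true : messages
end

# Assumed interpretation of the result: only the literal true means "no error".
def passed?(result)
  result == true
end

p name_check.call('Sean Penn')
p name_check.call(nil)
p passed?(name_check.call('  '))
```

The body of the lambda could be used as the body of a =check_raw= block; returning the messages rather than =false= keeps the reason for the failure available.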
|
251
|
+
|
252
|
+
#+begin_quote
|
253
|
+
*If you declare more than one check block of the same type per column, use a
|
254
|
+
unique symbol to distinguish the blocks or the error messages will be
|
255
|
+
overwritten*.
|
256
|
+
#+end_quote
|
257
|
+
|
258
|
+
#+begin_example
|
259
|
+
i.column({ name: 'A' }) do
|
260
|
+
check_raw :must_be_non_nil do |cell|
|
261
|
+
!cell.nil?
|
262
|
+
end
|
263
|
+
|
264
|
+
check_raw :first_letter_must_be_a do |cell|
|
265
|
+
cell[0] == 'A'
|
266
|
+
end
|
267
|
+
end
|
268
|
+
#+end_example
|
269
|
+
|
270
|
+
#+begin_quote
|
271
|
+
=process= is always executed before =check=. If you want to check raw data
|
272
|
+
use the =check_raw= directive.
|
273
|
+
#+end_quote
|
274
|
+
|
275
|
+
#+begin_quote
|
276
|
+
There can be only one process block. *If you define more than one per
|
277
|
+
column, only the last one is executed.*
|
278
|
+
#+end_quote
|
279
|
+
|
280
|
+
#+begin_example
|
281
|
+
i.column({ name: 'A' }) do
|
282
|
+
check_raw do |cell|
|
283
|
+
# Here cell is like in the input file
|
284
|
+
end
|
285
|
+
|
286
|
+
process do |cell|
|
287
|
+
cell.upcase
|
288
|
+
end
|
289
|
+
|
290
|
+
check do |cell|
|
291
|
+
# Here cell is upcased (it is the value returned by process)
|
292
|
+
end
|
293
|
+
end
|
294
|
+
#+end_example
|
295
|
+
|
296
|
+
For instance, given the tabular data:
|
297
|
+
|
298
|
+
| Name | Date of birth |
|
299
|
+
|------------------+-----------------|
|
300
|
+
| Forest Whitaker | July 15, 1961 |
|
301
|
+
| Daniel Day-Lewis | April 29, 1957 |
|
302
|
+
| Sean Penn | August 17, 1960 |
|
303
|
+
|
304
|
+
we could use the following declaration to specify the data to read:
|
305
|
+
|
306
|
+
#+BEGIN_EXAMPLE ruby
|
307
|
+
# we want to access column 1 using :name (1 and A are equivalent)
|
308
|
+
# :name should be non nil and of length greater than 0
|
309
|
+
column :name do
|
310
|
+
colref 1
|
311
|
+
check do |x|
|
312
|
+
x and x.length > 0
|
313
|
+
end
|
314
|
+
end
|
315
|
+
|
316
|
+
# we want to access column 2 (Date of birth) using :birthdate
|
317
|
+
column :birthdate do
|
318
|
+
colref 2
|
319
|
+
|
320
|
+
# make sure the column is transformed into a Date
|
321
|
+
process do |x|
|
322
|
+
Date.parse(x)
|
323
|
+
end
|
324
|
+
|
325
|
+
# check birthdate is a date (check is invoked on the value returned
|
326
|
+
# by process)
|
327
|
+
check do |x|
|
328
|
+
x.class == Date
|
329
|
+
end
|
330
|
+
end
|
331
|
+
#+END_EXAMPLE
|
332
|
+
|
333
|
+
#+BEGIN_NOTES
|
334
|
+
1. The column name can be anything Ruby can use as a key for a Hash,
|
335
|
+
such as, for instance, symbols, strings, and even object instances.
|
336
|
+
2. =colref= can be a string (e.g., ='A'=) or an integer, with
|
337
|
+
1 and "A" being the first column.
|
338
|
+
3. *You need to declare only the columns you want to import.* For
|
339
|
+
instance, we could skip the declaration for column 1, if 'Date of
|
340
|
+
Birth' is the only data we want to import
|
341
|
+
4. If =process= and =check= are specified, then =check= will receive the
|
342
|
+
result of invoking =process= on the cell value. This makes sense if
|
343
|
+
process is used to make the cell value more accessible to ruby code
|
344
|
+
(e.g., transforming a string into an integer).
|
345
|
+
#+END_NOTES
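
In the declaration above, =Date.parse= raises an exception when a cell does not contain a recognizable date. One possible way to handle this, sketched below under the assumption that you prefer a failed check to an exception, is to return =nil= from =process= and let =check= test the type. The helpers =parse_birthdate= and =date?= are hypothetical names, not part of Dreader.

```ruby
require 'date'

# Hypothetical helper: turn a raw cell into a Date, or nil when the
# cell cannot be parsed. Date.parse raises ArgumentError (Date::Error
# in recent Rubies) on invalid strings and TypeError on nil.
def parse_birthdate(raw)
  Date.parse(raw)
rescue ArgumentError, TypeError
  nil
end

# The matching check only has to test the type of the processed value.
def date?(value)
  value.is_a?(Date)
end

p parse_birthdate('July 15, 1961')
p date?(parse_birthdate('???'))
```

The two helpers could be called from the =process= and =check= blocks of the =:birthdate= column, respectively.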
|
346
|
+
|
347
|
+
If there are different columns that have to be read and processed in the same
|
348
|
+
way, =columns= (notice the plural form) allows for a more compact
|
349
|
+
representation:
|
350
|
+
|
351
|
+
#+BEGIN_EXAMPLE ruby
|
352
|
+
columns({ a: 'A', b: 'B' })
|
353
|
+
#+END_EXAMPLE
|
354
|
+
|
355
|
+
is equivalent to:
|
356
|
+
|
357
|
+
#+BEGIN_EXAMPLE ruby
|
358
|
+
column :a do
|
359
|
+
colref 'A'
|
360
|
+
end
|
361
|
+
|
362
|
+
column :b do
|
363
|
+
colref 'B'
|
364
|
+
end
|
365
|
+
#+END_EXAMPLE
|
366
|
+
|
367
|
+
=columns= accepts a code block, which can be used to add =process= and =check=
|
368
|
+
declarations:
|
369
|
+
|
370
|
+
#+BEGIN_EXAMPLE ruby
|
371
|
+
columns({ a: 'A', b: 'B' }) do
|
372
|
+
process do |cell|
|
373
|
+
...
|
374
|
+
end
|
375
|
+
end
|
376
|
+
#+END_EXAMPLE
|
377
|
+
|
378
|
+
See [[file:examples/wikipedia_us_cities/us_cities_bulk_declare.rb][us_cities_bulk_declare.rb]] for an example of =columns=.
|
379
|
+
|
380
|
+
#+BEGIN_NOTES
|
381
|
+
If you use code blocks, don't forget to put in parentheses the
|
382
|
+
column mapping, or the Ruby parser won't be able to distinguish the
|
383
|
+
hash from the code block.
|
384
|
+
#+END_NOTES
|
385
|
+
|
386
|
+
|
387
|
+
*** Add virtual columns
|
388
|
+
|
389
|
+
Sometimes it is convenient to aggregate or otherwise manipulate the data
|
390
|
+
read from each row, before doing the actual processing.
|
391
|
+
|
392
|
+
For instance, we might have a table with dates of birth, while we are
|
393
|
+
really interested in the age of people.
|
394
|
+
|
395
|
+
In such cases, we can use a virtual column. A *virtual column* allows
|
396
|
+
one to add a column to the data read, computed using the values of
|
397
|
+
other cells in the same row.
|
398
|
+
|
399
|
+
The following declaration adds an =age= column to each row of the data
|
400
|
+
read from the previous example:
|
401
|
+
|
402
|
+
#+BEGIN_EXAMPLE ruby
|
403
|
+
virtual_column :age do
|
404
|
+
process do |row|
|
405
|
+
# the function `compute_birthday` has to be defined
|
406
|
+
compute_birthday(row[:birthdate])
|
407
|
+
end
|
408
|
+
end
|
409
|
+
#+END_EXAMPLE
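
The helper used in the declaration above is left undefined. A minimal sketch of such a function follows; the name =compute_age= and its optional =today= argument are assumptions made for this example, not part of Dreader.

```ruby
require 'date'

# Hypothetical helper: age in whole years on the day `today`.
def compute_age(birthdate, today = Date.today)
  age = today.year - birthdate.year
  # Subtract one year if this year's birthday has not been reached yet.
  # Comparing [month, day] pairs avoids building a date such as Feb 29
  # in a non-leap year.
  had_birthday = ([today.month, today.day] <=> [birthdate.month, birthdate.day]) >= 0
  age -= 1 unless had_birthday
  [0, age].max
end

puts compute_age(Date.new(1961, 7, 15), Date.new(2024, 1, 10))
```

Inside a =virtual_column= block it would probably be called on the processed value, for instance =compute_age(row[:birthdate][:value])=, since each cell is stored as a hash with a =:value= key.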
|
410
|
+
|
411
|
+
Virtual columns are, of course, available to the =mapping= directive
|
412
|
+
(see below).
|
413
|
+
|
414
|
+
|
415
|
+
*** Specify how to process each line
|
416
|
+
|
417
|
+
The =mapping= directive specifies what to do with each line read. The
|
418
|
+
=mapping= declaration takes an arbitrary piece of ruby code, which can
|
419
|
+
reference the fields using the column names we declared.
|
420
|
+
|
421
|
+
For instance, the following code gets the value of column =:name=, the
|
422
|
+
value of column =:age= and prints them to standard output:
|
423
|
+
|
424
|
+
#+BEGIN_EXAMPLE ruby
|
425
|
+
mapping do |row|
|
426
|
+
puts "#{row[:name][:value]} is #{row[:age][:value]} years old"
|
427
|
+
end
|
428
|
+
#+END_EXAMPLE
|
429
|
+
|
430
|
+
The data read from each row of our input data is stored in a hash. The hash
|
431
|
+
uses column names as the primary key and stores the values in the =:value=
|
432
|
+
key.
|
433
|
+
|
434
|
+
|
435
|
+
*** Process data
|
436
|
+
|
437
|
+
If =mapping= does not work for your data processing activities (e.g., you need
|
438
|
+
to perform computations on data which span different rows), you can add your own
|
439
|
+
code after the =process= directive.
|
440
|
+
|
441
|
+
A typical scenario works as follows:
|
442
|
+
|
443
|
+
First, instantiate the class: ~i = Reader.new~. Then:
|
444
|
+
|
445
|
+
1. Use =i.read= or =i.load= (synonyms), to read all data.
|
446
|
+
|
447
|
+
#+BEGIN_EXAMPLE ruby
|
448
|
+
i.read
|
449
|
+
#+END_EXAMPLE
|
450
|
+
|
451
|
+
2. Use =errors= to see whether any of the check functions failed:
|
452
|
+
|
453
|
+
#+BEGIN_EXAMPLE ruby
|
454
|
+
array_of_hashes = i.errors
|
455
|
+
array_of_hashes.each do |error_hash|
|
456
|
+
puts error_hash
|
457
|
+
end
|
458
|
+
#+END_EXAMPLE
|
459
|
+
|
460
|
+
3. Use =virtual_columns= to generate the virtual columns:
|
461
|
+
|
462
|
+
#+BEGIN_EXAMPLE ruby
|
463
|
+
i.virtual_columns
|
464
|
+
#+END_EXAMPLE
|
465
|
+
|
466
|
+
(Optionally: check again for errors.)
|
467
|
+
|
468
|
+
4. Use the =process= function to execute the =mapping=
|
469
|
+
directive on each line read from the file.
|
470
|
+
|
471
|
+
#+BEGIN_EXAMPLE ruby
|
472
|
+
i.process
|
473
|
+
#+END_EXAMPLE
|
474
|
+
|
475
|
+
(Optionally: check again for errors.)
|
476
|
+
|
477
|
+
5. Add your own code to process data. Use the =table= function to access data.
|
478
|
+
|
479
|
+
Look in the examples directory for further details and a couple of
|
480
|
+
working examples.
|
481
|
+
|
482
|
+
|
483
|
+
*** Managing Errors
|
484
|
+
|
485
|
+
**** Finding errors in input data
|
486
|
+
|
487
|
+
Dreader collects errors in two specific ways:
|
488
|
+
|
489
|
+
1. In each column specification, using =check_raw= and =check=. This allows you
|
490
|
+
to check each field for errors (e.g., a =nil= value in a cell)
|
491
|
+
2. In virtual columns, using =check_raw= and =check=. This allows you to perform
|
492
|
+
more complex checks by putting together all the values read from a row
|
493
|
+
(e.g., =to_date= occurs before =from_date=)
|
494
|
+
|
495
|
+
The following, for instance, checks that name or surname has a valid value:
|
496
|
+
|
497
|
+
#+begin_example ruby
|
498
|
+
virtual_column :global_check do
|
499
|
+
doc "Name or Surname must exist"
|
500
|
+
check :name_or_surname_must_be_defined do |row|
|
501
|
+
!row[:name][:value].nil? || !row[:surname][:value].nil?
|
502
|
+
end
|
503
|
+
end
|
504
|
+
#+end_example
|
505
|
+
|
506
|
+
If you prefer, you can also define a virtual column that contains the value of
|
507
|
+
the check:
|
508
|
+
|
509
|
+
#+begin_example ruby
|
510
|
+
virtual_column :name_or_surname_exist do
|
511
|
+
doc "Name or Surname must exist"
|
512
|
+
process do |row|
|
513
|
+
!row[:name][:value].nil? || !row[:surname][:value].nil?
|
514
|
+
end
|
515
|
+
end
|
516
|
+
#+end_example
|
517
|
+
|
518
|
+
You can then act in the mapping directive according to value returned by the
|
519
|
+
virtual column:
|
520
|
+
|
521
|
+
#+begin_example ruby
|
522
|
+
mapping do |row|
|
523
|
+
unless row[:name_or_surname_exist][:value] == false
|
524
|
+
[...]
|
525
|
+
end
end
|
526
|
+
#+end_example
|
527
|
+
|
528
|
+
**** Managing Errors
|
529
|
+
|
530
|
+
You can check for errors in two different ways:
|
531
|
+
|
532
|
+
The first is in the =mapping= directive, where you can check whether some checks for
|
533
|
+
the =row= failed, by:
|
534
|
+
|
535
|
+
1. checking the =:error= boolean key associated to each column, that is:
|
536
|
+
|
537
|
+
=row[<column_name>][:error]=
|
538
|
+
|
539
|
+
2. looking at the value of the =:row_errors= key, which contains all error messages
|
540
|
+
generated for the row:
|
541
|
+
|
542
|
+
=row[:row_errors]=
|
543
|
+
|
544
|
+
The second is after the processing, by using the method =errors=, which lists all the errors.
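
As a rough illustration of the =:error= and =:row_errors= keys, the sketch below separates rows with and without errors. The rows are written by hand in the shape described in this README, and the content of the error messages is an assumption; in real code the rows would come from the reader.

```ruby
# Hand-written rows in the shape described in this README; in real
# code they would come from the reader (inside mapping or from table).
rows = [
  { name: { value: 'John', error: false }, row_errors: [], row_number: 1 },
  { name: { value: nil, error: true }, row_errors: ['name is missing'], row_number: 2 }
]

# Split rows into those without and those with error messages.
valid, invalid = rows.partition { |row| row[:row_errors].empty? }

invalid.each do |row|
  # List the columns whose :error flag is set.
  bad_columns = row.select { |_, cell| cell.is_a?(Hash) && cell[:error] }.keys
  puts "row #{row[:row_number]}: #{row[:row_errors].join(', ')} (columns: #{bad_columns.join(', ')})"
end
```

Rows in =valid= can then be imported, while rows in =invalid= can be reported back to whoever produced the file.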
|
545
|
+
|
546
|
+
The utility function =Dreader::Util.errors= takes as input the errors generated by
|
547
|
+
Dreader and extracts those of a specific row and, optionally, column:
|
548
|
+
|
549
|
+
#+begin_example ruby
|
550
|
+
# get all the errors at line 2
|
551
|
+
Dreader::Util.errors i.errors, 2
|
552
|
+
|
553
|
+
# get all the errors at line 2, column 'C'
|
554
|
+
Dreader::Util.errors i.errors, 2, 3
|
555
|
+
#+end_example
|
556
|
+
|
557
|
+
|
558
|
+
* Generating a Template from the specification
|
559
|
+
|
560
|
+
From version 0.6.0, =dreader= can generate a template starting from the
|
561
|
+
specification.
|
562
|
+
|
563
|
+
The template is generated by the following call:
|
564
|
+
|
565
|
+
#+begin_example ruby
|
566
|
+
generate_template template_filename: "template.xlsx"
|
567
|
+
#+end_example
|
568
|
+
|
569
|
+
(The =template_filename= directive can also be specified in the =options=
|
570
|
+
section).
|
571
|
+
|
572
|
+
The template contains the following rows:
|
573
|
+
|
574
|
+
- The first row contains the names of the columns, as specified in the
|
575
|
+
=columns= declarations and made into a human readable form.
|
576
|
+
- The second row contains the doc strings of the columns, if set.
|
577
|
+
- The remaining rows contain the example records added with the
|
578
|
+
=example= directive
|
579
|
+
|
580
|
+
The position of the first row is determined by the value of =first_row=, that
|
581
|
+
is, if =first_row= is 2 (content starts from the second row), the header row
|
582
|
+
is put in row 1.
|
583
|
+
|
584
|
+
Only Excel is supported, at the moment.
|
585
|
+
|
586
|
+
An example of template generation can be found in the Examples.
|
587
|
+
|
588
|
+
** Digging deeper
|
589
|
+
|
590
|
+
If you need to perform computations which cannot be performed row by
|
591
|
+
row you can access all data, with the =table= method:
|
592
|
+
|
593
|
+
#+BEGIN_EXAMPLE ruby
|
594
|
+
i.read
|
595
|
+
i.table
|
596
|
+
#+END_EXAMPLE
|
597
|
+
|
598
|
+
The function =i.table= returns an array of Hashes. Each element of
|
599
|
+
the array is a row of the input file. Each element/row has the
|
600
|
+
following structure:
|
601
|
+
|
602
|
+
#+BEGIN_EXAMPLE ruby
|
603
|
+
{
|
604
|
+
col_name1: { <info about col_name_1 in row_j> },
|
605
|
+
[...]
|
606
|
+
col_nameN: { <info about col_name_N in row_j> },
|
607
|
+
row_errors: [ <errors associated to row> ],
|
608
|
+
row_number: <row number>
|
609
|
+
}
|
610
|
+
#+END_EXAMPLE
|
611
|
+
|
612
|
+
where =col_name1=, ..., =col_nameN= are the names you have assigned to
|
613
|
+
the columns and the information stored for each cell is the
|
614
|
+
following:
|
615
|
+
|
616
|
+
#+BEGIN_EXAMPLE ruby
|
617
|
+
{
|
618
|
+
value: ..., # the result of calling process on the cell
|
619
|
+
row_number: ..., # the row number
|
620
|
+
col_number: ..., # the column number
|
621
|
+
error: ... # the result of calling check on the cell processed value
|
622
|
+
}
|
623
|
+
#+END_EXAMPLE
|
624
|
+
|
625
|
+
(Note that virtual columns only store =value= and a Boolean =virtual=,
|
626
|
+
which is always =true=.)
|
627
|
+
|
628
|
+
Thus, for instance, =i.table= returns a structure like the following:
|
629
|
+
|
630
|
+
#+BEGIN_EXAMPLE ruby
|
631
|
+
i.table
|
632
|
+
[
|
633
|
+
{
|
634
|
+
name: { value: "John", row_number: 1, col_number: 1, errors: nil },
|
635
|
+
age: { value: 30, row_number: 1, col_number: 2, errors: nil }
|
636
|
+
},
|
637
|
+
{
|
638
|
+
name: { value: "Jane", row_number: 2, col_number: 1, errors: nil },
|
639
|
+
age: { value: 31, row_number: 2, col_number: 2, errors: nil }
|
640
|
+
}
|
641
|
+
]
|
642
|
+
#+END_EXAMPLE
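
A computation that spans several rows can then be written with ordinary Ruby collection methods. The sketch below computes an average age; the =table= variable is a hand-written stand-in for the array returned by the reader, limited to the keys needed here.

```ruby
# Hand-written stand-in for the array returned by table.
table = [
  { name: { value: 'John', row_number: 1, col_number: 1 },
    age: { value: 30, row_number: 1, col_number: 2 } },
  { name: { value: 'Jane', row_number: 2, col_number: 1 },
    age: { value: 31, row_number: 2, col_number: 2 } }
]

# A computation that needs all rows at once: the average age.
ages = table.map { |row| row[:age][:value] }
average_age = ages.sum.to_f / ages.size

puts average_age
```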
|
643
|
+
|
644
|
+
|
645
|
+
* Simplifying the hash with the data read
|
646
|
+
|
647
|
+
The =Dreader::Util= class provides some functions to simplify the
|
648
|
+
hashes built by =dreader=. This is useful to simplify the code you
|
649
|
+
write and to generate hashes you can pass, for instance, to
|
650
|
+
ActiveRecord creators.
|
651
|
+
|
652
|
+
** Simplify removes everything but the values
|
653
|
+
|
654
|
+
=Dreader::Util.simplify hash= removes all information but the value
|
655
|
+
and makes the value accessible directly from the name of the column.
|
656
|
+
|
657
|
+
#+BEGIN_EXAMPLE ruby
|
658
|
+
i.table[0]
|
659
|
+
{ name: { value: "John", row_number: 1, col_number: 1, errors: nil },
|
660
|
+
age: { value: 30, row_number: 1, col_number: 2, errors: nil } }
|
661
|
+
|
662
|
+
Dreader::Util.simplify i.table[0]
|
663
|
+
{ name: "John", age: 30 }
|
664
|
+
#+END_EXAMPLE
|
665
|
+
|
666
|
+
*As an additional bonus, it removes the keys =row_number= and =row_errors=,
|
667
|
+
which are not part of the data read, in the first place.*
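
To make the transformation concrete, here is a sketch that reproduces the behaviour described above in plain Ruby. It is not the gem's implementation; =simplify_sketch= is a hypothetical name, and the input row is written by hand.

```ruby
# Sketch of the behaviour described above; NOT the gem's implementation.
# Keeps only the :value of each cell and drops the bookkeeping keys.
def simplify_sketch(row)
  row.reject { |key, _| %i[row_number row_errors].include?(key) }
     .transform_values { |cell| cell[:value] }
end

row = {
  name: { value: 'John', row_number: 1, col_number: 1 },
  age: { value: 30, row_number: 1, col_number: 2 },
  row_errors: [],
  row_number: 1
}

p simplify_sketch(row)
```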
|
668
|
+
|
669
|
+
** Slice and Clean select columns
|
670
|
+
|
671
|
+
=Dreader::Util.slice hash, keys= and =Dreader::Util.clean hash, keys=,
|
672
|
+
where =keys= is an array of keys, are respectively used to select or
|
673
|
+
remove some keys from the hash returned by Dreader. (Notice that the
|
674
|
+
Ruby Hash class already provides similar methods.)
|
675
|
+
|
676
|
+
#+BEGIN_EXAMPLE ruby
|
677
|
+
i.table[0]
|
678
|
+
{ name: { value: "John", row_number: 1, col_number: 1, errors: nil },
|
679
|
+
age: { value: 30, row_number: 1, col_number: 2, errors: nil }}
|
680
|
+
|
681
|
+
Dreader::Util.slice i.table[0], [:name]
|
682
|
+
{ name: { value: "John", row_number: 1, col_number: 1, errors: nil } }
|
683
|
+
|
684
|
+
Dreader::Util.clean i.table[0], [:name]
|
685
|
+
{ age: { value: 30, row_number: 1, col_number: 2, errors: nil } }
|
686
|
+
#+END_EXAMPLE
|
687
|
+
|
688
|
+
The methods =slice= and =clean= are more useful when used in
|
689
|
+
conjunction with =simplify=:
|
690
|
+
|
691
|
+
#+BEGIN_EXAMPLE ruby
|
692
|
+
hash = Dreader::Util.simplify i.table[0]
|
693
|
+
{ name: "John", age: 30 }
|
694
|
+
|
695
|
+
Dreader::Util.slice hash, [:age]
|
696
|
+
{ age: 30 }
|
697
|
+
|
698
|
+
Dreader::Util.clean hash, [:age]
|
699
|
+
{ name: "John" }
|
700
|
+
#+END_EXAMPLE
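
For comparison, the same selection can be done with methods of Ruby's own =Hash= class on an already simplified hash. This sketch uses only the standard library and makes no claim about how =Dreader::Util= implements =slice= and =clean=.

```ruby
hash = { name: 'John', age: 30 }

# Hash#slice keeps the listed keys.
p hash.slice(:age)

# Hash#reject can drop the listed keys (Hash#except does the same on Ruby 3.0+).
p hash.reject { |key, _| [:age].include?(key) }
```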
|
701
|
+
|
702
|
+
The output produced by =slice= and =simplify= is a hash which can be used to
|
703
|
+
create an =ActiveRecord= object.
|
704
|
+
|
705
|
+
** Better Integration with ActiveRecord
|
706
|
+
|
707
|
+
Finally, the =Dreader::Util.restructure= method helps build hashes
|
708
|
+
to create [[http://api.rubyonrails.org/classes/ActiveModel/Model.html][ActiveModel]] objects with nested attributes:
|
709
|
+
|
710
|
+
#+BEGIN_EXAMPLE ruby
|
711
|
+
hash = {name: "John", surname: "Doe", address: "Unknown", city: "NY" }
|
712
|
+
|
713
|
+
Dreader::Util.restructure hash, [:name, :surname], :address_attributes, [:address, :city]
|
714
|
+
{name: "John", surname: "Doe", address_attributes: {address: "Unknown", city: "NY"}}
|
715
|
+
#+END_EXAMPLE
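
The regrouping shown above can be reproduced in a few lines of plain Ruby. The sketch below is not the gem's implementation; =restructure_sketch= and its parameter names are hypothetical and only mirror the argument order of the example.

```ruby
# Sketch of the behaviour shown above; NOT the gem's implementation.
# Keeps top_keys at the top level and groups nested_keys under nested_name.
def restructure_sketch(hash, top_keys, nested_name, nested_keys)
  hash.slice(*top_keys).merge(nested_name => hash.slice(*nested_keys))
end

hash = { name: 'John', surname: 'Doe', address: 'Unknown', city: 'NY' }
p restructure_sketch(hash, %i[name surname], :address_attributes, %i[address city])
```

A hash of this shape is what a model declaring nested attributes for =address= would typically expect.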
|
716
|
+
|
717
|
+
|
718
|
+
* Debugging your specification
|
719
|
+
|
720
|
+
The =debug= function prints the current configuration, reads some
|
721
|
+
records from the input file(s), and shows the records read:
|
722
|
+
|
723
|
+
#+BEGIN_EXAMPLE ruby
|
724
|
+
i.debug
|
725
|
+
i.debug n: 40 # read 40 lines (from first_row)
|
726
|
+
i.debug n: 40, filename: filepath # like above, but read from filepath
|
727
|
+
#+END_EXAMPLE
|
728
|
+
|
729
|
+
By default =debug= invokes the =check_raw=, =process=, and =check=
|
730
|
+
directives. Pass the following options, if you want to disable this behavior;
|
731
|
+
this might be useful, for instance, if you intend to check only what data is
|
732
|
+
read:
|
733
|
+
|
734
|
+
#+BEGIN_EXAMPLE ruby
|
735
|
+
i.debug process: false, check: false
|
736
|
+
#+END_EXAMPLE
|
737
|
+
|
738
|
+
Notice that =check= implies =process=, since =check= is invoked on the
|
739
|
+
output of the =process= directive.
|
740
|
+
|
741
|
+
If you prefer, as an alternative to =debug=, you can also use configuration
|
742
|
+
variables (but then you need to change the configuration according to the
|
743
|
+
environment):
|
744
|
+
|
745
|
+
#+begin_example ruby
|
746
|
+
i.options do
|
747
|
+
debug true
|
748
|
+
end
|
749
|
+
#+end_example
|
750
|
+
|
751
|
+
|
752
|
+
* Changelog
|
753
|
+
|
754
|
+
See [[file:CHANGELOG.ORG][CHANGELOG]].
|
755
|
+
|
756
|
+
* Known Limitations
|
757
|
+
|
758
|
+
At the moment:
|
759
|
+
|
760
|
+
- it is not possible to specify column references using header names
|
761
|
+
(like Roo does).
|
762
|
+
- it is not possible to pass options to the file readers. As a
|
763
|
+
consequence tab-separated files must have the =.tsv= extension or
|
764
|
+
they will not be parsed correctly
|
765
|
+
- some more testing wouldn't hurt.
|
766
|
+
|
767
|
+
* Known Bugs
|
768
|
+
|
769
|
+
Some known bugs and an unknown number of unknown bugs.
|
770
|
+
|
771
|
+
(See the open issues for the known bugs.)
|
772
|
+
|
773
|
+
* Development
|
774
|
+
|
775
|
+
After checking out the repo, run =bin/setup= to install dependencies.
|
776
|
+
You can also run =bin/console= for an interactive prompt that will
|
777
|
+
allow you to experiment.
|
778
|
+
|
779
|
+
To install this gem onto your local machine, run =bundle exec rake
|
780
|
+
install=. To release a new version, update the version number in
|
781
|
+
=version.rb=, and then run =bundle exec rake release=, which will
|
782
|
+
create a git tag for the version, push git commits and tags, and push
|
783
|
+
the =.gem= file to [[https://rubygems.org][rubygems.org]].
|
784
|
+
|
785
|
+
* Contributing
|
786
|
+
|
787
|
+
Bug reports and pull requests are welcome.
|
788
|
+
|
789
|
+
You need to get in touch with me by email, until I figure out how to enable
|
790
|
+
it in Gitea.
|
791
|
+
|
792
|
+
* License
|
793
|
+
|
794
|
+
[[https://opensource.org/licenses/MIT][MIT License]].
|