dreader 0.5.0 → 1.1.0

data/README.org ADDED
@@ -0,0 +1,821 @@
1
+ #+TITLE: Dreader
2
+ #+AUTHOR: Adolfo Villafiorita
3
+ #+STARTUP: showall
4
+
5
+ Dreader is a simple DSL built on top of [[https://github.com/roo-rb/roo][Roo]] to read and process
6
+ tabular data (CSV, LibreOffice, Excel) in a simple and structured way.
7
+
8
+ Main advantages:
9
+
10
+ 1. All code to parse input data has the same structure, simplifying
11
+ code management and understanding (convention over configuration).
12
+ 2. It favors a declarative approach, clearly identifying which data
13
+ has to be read and in which way.
14
+ 3. It has facilities to run simulations and to debug and check code and
15
+ data.
16
+
17
+ We use Dreader for importing fairly big files (in the order of
18
+ 10K-100K records) in MIP, an ERP to manage distribution of bins to the
19
+ population. The main issues we had before using Dreader were errors
20
+ and exceptional cases in the input data. We also had to manage
21
+ several small variations in the input files (coming from different
22
+ ERPs) and Dreader helped us standardize the input code.
23
+
24
+ The gem depends on =roo=, from which it leverages all data
25
+ reading/parsing facilities, keeping its size at about 250 lines of
26
+ code.
27
+
28
+ It should be relatively easy to use; /dreader/ stands for /d/ata /r/eader.
29
+
30
+ * Installation
31
+
32
+ Add this line to your application's Gemfile:
33
+
34
+ #+BEGIN_EXAMPLE ruby
35
+ gem 'dreader'
36
+ #+END_EXAMPLE
37
+
38
+ And then execute:
39
+
40
+ #+BEGIN_EXAMPLE
41
+ $ bundle
42
+ #+END_EXAMPLE
43
+
44
+ Or install it yourself as:
45
+
46
+ #+BEGIN_EXAMPLE
47
+ $ gem install dreader
48
+ #+END_EXAMPLE
49
+
50
+
51
+ * Usage
52
+
53
+ ** Quick start
54
+
55
+ Print name and age of people from the following data:
56
+
57
+ | Name | Date of birth |
58
+ |------------------+-----------------|
59
+ | Forest Whitaker | July 15, 1961 |
60
+ | Daniel Day-Lewis | April 29, 1957 |
61
+ | Sean Penn | August 17, 1960 |
62
+
63
+ #+BEGIN_EXAMPLE ruby
64
+ require 'dreader'
65
+
66
+ class Reader
67
+ extend Dreader::Engine
68
+
69
+ options do
70
+ # we start reading from row 2
71
+ first_row 2
72
+ end
73
+
74
+ column :name do
75
+ doc "column A contains :name, a string; doc is optional"
76
+ colref 'A'
77
+ end
78
+
79
+ # column B contains :birthdate, a date. We can use a Hash and omit
80
+ # colref
81
+ column({ birthdate: 'B' }) do
82
+ process do |c|
83
+ Date.parse(c)
84
+ end
85
+ end
86
+
87
+ # add as many example lines as you want to show examples of good
88
+ # records; these example lines are added to the template generated with
89
+ # generate_template (parentheses keep Ruby from parsing the hash as a block)
90
+ example({ name: "John", birthdate: "27/03/2020" })
91
+
92
+ # for each line, :age is computed from :birthdate
93
+ virtual_column :age do
94
+ process do |row|
95
+ birthdate = row[:birthdate][:value]
96
+ birthday = Date.new(Date.today.year, birthdate.month, birthdate.day)
97
+ today = Date.today
98
+ [0, today.year - birthdate.year - (birthday > today ? 1 : 0)].max
99
+ end
100
+ end
101
+
102
+ # this is how we process each line of the input file
103
+ mapping do |row|
104
+ r = Dreader::Util.simplify(row)
105
+ puts "#{r[:name]} is #{r[:age]} years old (born on #{r[:birthdate]})"
106
+ end
107
+ end
108
+
109
+ reader = Reader
110
+
111
+ # read the file
112
+ reader.read filename: "Birthdays.ods"
113
+ # compute the virtual columns
114
+ reader.virtual_columns
115
+ # run the mapping declaration
116
+ reader.mappings
117
+
118
+ #
119
+ # Here we can do further processing on the data
120
+ #
121
+ File.open("ages.txt", "w") do |file|
122
+ reader.table.each do |row|
123
+ unless row[:row_errors].any?
124
+ file.puts "#{row[:name][:value]} #{row[:age][:value]}"
125
+ end
126
+ end
127
+ end
128
+ #+END_EXAMPLE
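
The age computation used in the =:age= virtual column can be tried on its own. The following is a minimal standalone sketch; =age_on= is a hypothetical helper, not part of Dreader. It compares month and day directly, so it does not need to build this year's birthday as a =Date= (which would fail for a February 29 birthdate in a non-leap year).

```ruby
require 'date'

# Age in whole years on a given day (hypothetical helper, not part of dreader).
def age_on(birthdate, today = Date.today)
  years = today.year - birthdate.year
  # Compare [month, day] pairs: negative means this year's birthday is still ahead.
  birthday_passed = [today.month, today.day] <=> [birthdate.month, birthdate.day]
  years -= 1 if birthday_passed.negative?
  [0, years].max
end

birthdate = Date.parse("July 15, 1961")
puts age_on(birthdate, Date.new(2020, 7, 14)) # the day before the birthday
puts age_on(birthdate, Date.new(2020, 7, 15)) # on the birthday
```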
129
+
130
+ ** Gentler Introduction
131
+
132
+ To write an import function with Dreader:
133
+
134
+ - Declare which is the input file and where we can find data (sheet
135
+ and first row). This can also be specified in each call.
136
+ - Declare the content of columns and, then, how to check raw data, parse data,
137
+ and check parsed data
138
+ - Add virtual columns, that is, columns computed from other values
139
+ in the row
140
+ - Specify how to map each line. This is where you do the actual work
141
+ (for instance, if you process a file line by line) or put together data for
142
+ processing after the file has been fully read --- see the next step.
143
+
144
+ Dreader now knows how to collect, shape, and transform (map) data according to
145
+ your instructions. We are now ready to do the actual work. This consists of
146
+ the following steps, several of which can be performed together:
147
+
148
+ - Read the file
149
+ - Do the parsing/transformations
150
+ - Compute the virtual columns
151
+ - Do the mappings
152
+
153
+ Each step is described in more detail in the following sections.
154
+
155
+ *** Declare which is the input file and where we can find data
156
+
157
+ Require =dreader= and declare a class which extends =Dreader::Engine=:
158
+
159
+ #+BEGIN_EXAMPLE ruby
160
+ require 'dreader'
161
+
162
+ class Reader
163
+ extend Dreader::Engine
164
+ [...]
165
+ end
166
+ #+END_EXAMPLE
167
+
168
+ In the class, specify the parsing options using the following syntax:
169
+
170
+ #+BEGIN_EXAMPLE ruby
171
+ options do
172
+ filename 'example.ods'
173
+
174
+ sheet 'Sheet 1'
175
+
176
+ first_row 1
177
+ last_row 20
178
+
179
+ # optional (this allows integration with other applications already
180
+ # using a logger)
181
+ logger Logger.new($stdout)
182
+ logger_level Logger::INFO
183
+ end
184
+ #+END_EXAMPLE
185
+
186
+ where:
187
+
188
+ - (optional) =filename= is the file to read. If not specified, you will
189
+ have to supply a filename when loading the file (see =read=, below).
190
+ The extension determines the file type. *Use =.tsv= for tab-separated
191
+ files.*
192
+ - (optional) =first_row= is the first line to read (use =2= if your file
193
+ has a header)
194
+ - (optional) =last_row= is the last line to read. If not specified, we
195
+ will rely on =roo= to determine the last row. This is useful for
196
+ files in which you only want to process some of the content or which
197
+ contain "garbage" after the records.
198
+ - (optional) =sheet= is the sheet name or number to read from. If not
199
+ specified, the first (default) sheet is used
200
+ - (optional) =debug= specifies that we are debugging
201
+ - (optional) =logger= specifies the logger
202
+ - (optional) =logger_level= specifies the logger level
203
+
204
+ You can override some of the defaults by passing a hash as argument to
205
+ the =read= function. For instance:
206
+
207
+ #+BEGIN_EXAMPLE ruby
208
+ Reader.read filename: another_filepath
209
+ #+END_EXAMPLE
210
+
211
+ will read data from =another_filepath=, rather than from the filename
212
+ specified in the options. This might be useful, for instance, if the
213
+ same specification has to be used for different files.
214
+
215
+
216
+ *** Declare the content of columns and how to parse them
217
+
218
+ Declare the columns you want to read by assigning them a name and a column
219
+ reference.
220
+
221
+ There are two notations:
222
+
223
+ #+BEGIN_EXAMPLE ruby
224
+ # First notation, colref is put in the block
225
+ column :name do
226
+ colref 'A'
227
+ end
228
+
229
+ # Second notation, a hash is passed in the name
230
+ column({ name: 'A' }) do
231
+ end
232
+ #+END_EXAMPLE
233
+
234
+ The reference to a column can either be a letter or a number. First column
235
+ is ='A'= or =1=.
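
As an illustration of how letters and numbers line up, here is a small sketch that turns a column reference into a 1-based index. The helper =colref_to_index= is hypothetical and not part of Dreader; the handling of multi-letter references (='AA'= as column 27) is an assumption based on the usual spreadsheet convention, since the text above only states that ='A'= and =1= are the first column.

```ruby
# Convert a column reference to a 1-based index (hypothetical helper).
def colref_to_index(colref)
  return colref if colref.is_a?(Integer)

  # Treat letters as base-26 digits: A=1 ... Z=26, AA=27, and so on.
  colref.upcase.each_char.reduce(0) do |acc, ch|
    acc * 26 + (ch.ord - 'A'.ord + 1)
  end
end

puts colref_to_index('A')  # first column, given as a letter
puts colref_to_index(1)    # first column, given as a number
puts colref_to_index('AB') # a multi-letter reference
```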
236
+
237
+ The =column= declaration can contain Ruby blocks:
238
+
239
+ - one or more =check_raw= blocks check raw data as read from the input
240
+ file. They can be used, for instance, to verify the presence of a value in the
241
+ input file. *Check must return true if there are no errors; any other
242
+ value (e.g. an array of messages) is considered an error.*
243
+ - =process= can be used to transform data into something closer to the input
244
+ data required for the import (e.g., it can be used to downcase or
245
+ strip a string)
246
+ - one or more =check= blocks perform a check on the =process=ed data, to check
247
+ for errors. They can be used, for instance, to check that a model built with
248
+ =process= is valid. *Check must return true if there are no errors.*
249
+
250
+ #+begin_example
251
+ column({ name: 'A' }) do
252
+ check_raw do |cell|
253
+ !cell.nil?
254
+ end
255
+ end
256
+ #+end_example
257
+
258
+ #+begin_quote
259
+ *If you declare more than one check block of the same type per column, use a
260
+ unique symbol to distinguish the blocks or the error messages will be
261
+ overwritten*.
262
+ #+end_quote
263
+
264
+ #+begin_example
265
+ column({ name: 'A' }) do
266
+ check_raw :must_be_non_nil do |cell|
267
+ !cell.nil?
268
+ end
269
+
270
+ check_raw :first_letter_must_be_a do |cell|
271
+ cell[0] == 'A'
272
+ end
273
+ end
274
+ #+end_example
275
+
276
+ #+begin_quote
277
+ =process= is always executed before =check=. If you want to check raw data
278
+ use the =check_raw= directive.
279
+ #+end_quote
280
+
281
+ #+begin_quote
282
+ There can be only one process block. *If you define more than one per
283
+ column, only the last one is executed.*
284
+ #+end_quote
285
+
286
+ #+begin_example
287
+ column({ name: 'A' }) do
288
+ check_raw do |cell|
289
+ # Here cell is like in the input file
290
+ end
291
+
292
+ process do |cell|
293
+ cell.upcase
294
+ end
295
+
296
+ check do |cell|
297
+ # Here cell is upcased (check receives the value returned by process)
298
+ end
299
+ end
300
+ #+end_example
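
The order of the three directives can be mimicked in plain Ruby. This is only a sketch of the behaviour described above, not Dreader's internal implementation: =run_column= and =NAME_COLUMN= are hypothetical names, and stopping after a failed =check_raw= is a choice made for this sketch. As in the text, a check counts as passed only when it returns exactly =true=.

```ruby
# Plain-Ruby illustration of check_raw -> process -> check (hypothetical names).
def run_column(raw, check_raw:, process:, check:)
  # check_raw sees the cell exactly as read from the file.
  return { value: raw, errors: [:check_raw] } unless check_raw.call(raw) == true

  # process transforms the raw cell; check then sees the processed value.
  value = process.call(raw)
  errors = check.call(value) == true ? [] : [:check]
  { value: value, errors: errors }
end

NAME_COLUMN = {
  check_raw: ->(cell) { !cell.nil? },
  process: ->(cell) { cell.strip.upcase },
  check: ->(cell) { !cell.empty? }
}.freeze

p run_column(" Sean Penn ", **NAME_COLUMN) # passes both checks
p run_column(nil, **NAME_COLUMN)           # fails check_raw
p run_column("   ", **NAME_COLUMN)         # fails check after stripping
```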
301
+
302
+ For instance, given the tabular data:
303
+
304
+ | Name | Date of birth |
305
+ |------------------+-----------------|
306
+ | Forest Whitaker | July 15, 1961 |
307
+ | Daniel Day-Lewis | April 29, 1957 |
308
+ | Sean Penn | August 17, 1960 |
309
+
310
+ we could use the following declaration to specify the data to read:
311
+
312
+ #+BEGIN_EXAMPLE ruby
313
+ # we want to access column 1 using :name (1 and A are equivalent)
314
+ # :name should be non nil and of length greater than 0
315
+ column :name do
316
+ colref 1
317
+ check do |x|
318
+ x and x.length > 0
319
+ end
320
+ end
321
+
322
+ # we want to access column 2 (Date of birth) using :birthdate
323
+ column :birthdate do
324
+ colref 2
325
+
326
+ # make sure the column is transformed into a Date
327
+ process do |x|
328
+ Date.parse(x)
329
+ end
330
+
331
+ # check birthdate is a date (check is invoked on the value returned
332
+ # by process)
333
+ check do |x|
334
+ x.class == Date
335
+ end
336
+ end
337
+ #+END_EXAMPLE
338
+
339
+ #+BEGIN_NOTES
340
+ 1. The column name can be anything Ruby can use as a key for a Hash,
341
+ such as, for instance, symbols, strings, and even object instances.
342
+ 2. =colref= can be a string (e.g., ='A'=) or an integer, with
343
+ 1 and "A" being the first column.
344
+ 3. *You need to declare only the columns you want to import.* For
345
+ instance, we could skip the declaration for column 1, if 'Date of
346
+ Birth' is the only data we want to import
347
+ 4. If =process= and =check= are specified, then =check= will receive the
348
+ result of invoking =process= on the cell value. This makes sense if
349
+ process is used to make the cell value more accessible to ruby code
350
+ (e.g., transforming a string into an integer).
351
+ #+END_NOTES
352
+
353
+ If there are different columns that have to be read and processed in the same
354
+ way, =columns= (notice the plural form) allows for a more compact
355
+ representation:
356
+
357
+ #+BEGIN_EXAMPLE ruby
358
+ columns({ a: 'A', b: 'B' })
359
+ #+END_EXAMPLE
360
+
361
+ is equivalent to:
362
+
363
+ #+BEGIN_EXAMPLE ruby
364
+ column :a do
365
+ colref 'A'
366
+ end
367
+
368
+ column :b do
369
+ colref 'B'
370
+ end
371
+ #+END_EXAMPLE
372
+
373
+ =columns= accepts a code block, which can be used to add =process= and =check=
374
+ declarations:
375
+
376
+ #+BEGIN_EXAMPLE ruby
377
+ columns({ a: 'A', b: 'B' }) do
378
+ process do |cell|
379
+ ...
380
+ end
381
+ end
382
+ #+END_EXAMPLE
383
+
384
+ See [[file:examples/wikipedia_us_cities/us_cities_bulk_declare.rb][us_cities_bulk_declare.rb]] for an example of =columns=.
385
+
386
+ #+BEGIN_NOTES
387
+ If you use code blocks, don't forget to put in parentheses the
388
+ column mapping, or the Ruby parser won't be able to distinguish the
389
+ hash from the code block.
390
+ #+END_NOTES
391
+
392
+
393
+ *** Add virtual columns
394
+
395
+ Sometimes it is convenient to aggregate or otherwise manipulate the data
396
+ read from each row, before doing the actual processing.
397
+
398
+ For instance, we might have a table with dates of birth, while we are
399
+ really interested in the age of people.
400
+
401
+ In such cases, we can use a virtual column. A *virtual column* allows
402
+ one to add a column to the data read, computed using the values of
403
+ other cells in the same row.
404
+
405
+ The following declaration adds an =age= column to each row of the data
406
+ read from the previous example:
407
+
408
+ #+BEGIN_EXAMPLE ruby
409
+ virtual_column :age do
410
+ process do |row|
411
+ # the function `compute_birthday` has to be defined
412
+ compute_birthday(row[:birthdate])
413
+ end
414
+ end
415
+ #+END_EXAMPLE
416
+
417
+ Virtual columns are, of course, available to the =mapping= directive
418
+ (see below).
419
+
420
+
421
+ *** Specify how to process each line
422
+
423
+ The =mapping= directive specifies what to do with each line read. The
424
+ =mapping= declaration takes an arbitrary piece of ruby code, which can
425
+ reference the fields using the column names we declared.
426
+
427
+ For instance the following code gets the value of column =:name=, the
428
+ value of column =:age= and prints them to standard output
429
+
430
+ #+BEGIN_EXAMPLE ruby
431
+ mapping do |row|
432
+ puts "#{row[:name][:value]} is #{row[:age][:value]} years old"
433
+ end
434
+ #+END_EXAMPLE
435
+
436
+ The data read from each row of our input data is stored in a hash. The hash
437
+ uses column names as the primary key and stores the values in the =:value=
438
+ key.
439
+
440
+
441
+ *** Process data
442
+
443
+ If =mapping= does not work for your data processing activities (e.g., you need
444
+ to make elaborations on data which span different rows), you can perform
445
+ your own elaborations on the data transformed by =mappings=.
446
+
447
+ A typical scenario works as follows:
448
+
449
+ 1. Reference the class =i = Reader= and use =i.read= or =i.load=
450
+ (synonyms), to read all data.
451
+
452
+ #+BEGIN_EXAMPLE ruby
453
+ i = Reader
454
+ i.read
455
+
456
+ # alternatively
457
+ Reader.read
458
+ #+END_EXAMPLE
459
+
460
+ 2. Use =errors= to see whether any of the check functions failed:
461
+
462
+ #+BEGIN_EXAMPLE ruby
463
+ array_of_hashes = i.errors
464
+ array_of_hashes.each do |error_hash|
465
+ puts error_hash
466
+ end
467
+ #+END_EXAMPLE
468
+
469
+ 3. Use =virtual_columns= to generate the virtual columns:
470
+
471
+ #+BEGIN_EXAMPLE ruby
472
+ i.virtual_columns
473
+ #+END_EXAMPLE
474
+
475
+ (Optionally: check again for errors.)
476
+
477
+ 4. Use the =mappings= function to execute the =mapping= directive on each line
478
+ read from the file.
479
+
480
+ #+BEGIN_EXAMPLE ruby
481
+ i.mappings
482
+ #+END_EXAMPLE
483
+
484
+ (Optionally: check again for errors.)
485
+
486
+ 5. Add your own code to process the data returned after =mappings=, which you
487
+ can access with =i.table= or =i.data= (synonyms).
488
+
489
+ Look in the examples directory for further details and a couple of working
490
+ examples.
491
+
492
+ *** Improving performance
493
+
494
+ While debugging your specification, executing =read=, =virtual_columns=, and
495
+ =mappings= in distinct steps is a good idea. When you go into production, you
496
+ might want to reduce the number of passes you perform on the data.
497
+
498
+ You can pass the option =virtual: true= to =read= to compute virtual
499
+ columns while you are reading data.
500
+
501
+ You can pass the option =mapping: true= to =read= to compute virtual
502
+ columns and perform the mapping while you are reading data. Notice that:
503
+
504
+ - =mapping= implies =virtual=, that is, if you pass =mapping: true= the read
505
+ function will also compute virtual columns
506
+ - =mapping= alters the content of =@table= and **subsequent calls to
507
+ =virtual_columns= and =mappings= will fail.** You have to reset by invoking
508
+ =read= again.
509
+
510
+ *** Managing Errors
511
+
512
+ **** Finding errors in input data
513
+
514
+ Dreader collects errors in the following ways:
515
+
516
+ 1. In each column specification, using =check_raw= and =check=. This allows
517
+ checking each field for errors (e.g., a =nil= value in a cell)
518
+ 2. In virtual columns, using =check_raw= and =check=. This allows performing
519
+ more complex checks by putting together all the values read from a row
520
+ (e.g., =to_date= occurs before =from_date=)
521
+
522
+ The following, for instance, checks that name or surname have a valid value:
523
+
524
+ #+begin_example ruby
525
+ virtual_column :global_check do
526
+ doc "Name or Surname must exist"
527
+ check :name_or_surname_must_be_defined do |row|
528
+ row[:name] || row[:surname]
529
+ end
530
+ end
531
+ #+end_example
532
+
533
+ If you prefer, you can also define a virtual column that contains the value of
534
+ the check:
535
+
536
+ #+begin_example ruby
537
+ virtual_column :name_or_surname_exist do
538
+ doc "Name or Surname must exist"
539
+ process do |row|
540
+ row[:name] || row[:surname]
541
+ end
542
+ end
543
+ #+end_example
544
+
545
+ You can then act in the mapping directive according to the value returned by the
546
+ virtual column:
547
+
548
+ #+begin_example ruby
549
+ mapping do |row|
549
+ unless row[:name_or_surname_exist][:value] == false
550
+ [...]
551
+ end
+ end
553
+ #+end_example
554
+
555
+ **** Managing Errors
556
+
557
+ You can check for errors in two different ways:
558
+
559
+ The first is in the =mapping= directive, where you can check whether some checks
560
+ for the =row= failed, by:
561
+
562
+ 1. checking the =:error= boolean key associated to each column, that is:
563
+
564
+ =row[<column_name>][:error]=
565
+
566
+ 2. looking at the value of the =:row_errors= key, which contains all error
567
+ messages generated for the row:
568
+
569
+ =row[:row_errors]=
570
+
571
+ The second is after the processing, by using the method =errors=, which lists
572
+ all the errors.
573
+
574
+ The utility function =Dreader::Util.errors= takes as input the errors generated
575
+ by Dreader and extracts those of a specific row and, optionally, column:
576
+
577
+ #+begin_example ruby
578
+ # get all the errors at line 2
579
+ Dreader::Util.errors i.errors, 2
580
+
581
+ # get all the errors at line 2, column 'C'
582
+ Dreader::Util.errors i.errors, 2, 3
583
+ #+end_example
584
+
585
+
586
+ * Generating a Template from the specification
587
+
588
+ From version 0.6.0 =dreader= can generate a template starting from the
589
+ specification.
590
+
591
+ The template is generated by the following call:
592
+
593
+ #+begin_example ruby
594
+ generate_template template_filename: "template.xlsx"
595
+ #+end_example
596
+
597
+ (The =template_filename= directive can also be specified in the =options=
598
+ section).
599
+
600
+ The template contains the following rows:
601
+
602
+ - The first row contains the names of the columns, as specified in the
603
+ =columns= declarations and made into a human readable form.
604
+ - The second row contains the doc strings of the columns, if set.
605
+ - The remaining rows contain the example records added with the
606
+ =example= directive
607
+
608
+ The position of the first row is determined by the value of =first_row=, that
609
+ is, if =first_row= is 2 (content starts from the second row), the header row
610
+ is put in row 1.
611
+
612
+ Only Excel is supported, at the moment.
613
+
614
+ An example of template generation can be found in the Examples.
615
+
616
+ ** Digging deeper
617
+
618
+ If you need to perform elaborations which cannot be performed row by
619
+ row you can access all data, with the =table= method:
620
+
621
+ #+BEGIN_EXAMPLE ruby
622
+ i.read
623
+ i.table
624
+ #+END_EXAMPLE
625
+
626
+ The function =i.table= returns an array of Hashes. Each element of
627
+ the array is a row of the input file. Each element/row has the
628
+ following structure:
629
+
630
+ #+BEGIN_EXAMPLE ruby
631
+ {
632
+ col_name1: { <info about col_name_1 in row_j> },
633
+ [...]
634
+ col_nameN: { <info about col_name_N in row_j> },
635
+ row_errors: [ <errors associated to row> ],
636
+ row_number: <row number>
637
+ }
638
+ #+END_EXAMPLE
639
+
640
+ where =col_name1=, ..., =col_nameN= are the names you have assigned to
641
+ the columns and the information stored for each cell is the
642
+ following:
643
+
644
+ #+BEGIN_EXAMPLE ruby
645
+ {
646
+ value: ..., # the result of calling process on the cell
647
+ row_number: ..., # the row number
648
+ col_number: ..., # the column number
649
+ error: ... # the result of calling check on the cell processed value
650
+ }
651
+ #+END_EXAMPLE
652
+
653
+ (Note that virtual columns only store =value= and a Boolean =virtual=,
654
+ which is always =true=.)
655
+
656
+ Thus, for instance, the example above returns:
657
+
658
+ #+BEGIN_EXAMPLE ruby
659
+ i.table
660
+ [
661
+ {
662
+ name: { value: "John", row_number: 1, col_number: 1, errors: nil },
663
+ age: { value: 30, row_number: 1, col_number: 2, errors: nil }
664
+ },
665
+ {
666
+ name: { value: "Jane", row_number: 2, col_number: 1, errors: nil },
667
+ age: { value: 31, row_number: 2, col_number: 2, errors: nil }
668
+ }
669
+ ]
670
+ #+END_EXAMPLE
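
Once the whole table is available, elaborations that span several rows are ordinary Ruby. The sketch below works on a literal array shaped like the output shown above, so it runs without Dreader; =SAMPLE_TABLE=, =average_age=, and =oldest_name= are hypothetical names used only for illustration.

```ruby
# A literal table shaped like the output shown above (sample data only).
SAMPLE_TABLE = [
  { name: { value: "John", row_number: 1, col_number: 1, errors: nil },
    age: { value: 30, row_number: 1, col_number: 2, errors: nil } },
  { name: { value: "Jane", row_number: 2, col_number: 1, errors: nil },
    age: { value: 31, row_number: 2, col_number: 2, errors: nil } }
].freeze

# Average of the :age values across all rows.
def average_age(table)
  ages = table.map { |row| row[:age][:value] }
  ages.sum.to_f / ages.size
end

# Name stored in the row with the highest :age value.
def oldest_name(table)
  table.max_by { |row| row[:age][:value] }[:name][:value]
end

puts "Average age: #{average_age(SAMPLE_TABLE)}"
puts "Oldest: #{oldest_name(SAMPLE_TABLE)}"
```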
671
+
672
+
673
+ * Simplifying the hash with the data read
674
+
675
+ The =Dreader::Util= class provides some functions to simplify the hashes built
676
+ by =dreader=. This is useful to simplify the code you write and to generate
677
+ hashes you can pass, for instance, to ActiveRecord creators.
678
+
679
+ ** Simplify removes everything but the values
680
+
681
+ =Dreader::Util.simplify(hash)= removes all information but the value and makes
682
+ the value accessible directly from the name of the column.
683
+
684
+ #+BEGIN_EXAMPLE ruby
685
+ i.table[0]
686
+ {
687
+ name: { value: "John", row_number: 1, col_number: 1, errors: nil },
688
+ age: { value: 30, row_number: 1, col_number: 2, errors: nil }
689
+ }
690
+
691
+ Dreader::Util.simplify i.table[0]
692
+ { name: "John", age: 30 }
693
+ #+END_EXAMPLE
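
The effect of =simplify= on the row above can be reproduced in plain Ruby, which may help when reasoning about the shape of the data. This is a sketch under assumptions, not the gem's implementation: =simplify_sketch= is a hypothetical name, and skipping entries that are not cell hashes is a choice made here.

```ruby
# Keep only the :value of each cell (plain-Ruby sketch, hypothetical name).
def simplify_sketch(row)
  # Entries that are not cell hashes with a :value key are skipped in this sketch.
  row.select { |_key, cell| cell.is_a?(Hash) && cell.key?(:value) }
     .transform_values { |cell| cell[:value] }
end

SAMPLE_ROW = {
  name: { value: "John", row_number: 1, col_number: 1, errors: nil },
  age: { value: 30, row_number: 1, col_number: 2, errors: nil }
}.freeze

p simplify_sketch(SAMPLE_ROW)
```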
694
+
695
+ ** Slice and Clean select columns
696
+
697
+ =Dreader::Util.slice(hash, keys)= and =Dreader::Util.clean(hash, keys)=, where
698
+ =keys= is an array of keys, are respectively used to select or remove some
699
+ keys from the hash returned by Dreader. (Notice that the Ruby Hash class
700
+ already provides similar methods.)
701
+
702
+ #+BEGIN_EXAMPLE ruby
703
+ i.table[0]
704
+ {
705
+ name: { value: "John", row_number: 1, col_number: 1, errors: nil },
706
+ age: { value: 30, row_number: 1, col_number: 2, errors: nil }
707
+ }
708
+
709
+ Dreader::Util.slice i.table[0], [:name]
710
+ { name: { value: "John", row_number: 1, col_number: 1, errors: nil } }
711
+
712
+ Dreader::Util.clean i.table[0], [:name]
713
+ { age: { value: 30, row_number: 1, col_number: 2, errors: nil } }
714
+ #+END_EXAMPLE
715
+
716
+ The methods =slice= and =clean= are more useful when used in conjunction with
717
+ =simplify=:
718
+
719
+ #+BEGIN_EXAMPLE ruby
720
+ hash = Dreader::Util.simplify i.table[0]
721
+ { name: "John", age: 30 }
722
+
723
+ Dreader::Util.slice hash, [:age]
724
+ { age: 30 }
725
+
726
+ Dreader::Util.clean hash, [:age]
727
+ { name: "John" }
728
+ #+END_EXAMPLE
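
As noted above, the Ruby Hash class already provides similar methods. The following sketch shows the same selection and removal on a simplified row using only built-in methods (=Hash#slice= and =Hash#reject=); =keep_keys= and =drop_keys= are hypothetical wrappers, not Dreader functions.

```ruby
SIMPLIFIED_ROW = { name: "John", age: 30 }.freeze

# Select only the listed keys, using the built-in Hash#slice.
def keep_keys(hash, keys)
  hash.slice(*keys)
end

# Remove the listed keys, using the built-in Hash#reject.
def drop_keys(hash, keys)
  hash.reject { |key, _value| keys.include?(key) }
end

p keep_keys(SIMPLIFIED_ROW, [:age])
p drop_keys(SIMPLIFIED_ROW, [:age])
```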
729
+
730
+ The output produced by =slice= and =simplify= is a hash which can be used to
731
+ create an =ActiveRecord= object.
732
+
733
+ ** Better Integration with ActiveRecord
734
+
735
+ Finally, the =Dreader::Util.restructure= method helps build hashes to create
736
+ [[http://api.rubyonrails.org/classes/ActiveModel/Model.html][ActiveModel]] objects with nested attributes.
737
+
738
+ **The starting point is a simplified row.**
739
+
740
+ #+BEGIN_EXAMPLE ruby
741
+ hash = { name: "John", surname: "Doe", address: "Unknown", city: "NY" }
742
+
743
+ Dreader::Util.restructure hash, [:name, :surname, :address_attributes, [:address, :city]]
744
+ { name: "John", surname: "Doe", address_attributes: { address: "Unknown", city: "NY" } }
745
+ #+END_EXAMPLE
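
To make the shape of the specification array clearer, here is a plain-Ruby sketch that produces the same result for the example above. It is an illustration under assumptions, not the gem's code: =restructure_sketch= is a hypothetical name, and the rule it applies (a symbol followed by an array opens a nested hash under that symbol) is inferred from this single example.

```ruby
# A symbol followed by an array opens a nested hash under that symbol;
# any other symbol is copied from the source hash as is.
def restructure_sketch(hash, spec)
  result = {}
  index = 0
  while index < spec.size
    key = spec[index]
    following = spec[index + 1]
    if following.is_a?(Array)
      # Build the nested hash from the keys listed in the array.
      result[key] = restructure_sketch(hash, following)
      index += 2
    else
      result[key] = hash[key]
      index += 1
    end
  end
  result
end

PERSON = { name: "John", surname: "Doe", address: "Unknown", city: "NY" }.freeze

p restructure_sketch(PERSON, [:name, :surname, :address_attributes, [:address, :city]])
```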
746
+
747
+
748
+ * Debugging your specification
749
+
750
+ The =debug= function prints the current configuration, reads some records from
751
+ the input file(s), and shows the records read:
752
+
753
+ #+BEGIN_EXAMPLE ruby
754
+ i.debug
755
+ i.debug n: 40 # read 40 lines (from first_row)
756
+ i.debug n: 40, filename: filepath # like above, but read from filepath
757
+ #+END_EXAMPLE
758
+
759
+ By default =debug= invokes the =check_raw=, =process=, and =check=
760
+ directives. Pass the following options, if you want to disable this behavior;
761
+ this might be useful, for instance, if you intend to check only what data is
762
+ read:
763
+
764
+ #+BEGIN_EXAMPLE ruby
765
+ i.debug process: false, check: false
766
+ #+END_EXAMPLE
767
+
768
+ Notice that =check= implies =process=, since =check= is invoked on the output
769
+ of the =process= directive.
770
+
771
+ If you prefer, as an alternative to =debug= you can also use configuration
772
+ variables (but then you need to change the configuration according to the
773
+ environment):
774
+
775
+ #+begin_example ruby
776
+ i.options do
777
+ debug true
778
+ end
779
+ #+end_example
780
+
781
+
782
+ * Changelog
783
+
784
+ See [[file:CHANGELOG.org][CHANGELOG]].
785
+
786
+ * Known Limitations
787
+
788
+ At the moment:
789
+
790
+ - it is not possible to specify column references using header names
791
+ (like Roo does).
792
+ - some more testing wouldn't hurt.
793
+
794
+ * Known Bugs
795
+
796
+ Some known bugs and an unknown number of unknown bugs.
797
+
798
+ (See the open issues for the known bugs.)
799
+
800
+ * Development
801
+
802
+ After checking out the repo, run =bin/setup= to install dependencies.
803
+ You can also run =bin/console= for an interactive prompt that will
804
+ allow you to experiment.
805
+
806
+ To install this gem onto your local machine, run =bundle exec rake
807
+ install=. To release a new version, update the version number in
808
+ =version.rb=, and then run =bundle exec rake release=, which will
809
+ create a git tag for the version, push git commits and tags, and push
810
+ the =.gem= file to [[https://rubygems.org][rubygems.org]].
811
+
812
+ * Contributing
813
+
814
+ Bug reports and pull requests are welcome.
815
+
816
+ You need to get in touch with me by email, until I figure out how to enable
817
+ it in Gitea.
818
+
819
+ * License
820
+
821
+ [[https://opensource.org/licenses/MIT][MIT License]].