dreader 0.5.0 → 1.1.0

data/README.org ADDED
@@ -0,0 +1,821 @@
1
+ #+TITLE: Dreader
2
+ #+AUTHOR: Adolfo Villafiorita
3
+ #+STARTUP: showall
4
+
5
+ Dreader is a simple DSL built on top of [[https://github.com/roo-rb/roo][Roo]] to read and process
6
+ tabular data (CSV, LibreOffice, Excel) in a concise and structured way.
7
+
8
+ Main advantages:
9
+
10
+ 1. All code to parse input data has the same structure, simplifying
11
+ code management and understanding (convention over configuration).
12
+ 2. It favors a declarative approach, clearly identifying which data
13
+ has to be read and how.
14
+ 3. It has facilities to run simulations and to debug and check code
15
+ and data.
16
+
17
+ We use Dreader for importing fairly big files (on the order of
18
+ 10K-100K records) in MIP, an ERP to manage distribution of bins to the
19
+ population. The main issues we had before using Dreader were errors
20
+ and exceptional cases in the input data. We also had to manage
21
+ several small variations in the input files (coming from different
22
+ ERPs), and Dreader helped us standardize the input code.
23
+
24
+ The gem depends on =roo=, from which it leverages all data
25
+ reading/parsing facilities, keeping its size to about 250 lines of
26
+ code.
27
+
28
+ It should be relatively easy to use; /dreader/ stands for /d/ata /r/eader.
29
+
30
+ * Installation
31
+
32
+ Add this line to your application's Gemfile:
33
+
34
+ #+BEGIN_EXAMPLE ruby
35
+ gem 'dreader'
36
+ #+END_EXAMPLE
37
+
38
+ And then execute:
39
+
40
+ #+BEGIN_EXAMPLE
41
+ $ bundle
42
+ #+END_EXAMPLE
43
+
44
+ Or install it yourself as:
45
+
46
+ #+BEGIN_EXAMPLE
47
+ $ gem install dreader
48
+ #+END_EXAMPLE
49
+
50
+
51
+ * Usage
52
+
53
+ ** Quick start
54
+
55
+ Print name and age of people from the following data:
56
+
57
+ | Name | Date of birth |
58
+ |------------------+-----------------|
59
+ | Forest Whitaker | July 15, 1961 |
60
+ | Daniel Day-Lewis | April 29, 1957 |
61
+ | Sean Penn | August 17, 1960 |
62
+
63
+ #+BEGIN_EXAMPLE ruby
64
+ require 'dreader'
65
+
66
+ class Reader
67
+ extend Dreader::Engine
68
+
69
+ options do
70
+ # we start reading from row 2
71
+ first_row 2
72
+ end
73
+
74
+ column :name do
75
+ doc "column A contains :name, a string; doc is optional"
76
+ colref 'A'
77
+ end
78
+
79
+ # column B contains :birthdate, a date. We can use a Hash and omit
80
+ # colref
81
+ column({ birthdate: 'B' }) do
82
+ process do |c|
83
+ Date.parse(c)
84
+ end
85
+ end
86
+
87
+ # add as many example lines as you want to show examples of good
88
+ # records; these example lines are added to the template generated with
89
+ # generate_template
90
+ example({ name: "John", birthdate: "27/03/2020" })
91
+
92
+ # for each line, :age is computed from :birthdate
93
+ virtual_column :age do
94
+ process do |row|
95
+ birthdate = row[:birthdate][:value]
96
+ birthday = Date.new(Date.today.year, birthdate.month, birthdate.day)
97
+ today = Date.today
98
+ [0, today.year - birthdate.year - (birthday > today ? 1 : 0)].max
99
+ end
100
+ end
101
+
102
+ # this is how we process each line of the input file
103
+ mapping do |row|
104
+ r = Dreader::Util.simplify(row)
105
+ puts "#{r[:name]} is #{r[:age]} years old (born on #{r[:birthdate]})"
106
+ end
107
+ end
108
+
109
+ reader = Reader
110
+
111
+ # read the file
112
+ reader.read filename: "Birthdays.ods"
113
+ # compute the virtual columns
114
+ reader.virtual_columns
115
+ # run the mapping declaration
116
+ reader.mappings
117
+
118
+ #
119
+ # Here we can do further processing on the data
120
+ #
121
+ File.open("ages.txt", "w") do |file|
122
+ reader.table.each do |row|
123
+ unless row[:row_errors].any?
124
+ file.puts "#{row[:name][:value]} #{row[:age][:value]}"
125
+ end
126
+ end
127
+ end
128
+ #+END_EXAMPLE
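The passes above can be mimicked without the gem. The plain-Ruby sketch below is an illustration, not Dreader's implementation: =SHEET=, =age_on=, and =run_pipeline= are hypothetical names, the rows only approximate the shape described in /Digging deeper/, and a fixed =today= keeps the result reproducible. The age subtracts one year when this year's birthday has not been reached yet; comparing =[month, day]= pairs also avoids building an invalid =Date= for people born on February 29.

```ruby
require 'date'

# Hypothetical in-memory sheet: row 1 is the header, data starts at row 2
# (the equivalent of `first_row 2`).
SHEET = [
  ["Name", "Date of birth"],
  ["Forest Whitaker", "July 15, 1961"],
  ["Daniel Day-Lewis", "April 29, 1957"],
  ["Sean Penn", "August 17, 1960"]
].freeze

# Age in whole years on `today`. Comparing [month, day] pairs avoids
# building a Date for February 29 in a non-leap year.
def age_on(birthdate, today)
  had_birthday = ([today.month, today.day] <=> [birthdate.month, birthdate.day]) >= 0
  today.year - birthdate.year - (had_birthday ? 0 : 1)
end

def run_pipeline(sheet, first_row:, today:)
  # "read": one Hash per row; column A is :name, column B is :birthdate,
  # processed with Date.parse. Each cell keeps its data under :value.
  table = sheet.drop(first_row - 1).map do |name, dob|
    { name: { value: name }, birthdate: { value: Date.parse(dob) } }
  end

  # "virtual_columns": :age is computed from other cells of the same row.
  table.each do |row|
    row[:age] = { value: age_on(row[:birthdate][:value], today), virtual: true }
  end

  # "mappings": the per-row work, here building one line of text per row.
  table.map { |row| "#{row[:name][:value]} is #{row[:age][:value]} years old" }
end

puts run_pipeline(SHEET, first_row: 2, today: Date.new(2024, 7, 1))
```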
129
+
130
+ ** Gentler Introduction
131
+
132
+ To write an import function with Dreader:
133
+
134
+ - Declare the input file and where to find the data (sheet
135
+ and first row). These can also be specified in each call.
136
+ - Declare the content of columns and, then, how to check raw data, parse data,
137
+ and check parsed data
138
+ - Add virtual columns, that is, columns computed from other values
139
+ in the row
140
+ - Specify how to map each line. This is where you do the actual work
141
+ (for instance, if you process a file line by line) or put together data for
142
+ processing after the file has been fully read --- see the next step.
143
+
144
+ Dreader now knows how to collect, shape, and transform (map) data according to
145
+ your instructions. We are now ready to do the actual work. This consists of
146
+ the following steps, several of which can be performed together:
147
+
148
+ - Read the file
149
+ - Do the parsing/transformations
150
+ - Compute the virtual columns
151
+ - Do the mappings
152
+
153
+ Each step is described in more detail in the following sections.
154
+
155
+ *** Declare the input file and where to find data
156
+
157
+ Require =dreader= and declare a class which extends =Dreader::Engine=:
158
+
159
+ #+BEGIN_EXAMPLE ruby
160
+ require 'dreader'
161
+
162
+ class Reader
163
+ extend Dreader::Engine
164
+ [...]
165
+ end
166
+ #+END_EXAMPLE
167
+
168
+ In the class, specify the parsing options using the following syntax:
169
+
170
+ #+BEGIN_EXAMPLE ruby
171
+ options do
172
+ filename 'example.ods'
173
+
174
+ sheet 'Sheet 1'
175
+
176
+ first_row 1
177
+ last_row 20
178
+
179
+ # optional (this allows integration with other applications already
181
+ # using a logger)
182
+ logger Logger.new($stdout)
182
+ logger_level Logger::INFO
183
+ end
184
+ #+END_EXAMPLE
185
+
186
+ where:
187
+
188
+ - (optional) =filename= is the file to read. If not specified, you will
189
+ have to supply a filename when loading the file (see =read=, below).
190
+ The extension determines the file type. *Use =.tsv= for tab-separated
191
+ files.*
192
+ - (optional) =first_row= is the first line to read (use =2= if your file
193
+ has a header)
194
+ - (optional) =last_row= is the last line to read. If not specified, we
195
+ will rely on =roo= to determine the last row. This is useful for
196
+ files in which you only want to process some of the content, or which
198
+ contain "garbage" after the records.
198
+ - (optional) =sheet= is the sheet name or number to read from. If not
199
+ specified, the first (default) sheet is used
200
+ - (optional) =debug= specifies that we are debugging
201
+ - (optional) =logger= specifies the logger
202
+ - (optional) =logger_level= specifies the logger level
203
+
204
+ You can override some of the defaults by passing a hash as an argument to
205
+ the =read= function. For instance:
206
+
207
+ #+BEGIN_EXAMPLE ruby
208
+ Reader.read filename: another_filepath
209
+ #+END_EXAMPLE
210
+
211
+ will read data from =another_filepath=, rather than from the filename
212
+ specified in the options. This might be useful, for instance, if the
213
+ same specification has to be used for different files.
214
+
215
+
216
+ *** Declare the content of columns and how to parse them
217
+
218
+ Declare the columns you want to read by assigning them a name and a column
219
+ reference.
220
+
221
+ There are two notations:
222
+
223
+ #+BEGIN_EXAMPLE ruby
224
+ # First notation, colref is put in the block
225
+ column :name do
226
+ colref 'A'
227
+ end
228
+
229
+ # Second notation, a hash is passed as the name
230
+ column({ name: 'A' }) do
231
+ end
232
+ #+END_EXAMPLE
233
+
234
+ The reference to a column can be either a letter or a number. The first column
235
+ is ='A'= or =1=.
236
+
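Because =colref= accepts both letters and numbers, it can help to see how the two correspond. The helpers =column_number= and =column_letters= below are hypothetical, not part of Dreader's API; the sketch assumes Dreader follows the usual spreadsheet convention in which ='Z'= is 26 and ='AA'= is 27.

```ruby
# Hypothetical helper: turn a column reference ("A", "c", "AA" or an
# Integer) into a 1-based column number, using the usual spreadsheet
# convention ("Z" is 26, "AA" is 27).
def column_number(colref)
  return colref if colref.is_a?(Integer)

  letters = colref.to_s.upcase
  raise ArgumentError, "invalid column reference: #{colref.inspect}" unless letters.match?(/\A[A-Z]+\z/)

  letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - "A".ord + 1) }
end

# Hypothetical inverse: 1 -> "A", 26 -> "Z", 27 -> "AA".
def column_letters(number)
  raise ArgumentError, "column numbers start at 1" if number < 1

  letters = ""
  while number > 0
    number, rem = (number - 1).divmod(26)
    letters = (rem + "A".ord).chr + letters
  end
  letters
end

puts column_number("C") # => 3
puts column_letters(28) # => AB
```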
237
+ The =column= declaration can contain Ruby blocks:
238
+
239
+ - one or more =check_raw= blocks check raw data as read from the input
240
+ file. They can be used, for instance, to verify presence of a value in the
241
+ input file. *Check must return true if there are no errors; any other
242
+ value (e.g. an array of messages) is considered an error.*
243
+ - =process= can be used to transform data into something closer to the input
244
+ data required for the import (e.g., it can be used to downcase or
245
+ strip a string)
246
+ - one or more =check= blocks perform a check on the =process=ed data, to check
247
+ for errors. They can be used, for instance, to check that a model built with
248
+ =process= is valid. *Check must return true if there are no errors.*
249
+
250
+ #+begin_example
251
+ column({ name: 'A' }) do
252
+ check_raw do |cell|
253
+ !cell.nil?
254
+ end
255
+ end
256
+ #+end_example
257
+
258
+ #+begin_quote
259
+ *If you declare more than one check block of the same type per column, use a
260
+ unique symbol to distinguish the blocks or the error messages will be
261
+ overwritten*.
262
+ #+end_quote
263
+
264
+ #+begin_example
265
+ column({ name: 'A' }) do
266
+ check_raw :must_be_non_nil do |cell|
267
+ !cell.nil?
268
+ end
269
+
270
+ check_raw :first_letter_must_be_a do |cell|
271
+ cell.to_s.start_with?('A')
272
+ end
273
+ end
274
+ #+end_example
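The overwriting described in the quote above can be pictured with the following sketch. It assumes, without claiming this is how Dreader stores them, that the checks of a column are kept in a Hash keyed by name; =ColumnChecks= is a hypothetical class.

```ruby
# Hypothetical container for the checks of one column. Checks are kept
# in a Hash keyed by name, so registering two checks under the same
# name keeps only the last one.
class ColumnChecks
  def initialize
    @checks = {}
  end

  def check_raw(name = :check_raw, &block)
    @checks[name] = block
  end

  def size
    @checks.size
  end

  # Names of the checks whose result is not exactly `true`.
  def failures(cell)
    @checks.reject { |_name, block| block.call(cell) == true }.keys
  end
end

unnamed = ColumnChecks.new
unnamed.check_raw { |cell| !cell.nil? }
unnamed.check_raw { |cell| cell.to_s.start_with?("A") } # replaces the first check

named = ColumnChecks.new
named.check_raw(:must_be_non_nil) { |cell| !cell.nil? }
named.check_raw(:first_letter_must_be_a) { |cell| cell.to_s.start_with?("A") }

puts unnamed.size                     # => 1
puts named.failures("Bob").inspect    # => [:first_letter_must_be_a]
```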
275
+
276
+ #+begin_quote
277
+ =process= is always executed before =check=. If you want to check raw data,
278
+ use the =check_raw= directive.
279
+ #+end_quote
280
+
281
+ #+begin_quote
282
+ There can be only one process block. *If you define more than one per
283
+ column, only the last one is executed.*
284
+ #+end_quote
285
+
286
+ #+begin_example
287
+ column({ name: 'A' }) do
288
+ check_raw do |cell|
289
+ # Here cell is like in the input file
290
+ end
291
+
292
+ process do |cell|
293
+ cell.upcase
294
+ end
295
+
296
+ check do |cell|
297
+ # Here cell is upcase and
298
+ end
299
+ end
300
+ #+end_example
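The order =check_raw=, =process=, =check=, and the rule that only =true= counts as success, can be summarized in a small function. =evaluate_cell= and =CellResult= are hypothetical; the sketch also assumes =process= runs even when =check_raw= fails, which this README does not state.

```ruby
# Hypothetical result holder for one cell.
CellResult = Struct.new(:value, :errors)

# Mimics the documented order: check_raw on the raw cell, then process,
# then check on the processed value. Only a result that is exactly
# `true` counts as a pass; anything else is recorded as an error.
def evaluate_cell(raw, check_raw: ->(_cell) { true }, process: ->(cell) { cell }, check: ->(_value) { true })
  errors = []

  raw_result = check_raw.call(raw)
  errors << [:check_raw, raw_result] unless raw_result == true

  value = process.call(raw)
  result = check.call(value)
  errors << [:check, result] unless result == true

  CellResult.new(value, errors)
end

ok = evaluate_cell(" forest ",
                   check_raw: ->(cell) { !cell.nil? },
                   process: ->(cell) { cell.strip.upcase },
                   check: ->(value) { value == value.upcase })
puts ok.value          # => FOREST
puts ok.errors.inspect # => []

# A truthy value that is not `true` (here a message) still counts as an error.
bad = evaluate_cell("x", check: ->(value) { value.length > 3 || "too short" })
puts bad.errors.inspect # => [[:check, "too short"]]
```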
301
+
302
+ For instance, given the tabular data:
303
+
304
+ | Name | Date of birth |
305
+ |------------------+-----------------|
306
+ | Forest Whitaker | July 15, 1961 |
307
+ | Daniel Day-Lewis | April 29, 1957 |
308
+ | Sean Penn | August 17, 1960 |
309
+
310
+ we could use the following declaration to specify the data to read:
311
+
312
+ #+BEGIN_EXAMPLE ruby
313
+ # we want to access column 1 using :name (1 and A are equivalent)
314
+ # :name should be non nil and of length greater than 0
315
+ column :name do
316
+ colref 1
317
+ check do |x|
318
+ x and x.length > 0
319
+ end
320
+ end
321
+
322
+ # we want to access column 2 (Date of birth) using :birthdate
323
+ column :birthdate do
324
+ colref 2
325
+
326
+ # make sure the column is transformed into a Date
327
+ process do |x|
328
+ Date.parse(x)
329
+ end
330
+
331
+ # check that birthdate is a Date (check is invoked on the value returned
332
+ # by process)
333
+ check do |x|
334
+ x.class == Date
335
+ end
336
+ end
337
+ #+END_EXAMPLE
338
+
339
+ #+BEGIN_NOTES
340
+ 1. The column name can be anything Ruby can use as a key for a Hash,
341
+ such as, for instance, symbols, strings, and even object instances.
342
+ 2. =colref= can be a string (e.g., ='A'=) or an integer, with
343
+ 1 and "A" being the first column.
344
+ 3. *You need to declare only the columns you want to import.* For
345
+ instance, we could skip the declaration for column 1, if 'Date of
346
+ Birth' is the only data we want to import.
347
+ 4. If =process= and =check= are specified, then =check= will receive the
348
+ result of invoking =process= on the cell value. This makes sense if
349
+ process is used to make the cell value more accessible to Ruby code
350
+ (e.g., transforming a string into an integer).
351
+ #+END_NOTES
352
+
353
+ If there are different columns that have to be read and processed in the same
354
+ way, =columns= (notice the plural form) allows for a more compact
355
+ representation:
356
+
357
+ #+BEGIN_EXAMPLE ruby
358
+ columns({ a: 'A', b: 'B' })
359
+ #+END_EXAMPLE
360
+
361
+ is equivalent to:
362
+
363
+ #+BEGIN_EXAMPLE ruby
364
+ column :a do
365
+ colref 'A'
366
+ end
367
+
368
+ column :b do
369
+ colref 'B'
370
+ end
371
+ #+END_EXAMPLE
372
+
373
+ =columns= accepts a code block, which can be used to add =process= and =check=
374
+ declarations:
375
+
376
+ #+BEGIN_EXAMPLE ruby
377
+ columns({ a: 'A', b: 'B' }) do
378
+ process do |cell|
379
+ cell.to_s.strip # for instance
380
+ end
381
+ end
382
+ #+END_EXAMPLE
383
+
384
+ See [[file:examples/wikipedia_us_cities/us_cities_bulk_declare.rb][us_cities_bulk_declare.rb]] for an example of =columns=.
385
+
386
+ #+BEGIN_NOTES
387
+ If you use code blocks, don't forget to put the column mapping in
388
+ parentheses, or the Ruby parser won't be able to distinguish the
389
+ hash from the code block.
390
+ #+END_NOTES
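The expansion performed by =columns=, and the reason for the parentheses, can be reproduced with a tiny stand-alone DSL. =ColumnSpec=, =MiniSpec=, and =DemoReader= are hypothetical classes that only mimic the declarations above; Dreader's own implementation may differ.

```ruby
# Hypothetical per-column specification, standing in for a `column` declaration.
class ColumnSpec
  attr_reader :name, :colref

  def initialize(name, colref)
    @name = name
    @colref = colref
    @process = ->(cell) { cell } # identity unless a process block is given
  end

  # Called inside the declaration block: stores the transformation.
  def process(&block)
    @process = block
  end

  def apply(cell)
    @process.call(cell)
  end
end

# Hypothetical base class offering `column` and `columns` as class methods.
class MiniSpec
  def self.specs
    @specs ||= {}
  end

  def self.column(name, colref, &block)
    spec = ColumnSpec.new(name, colref)
    spec.instance_eval(&block) if block
    specs[name] = spec
  end

  # Expands { name => colref, ... } into one `column` call per pair,
  # sharing the same declaration block.
  def self.columns(mapping, &block)
    mapping.each { |name, colref| column(name, colref, &block) }
  end
end

class DemoReader < MiniSpec
  # Parentheses are required: without them Ruby parses the braces
  # as a block passed to `columns`, not as a Hash argument.
  columns({ a: 'A', b: 'B' }) do
    process { |cell| cell.to_s.strip }
  end
end

puts DemoReader.specs.keys.inspect          # => [:a, :b]
puts DemoReader.specs[:b].apply("  hello ") # => hello
```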
391
+
392
+
393
+ *** Add virtual columns
394
+
395
+ Sometimes it is convenient to aggregate or otherwise manipulate the data
396
+ read from each row, before doing the actual processing.
397
+
398
+ For instance, we might have a table with dates of birth, while we are
399
+ really interested in the age of people.
400
+
401
+ In such cases, we can use virtual columns. A *virtual column* allows
402
+ one to add a column to the data read, computed using the values of
403
+ other cells in the same row.
404
+
405
+ The following declaration adds an =age= column to each row of the data
406
+ read from the previous example:
407
+
408
+ #+BEGIN_EXAMPLE ruby
409
+ virtual_column :age do
410
+ process do |row|
411
+ # the function `compute_age` has to be defined
412
+ compute_age(row[:birthdate][:value])
413
+ end
414
+ end
415
+ #+END_EXAMPLE
416
+
417
+ Virtual columns are, of course, available to the =mapping= directive
418
+ (see below).
419
+
420
+
421
+ *** Specify how to process each line
422
+
423
+ The =mapping= directive specifies what to do with each line read. The
424
+ =mapping= declaration takes an arbitrary piece of Ruby code, which can
425
+ reference the fields using the column names we declared.
426
+
427
+ For instance, the following code gets the value of column =:name= and the
428
+ value of column =:age=, and prints them to standard output:
429
+
430
+ #+BEGIN_EXAMPLE ruby
431
+ mapping do |row|
432
+ puts "#{row[:name][:value]} is #{row[:age][:value]} years old"
433
+ end
434
+ #+END_EXAMPLE
435
+
436
+ The data read from each row of our input data is stored in a hash. The hash
437
+ uses column names as the primary key and stores the values in the =:value=
438
+ key.
439
+
440
+
441
+ *** Process data
442
+
443
+ If =mapping= does not work for your data processing activities (e.g., you need
444
+ to make elaborations on data which span different rows), you can perform
445
+ your elaborations on the data transformed by =mappings=.
446
+
447
+ A typical scenario works as follows:
448
+
449
+ 1. Reference the class (=i = Reader=) and use =i.read= or =i.load=
450
+ (synonyms) to read all data.
451
+
452
+ #+BEGIN_EXAMPLE ruby
453
+ i = Reader
454
+ i.read
455
+
456
+ # alternatively
457
+ Reader.read
458
+ #+END_EXAMPLE
459
+
460
+ 2. Use =errors= to see whether any of the check functions failed:
461
+
462
+ #+BEGIN_EXAMPLE ruby
463
+ array_of_hashes = i.errors
464
+ array_of_hashes.each do |error_hash|
465
+ puts error_hash
466
+ end
467
+ #+END_EXAMPLE
468
+
469
+ 3. Use =virtual_columns= to generate the virtual columns:
470
+
471
+ #+BEGIN_EXAMPLE ruby
472
+ i.virtual_columns
473
+ #+END_EXAMPLE
474
+
475
+ (Optionally: check again for errors.)
476
+
477
+ 4. Use the =mappings= function to execute the =mapping= directive on each line
478
+ read from the file.
479
+
480
+ #+BEGIN_EXAMPLE ruby
481
+ i.mappings
482
+ #+END_EXAMPLE
483
+
484
+ (Optionally: check again for errors.)
485
+
486
+ 5. Add your own code to process the data returned after =mappings=, which you
487
+ can access with =i.table= or =i.data= (synonyms).
488
+
489
+ Look in the examples directory for further details and a couple of working
490
+ examples.
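As an example of step 5, the sketch below performs a task that spans rows (finding duplicate names) on a hand-written array shaped like the one returned by =i.table=. The data and the task are illustrative assumptions; with Dreader you would start from =i.table= instead.

```ruby
# Hypothetical data shaped like the array returned by `i.table`.
table = [
  { name: { value: "John" }, city: { value: "NY" },     row_errors: [],               row_number: 2 },
  { name: { value: "Jane" }, city: { value: "Boston" }, row_errors: [],               row_number: 3 },
  { name: { value: "John" }, city: { value: "NY" },     row_errors: [],               row_number: 4 },
  { name: { value: nil },    city: { value: "NY" },     row_errors: ["name missing"], row_number: 5 }
]

# A task spanning several rows: ignore rows with errors, then report
# the names that occur more than once, with the rows they occur in.
valid = table.reject { |row| row[:row_errors].any? }
duplicates = valid
  .group_by { |row| row[:name][:value] }
  .select { |_name, rows| rows.size > 1 }
  .transform_values { |rows| rows.map { |row| row[:row_number] } }

duplicates.each { |name, rows| puts "#{name}: rows #{rows.join(', ')}" }
# prints "John: rows 2, 4"
```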
491
+
492
+ *** Improving performance
493
+
494
+ While debugging your specification, executing =read=, =virtual_columns=, and
495
+ =mappings= in distinct steps is a good idea. When you go into production, you
496
+ might want to reduce the number of passes you perform on the data.
497
+
498
+ You can pass the option =virtual: true= to =read= to compute virtual
499
+ columns while you are reading data.
500
+
501
+ You can pass the option =mapping: true= to =read= to compute virtual
502
+ columns and perform the mapping while you are reading data. Notice that:
503
+
504
+ - =mapping= implies =virtual=, that is, if you pass =mapping: true= the read
505
+ function will also compute virtual columns
506
+ - =mapping= alters the content of =@table= and *subsequent calls to
507
+ =virtual_columns= and =mappings= will fail.* You have to reset it by invoking
508
+ =read= again.
509
+
510
+ *** Managing Errors
511
+
512
+ **** Finding errors in input data
513
+
514
+ Dreader collects errors in two specific ways:
515
+
516
+ 1. In each column specification, using =check_raw= and =check=. This allows
517
+ checking each field for errors (e.g., a =nil= value in a cell)
518
+ 2. In virtual columns, using =check_raw= and =check=. This allows performing
519
+ more complex checks by putting together all the values read from a row
520
+ (e.g., detecting that =to_date= occurs before =from_date=)
521
+
522
+ The following, for instance, checks that name or surname has a value:
523
+
524
+ #+begin_example ruby
525
+ virtual_column :global_check do
526
+ doc "Name or Surname must exist"
527
+ check :name_or_surname_must_be_defined do |row|
528
+ !row[:name][:value].nil? || !row[:surname][:value].nil?
529
+ end
530
+ end
531
+ #+end_example
532
+
533
+ If you prefer, you can also define a virtual column that contains the value of
534
+ the check:
535
+
536
+ #+begin_example ruby
537
+ virtual_column :name_or_surname_exist do
538
+ doc "Name or Surname must exist"
539
+ process do |row|
540
+ !row[:name][:value].nil? || !row[:surname][:value].nil?
541
+ end
542
+ end
543
+ #+end_example
544
+
545
+ You can then act in the mapping directive according to the value returned by the
546
+ virtual column:
547
+
548
+ #+begin_example ruby
549
+ mapping do |row|
550
+ if row[:name_or_surname_exist][:value]
551
+ [...]
552
+ end
+ end
553
+ #+end_example
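Since each column of a row is stored as a Hash, a test such as =row[:name]= is always truthy; the check has to read =:value=. The following stand-alone sketch applies that to the name/surname check. The method name is hypothetical, and treating blank strings as missing is an assumption stricter than the examples above.

```ruby
# Hypothetical stricter variant of the check: a column counts as
# defined only if its :value is neither nil nor blank.
def name_or_surname_defined?(row)
  [row.dig(:name, :value), row.dig(:surname, :value)].any? { |v| !v.to_s.strip.empty? }
end

rows = [
  { name: { value: "John" }, surname: { value: nil } },
  { name: { value: nil },    surname: { value: " " } }
]

# Mirrors the mapping above: act only on rows that pass the check.
kept = rows.select { |row| name_or_surname_defined?(row) }
puts kept.size # => 1
```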
554
+
555
+ **** Checking for errors
556
+
557
+ You can check for errors in two different ways:
558
+
559
+ The first is in the =mapping= directive, where you can check whether some checks
560
+ for the =row= failed, by:
561
+
562
+ 1. checking the =:error= boolean key associated with each column, that is:
563
+
564
+ =row[<column_name>][:error]=
565
+
566
+ 2. looking at the value of the =:row_errors= key, which contains all error
567
+ messages generated for the row:
568
+
569
+ =row[:row_errors]=
570
+
571
+ The second is after the processing, by using the method =errors=, which lists
572
+ all the errors.
573
+
574
+ The utility function =Dreader::Util.errors= takes as input the errors generated
575
+ by Dreader and extracts those of a specific row and, optionally, column:
576
+
577
+ #+begin_example ruby
578
+ # get all the errors at line 2
579
+ Dreader::Util.errors i.errors, 2
580
+
581
+ # get all the errors at line 2, column 'C'
582
+ Dreader::Util.errors i.errors, 2, 3
583
+ #+end_example
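A stand-alone sketch of the kind of filtering =Dreader::Util.errors= performs. =errors_at= is hypothetical, and the keys of the error records (=:row_number=, =:col_number=, =:message=) are assumptions about the format of =i.errors=, which this README does not document.

```ruby
# Hypothetical error records; the keys :row_number, :col_number and
# :message are assumptions, not Dreader's documented format.
ERRORS = [
  { row_number: 2, col_number: 1, message: "name is missing" },
  { row_number: 2, col_number: 3, message: "not a date" },
  { row_number: 5, col_number: 3, message: "not a date" }
].freeze

# Errors of a given row and, optionally, of a given column number.
def errors_at(list, row, col = nil)
  list.select { |error| error[:row_number] == row && (col.nil? || error[:col_number] == col) }
end

puts errors_at(ERRORS, 2).size                                # => 2
puts errors_at(ERRORS, 2, 3).map { |e| e[:message] }.inspect  # => ["not a date"]
```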
584
+
585
+
586
+ * Generating a Template from the specification
587
+
588
+ Since version 0.6.0, =dreader= can generate a template starting from the
589
+ specification.
590
+
591
+ The template is generated by the following call:
592
+
593
+ #+begin_example ruby
594
+ generate_template template_filename: "template.xlsx"
595
+ #+end_example
596
+
597
+ (The =template_filename= directive can also be specified in the =options=
598
+ section).
599
+
600
+ The template contains the following rows:
601
+
602
+ - The first row contains the names of the columns, as specified in the
603
+ =columns= declarations and made into a human-readable form.
604
+ - The second row contains the doc strings of the columns, if set.
605
+ - The remaining rows contain the example records added with the
606
+ =example= directive
607
+
608
+ The position of the first row is determined by the value of =first_row=, that
609
+ is, if =first_row= is 2 (content starts from the second row), the header row
610
+ is put in row 1.
611
+
612
+ At the moment, only Excel is supported.
613
+
614
+ An example of template generation can be found in the examples directory.
615
+
616
+ ** Digging deeper
617
+
618
+ If you need to perform elaborations which cannot be performed row by
619
+ row, you can access all data with the =table= method:
620
+
621
+ #+BEGIN_EXAMPLE ruby
622
+ i.read
623
+ i.table
624
+ #+END_EXAMPLE
625
+
626
+ The function =i.table= returns an array of Hashes. Each element of
627
+ the array is a row of the input file. Each element/row has the
628
+ following structure:
629
+
630
+ #+BEGIN_EXAMPLE ruby
631
+ {
632
+ col_name1: { <info about col_name_1 in row_j> },
633
+ [...]
634
+ col_nameN: { <info about col_name_N in row_j> },
635
+ row_errors: [ <errors associated to row> ],
636
+ row_number: <row number>
637
+ }
638
+ #+END_EXAMPLE
639
+
640
+ where =col_name1=, ..., =col_nameN= are the names you have assigned to
641
+ the columns and the information stored for each cell is the
642
+ following:
643
+
644
+ #+BEGIN_EXAMPLE ruby
645
+ {
646
+ value: ..., # the result of calling process on the cell
647
+ row_number: ..., # the row number
648
+ col_number: ..., # the column number
649
+ error: ... # the result of calling check on the cell processed value
650
+ }
651
+ #+END_EXAMPLE
652
+
653
+ (Note that virtual columns only store =value= and a Boolean =virtual=,
654
+ which is always =true=.)
655
+
656
+ Thus, for instance, =i.table= returns something like:
657
+
658
+ #+BEGIN_EXAMPLE ruby
659
+ i.table
660
+ [
661
+ {
662
+ name: { value: "John", row_number: 1, col_number: 1, error: false },
663
+ age: { value: 30, row_number: 1, col_number: 2, error: false }
664
+ },
665
+ {
666
+ name: { value: "Jane", row_number: 2, col_number: 1, error: false },
667
+ age: { value: 31, row_number: 2, col_number: 2, error: false }
668
+ }
669
+ ]
670
+ #+END_EXAMPLE
671
+
672
+
673
+ * Simplifying the hash with the data read
674
+
675
+ The =Dreader::Util= class provides some functions to simplify the hashes built
676
+ by =dreader=. This is useful to simplify the code you write and to generate
677
+ hashes you can pass, for instance, to ActiveRecord creators.
678
+
679
+ ** Simplify removes everything but the values
680
+
681
+ =Dreader::Util.simplify(hash)= removes all information but the value and makes
682
+ the value accessible directly from the name of the column.
683
+
684
+ #+BEGIN_EXAMPLE ruby
685
+ i.table[0]
686
+ {
687
+ name: { value: "John", row_number: 1, col_number: 1, error: false },
688
+ age: { value: 30, row_number: 1, col_number: 2, error: false }
689
+ }
690
+
691
+ Dreader::Util.simplify i.table[0]
692
+ { name: "John", age: 30 }
693
+ #+END_EXAMPLE
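What =simplify= does can be approximated in a few lines of plain Ruby. =simplify_row= is a hypothetical re-implementation for illustration; whether Dreader keeps bookkeeping keys such as =:row_errors= in the simplified hash is not stated here, and the sketch drops them.

```ruby
# Hypothetical re-implementation for illustration: keep only the :value
# of each column; entries that are not column hashes (such as
# :row_errors and :row_number) are dropped.
def simplify_row(row)
  row.each_with_object({}) do |(key, cell), simplified|
    simplified[key] = cell[:value] if cell.is_a?(Hash) && cell.key?(:value)
  end
end

row = {
  name: { value: "John", row_number: 1, col_number: 1, error: false },
  age: { value: 30, virtual: true },
  row_errors: [],
  row_number: 1
}

simplified = simplify_row(row)
puts simplified.keys.inspect # => [:name, :age]
puts simplified[:name]       # => John
```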
694
+
695
+ ** Slice and Clean select columns
696
+
697
+ =Dreader::Util.slice(hash, keys)= and =Dreader::Util.clean(hash, keys)=, where
698
+ =keys= is an array of keys, are respectively used to select or remove some
699
+ keys from the hash returned by Dreader. (Notice that the Ruby Hash class
700
+ already provides similar methods.)
701
+
702
+ #+BEGIN_EXAMPLE ruby
703
+ i.table[0]
704
+ {
705
+ name: { value: "John", row_number: 1, col_number: 1, error: false },
705
+ age: { value: 30, row_number: 1, col_number: 2, error: false }
706
+ }
707
+
708
+ Dreader::Util.slice i.table[0], [:name]
709
+ { name: { value: "John", row_number: 1, col_number: 1, error: false } }
710
+
711
+ Dreader::Util.clean i.table[0], [:name]
712
+ { age: { value: 30, row_number: 1, col_number: 2, error: false } }
714
+ #+END_EXAMPLE
715
+
716
+ The methods =slice= and =clean= are more useful when used in conjunction with
717
+ =simplify=:
718
+
719
+ #+BEGIN_EXAMPLE ruby
720
+ hash = Dreader::Util.simplify i.table[0]
721
+ { name: "John", age: 30 }
722
+
723
+ Dreader::Util.slice hash, [:age]
724
+ { age: 30 }
725
+
726
+ Dreader::Util.clean hash, [:age]
727
+ { name: "John" }
728
+ #+END_EXAMPLE
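As noted above, Ruby's own Hash methods cover the same ground once a row has been simplified. The sketch below uses =Hash#slice= (Ruby 2.5 and later) and =Hash#except= (Ruby 3.0 and later); unlike the =Dreader::Util= calls shown above, they take keys as separate arguments, so an Array of keys has to be splatted.

```ruby
hash = { name: "John", age: 30 }

# Hash#slice keeps the given keys; Hash#except removes them.
sliced = hash.slice(:age)
cleaned = hash.except(:age)

puts(sliced == { age: 30 })       # => true
puts(cleaned == { name: "John" }) # => true

# With an Array of keys, splat it into the argument list.
keys = [:age]
puts(hash.slice(*keys) == sliced) # => true
```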
729
+
730
+ The output produced by =slice= and =simplify= is a hash which can be used to
731
+ create an =ActiveRecord= object.
732
+
733
+ ** Better Integration with ActiveRecord
734
+
735
+ Finally, the =Dreader::Util.restructure= method helps build hashes to create
736
+ [[http://api.rubyonrails.org/classes/ActiveModel/Model.html][ActiveModel]] objects with nested attributes.
737
+
738
+ *The starting point is a simplified row.*
739
+
740
+ #+BEGIN_EXAMPLE ruby
741
+ hash = { name: "John", surname: "Doe", address: "Unknown", city: "NY" }
742
+
743
+ Dreader::Util.restructure hash, [:name, :surname, :address_attributes, [:address, :city]]
744
+ { name: "John", surname: "Doe", address_attributes: { address: "Unknown", city: "NY" } }
745
+ #+END_EXAMPLE
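The behaviour shown in the example can be reproduced with a short recursive function. =restructure_sketch= is a hypothetical re-implementation inferred from this single example: a Symbol copies that key, while a Symbol immediately followed by an Array becomes a nested Hash built from the keys in the Array. Dreader's actual rules may be broader.

```ruby
# Hypothetical re-implementation inferred from the example above.
def restructure_sketch(hash, spec)
  result = {}
  i = 0
  while i < spec.size
    key = spec[i]
    nested = spec[i + 1]
    if nested.is_a?(Array)
      # key followed by an Array: build a nested Hash from those keys
      result[key] = restructure_sketch(hash, nested)
      i += 2
    else
      # plain key: copy the value from the simplified row
      result[key] = hash[key]
      i += 1
    end
  end
  result
end

hash = { name: "John", surname: "Doe", address: "Unknown", city: "NY" }
nested = restructure_sketch(hash, [:name, :surname, :address_attributes, [:address, :city]])
puts nested[:address_attributes][:city] # => NY
```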
746
+
747
+
748
+ * Debugging your specification
749
+
750
+ The =debug= function prints the current configuration, reads some records from
751
+ the input file(s), and shows the records read:
752
+
753
+ #+BEGIN_EXAMPLE ruby
754
+ i.debug
755
+ i.debug n: 40 # read 40 lines (from first_row)
756
+ i.debug n: 40, filename: filepath # like above, but read from filepath
757
+ #+END_EXAMPLE
758
+
759
+ By default =debug= invokes the =check_raw=, =process=, and =check=
760
+ directives. Pass the following options if you want to disable this behavior;
761
+ this might be useful, for instance, if you intend to check only what data is
762
+ read:
763
+
764
+ #+BEGIN_EXAMPLE ruby
765
+ i.debug process: false, check: false
766
+ #+END_EXAMPLE
767
+
768
+ Notice that =check= implies =process=, since =check= is invoked on the output
769
+ of the =process= directive.
770
+
771
+ If you prefer, as an alternative to =debug= you can also use configuration
772
+ variables (but then you need to change the configuration according to the
773
+ environment):
774
+
775
+ #+begin_example ruby
776
+ i.options do
777
+ debug true
778
+ end
779
+ #+end_example
780
+
781
+
782
+ * Changelog
783
+
784
+ See [[file:CHANGELOG.org][CHANGELOG]].
785
+
786
+ * Known Limitations
787
+
788
+ At the moment:
789
+
790
+ - it is not possible to specify column references using header names
791
+ (like Roo does).
792
+ - some more testing wouldn't hurt.
793
+
794
+ * Known Bugs
795
+
796
+ Some known bugs and an unknown number of unknown bugs.
797
+
798
+ (See the open issues for the known bugs.)
799
+
800
+ * Development
801
+
802
+ After checking out the repo, run =bin/setup= to install dependencies.
803
+ You can also run =bin/console= for an interactive prompt that will
804
+ allow you to experiment.
805
+
806
+ To install this gem onto your local machine, run =bundle exec rake
807
+ install=. To release a new version, update the version number in
808
+ =version.rb=, and then run =bundle exec rake release=, which will
809
+ create a git tag for the version, push git commits and tags, and push
810
+ the =.gem= file to [[https://rubygems.org][rubygems.org]].
811
+
812
+ * Contributing
813
+
814
+ Bug reports and pull requests are welcome.
815
+
816
+ You need to get in touch with me by email, until I figure out how to enable
817
+ it in Gitea.
818
+
819
+ * License
820
+
821
+ [[https://opensource.org/licenses/MIT][MIT License]].