dreader 1.1.1 → 1.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.org +15 -0
- data/README.org +40 -14
- data/examples/age_csv/Birthdays-TabSeparated.csv +13 -0
- data/examples/age_csv/Birthdays.csv +13 -0
- data/examples/age_csv/age.rb +55 -0
- data/examples/age_noext/Birthdays +0 -0
- data/examples/age_noext/Birthdays-xlsx +0 -0
- data/examples/age_noext/Birthdays-xlsx-with-wrong-extension.xls +0 -0
- data/examples/age_noext/age.rb +73 -0
- data/examples/wikipedia_us_cities/us_cities_reject.rb +77 -0
- data/lib/dreader/engine.rb +62 -18
- data/lib/dreader/version.rb +1 -1
- metadata +11 -3
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 3c30be2fe49c6c8ce20d4930c75f1279ac1a92a099f609b1266b14dc61c7cf3c
|
|
4
|
+
data.tar.gz: 58c735a67c45ef11a180bc6f17892ba912656d346320da1a716caa69661f4695
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 7847892dbcf648432a9c51867fd70e1260e956e82ea7cbbad93f92882dd867be6f36f67a0edfbb972e16e12c1364efe92a54bd03d31801770da4b28fac725350
|
|
7
|
+
data.tar.gz: a717955a2eaa0c406d6fb140daf9cd084c11e0d4710289de34b7757b5fd4f4e920ded4fd1450306e3e42a42278c983e078ff8e53248fe0f5b7a93d09fb8a9d40
|
data/CHANGELOG.org
CHANGED
|
@@ -1,5 +1,20 @@
|
|
|
1
1
|
#+TITLE: Changelog
|
|
2
2
|
|
|
3
|
+
* Version 1.2.0 - <2023-11-02 Thu>
|
|
4
|
+
** reject declaration
|
|
5
|
+
|
|
6
|
+
- A new reject declaration allows to reject some lines. reject takes as
|
|
7
|
+
input a row and can predicate over columns and virtual columns. When
|
|
8
|
+
true, the corresponding line is discarded.
|
|
9
|
+
|
|
10
|
+
* Version 1.1.2 - <2023-10-31 Tue>
|
|
11
|
+
** Fixes an issue with the :extension option
|
|
12
|
+
|
|
13
|
+
- Fixes a bug related to =:extension= and adds a working example, to test
|
|
14
|
+
the feature
|
|
15
|
+
- Changes the extension from a string to a symbol. No initial dot required
|
|
16
|
+
any longer
|
|
17
|
+
|
|
3
18
|
* Version 1.1.1 - <2023-10-16 Mon>
|
|
4
19
|
** Adds option :extension
|
|
5
20
|
|
data/README.org
CHANGED
|
@@ -137,7 +137,8 @@ To write an import function with Dreader:
|
|
|
137
137
|
and check parsed data
|
|
138
138
|
- Add virtual columns, that is, columns computed from other values
|
|
139
139
|
in the row
|
|
140
|
-
- Specify
|
|
140
|
+
- Specify what lines you want to reject, if any
|
|
141
|
+
- Specify how to transform lines. This is where you do the actual work
|
|
141
142
|
(for instance, if you process a file line by line) or put together data for
|
|
142
143
|
processing after the file has been fully read --- see the next step.
|
|
143
144
|
|
|
@@ -165,12 +166,13 @@ Require =dreader= and declare a class which extends =Dreader::Engine=:
|
|
|
165
166
|
end
|
|
166
167
|
#+END_EXAMPLE
|
|
167
168
|
|
|
168
|
-
|
|
169
|
+
Specify parsing option in the class, using the following syntax:
|
|
169
170
|
|
|
170
171
|
#+BEGIN_EXAMPLE ruby
|
|
171
172
|
options do
|
|
172
173
|
filename 'example.ods'
|
|
173
|
-
|
|
174
|
+
# this optional. Use it when the file does not have an extension
|
|
175
|
+
extension :ods
|
|
174
176
|
|
|
175
177
|
sheet 'Sheet 1'
|
|
176
178
|
|
|
@@ -190,10 +192,10 @@ where:
|
|
|
190
192
|
to supply a filename when loading the file (see =read=, below). *Use
|
|
191
193
|
=.tsv= for tab-separated files.*
|
|
192
194
|
- (optional) =extension= overrides or specify the extension of =filename=.
|
|
193
|
-
Takes as input
|
|
194
|
-
**value of this option is not appended to filename** (see =read=
|
|
195
|
-
Filename must thus be a valid reference to a file in the file
|
|
196
|
-
option is useful in one of these two circumstances:
|
|
195
|
+
Takes as input a symbol (e.g., =:xlsx=).
|
|
196
|
+
Notice that **value of this option is not appended to filename** (see =read=
|
|
197
|
+
below). Filename must thus be a valid reference to a file in the file
|
|
198
|
+
system. This option is useful in one of these two circumstances:
|
|
197
199
|
1. When =filename= has no extension
|
|
198
200
|
2. When you want to override the extension of the filename, e.g., to force
|
|
199
201
|
reading a "file.csv" as a tab separated file
|
|
@@ -397,6 +399,9 @@ See [[file:examples/wikipedia_us_cities/us_cities_bulk_declare.rb][us_cities_bul
|
|
|
397
399
|
hash from the code block.
|
|
398
400
|
#+END_NOTES
|
|
399
401
|
|
|
402
|
+
The data read from each row of our input data is stored in a hash. The hash
|
|
403
|
+
uses column names as the primary key and stores the values in the =:value=
|
|
404
|
+
key.
|
|
400
405
|
|
|
401
406
|
*** Add virtual columns
|
|
402
407
|
|
|
@@ -426,6 +431,22 @@ Virtual columns are, of course, available to the =mapping= directive
|
|
|
426
431
|
(see below).
|
|
427
432
|
|
|
428
433
|
|
|
434
|
+
*** Specify which lines to reject
|
|
435
|
+
|
|
436
|
+
You can reject some lines using the =reject= declaration, which is applied row
|
|
437
|
+
by row, can predicate over columns and virtual columns, and has to return a
|
|
438
|
+
Boolean value.
|
|
439
|
+
|
|
440
|
+
All lines returning a truish value will be be rejected, that is, not stored in
|
|
441
|
+
the =@table= variable (and, consequently, passed to the mapping function).
|
|
442
|
+
|
|
443
|
+
For instance, the following declaration rejects all lines in which the
|
|
444
|
+
population column is higher than =3_000_000=:
|
|
445
|
+
|
|
446
|
+
#+begin_src ruby
|
|
447
|
+
reject { |row| row[:population][:value] > 3_000_000 }
|
|
448
|
+
#+end_src
|
|
449
|
+
|
|
429
450
|
*** Specify how to process each line
|
|
430
451
|
|
|
431
452
|
The =mapping= directive specifies what to do with each line read. The
|
|
@@ -441,10 +462,9 @@ value of column =:age= and prints them to standard output
|
|
|
441
462
|
end
|
|
442
463
|
#+END_EXAMPLE
|
|
443
464
|
|
|
444
|
-
|
|
445
|
-
|
|
446
|
-
|
|
447
|
-
|
|
465
|
+
To invoke the =mapping= declaration on a file, use the =mappings= method,
|
|
466
|
+
which invokes =map= to each row and it stores in the =@table= variable
|
|
467
|
+
whatever value mapping returns.
|
|
448
468
|
|
|
449
469
|
*** Process data
|
|
450
470
|
|
|
@@ -464,8 +484,8 @@ A typical scenario works as follows:
|
|
|
464
484
|
# examples:
|
|
465
485
|
# i.read
|
|
466
486
|
# i.read filename: "example.ods"
|
|
467
|
-
# i.read filename: "example.ods", extension:
|
|
468
|
-
# i.read filename: "example", extension:
|
|
487
|
+
# i.read filename: "example.ods", extension: :ods
|
|
488
|
+
# i.read filename: "example", extension: :ods
|
|
469
489
|
# (the line above opens the file "example" as an Open Document Spreasdheet)
|
|
470
490
|
i.read
|
|
471
491
|
|
|
@@ -500,7 +520,13 @@ A typical scenario works as follows:
|
|
|
500
520
|
(Optionally: check again for errors.)
|
|
501
521
|
|
|
502
522
|
5. Add your own code to process the data returned after =mappings=, which you
|
|
503
|
-
can
|
|
523
|
+
can assign to a variable (e.g., =returned_data = i.mappings=) or access
|
|
524
|
+
with =i.table= or =i.data= (synonyms).
|
|
525
|
+
|
|
526
|
+
#+begin_quote
|
|
527
|
+
Notice that =mappings= does a side effect and invoking the mapping twice in a
|
|
528
|
+
row won't work: you need to reload the file first.
|
|
529
|
+
#+end_quote
|
|
504
530
|
|
|
505
531
|
Look in the examples directory for further details and a couple of working
|
|
506
532
|
examples.
|
|
@@ -0,0 +1,13 @@
|
|
|
1
|
+
Name Date of birth
|
|
2
|
+
Forest Whitaker July 15, 1961
|
|
3
|
+
Daniel Day-Lewis April 29, 1957
|
|
4
|
+
Sean Penn August 17, 1960
|
|
5
|
+
Jeff Bridges December 4, 1949
|
|
6
|
+
Colin Firth September 10, 1960
|
|
7
|
+
Jean Dujardin June 19, 1972
|
|
8
|
+
Daniel Day-Lewis April 29, 1957
|
|
9
|
+
Matthew McConaughey November 4, 1969
|
|
10
|
+
Eddie Redmayne January 6, 1982
|
|
11
|
+
Leonardo DiCaprio November 11, 1974
|
|
12
|
+
Casey Affleck August 12, 1975
|
|
13
|
+
Gary Oldman March 21, 1958
|
|
@@ -0,0 +1,13 @@
|
|
|
1
|
+
Name,Date of birth
|
|
2
|
+
Forest Whitaker,"July 15, 1961"
|
|
3
|
+
Daniel Day-Lewis,"April 29, 1957"
|
|
4
|
+
Sean Penn,"August 17, 1960"
|
|
5
|
+
Jeff Bridges,"December 4, 1949"
|
|
6
|
+
Colin Firth,"September 10, 1960"
|
|
7
|
+
Jean Dujardin,"June 19, 1972"
|
|
8
|
+
Daniel Day-Lewis,"April 29, 1957"
|
|
9
|
+
Matthew McConaughey,"November 4, 1969"
|
|
10
|
+
Eddie Redmayne,"January 6, 1982"
|
|
11
|
+
Leonardo DiCaprio,"November 11, 1974"
|
|
12
|
+
Casey Affleck,"August 12, 1975"
|
|
13
|
+
Gary Oldman,"March 21, 1958"
|
|
@@ -0,0 +1,55 @@
|
|
|
1
|
+
require "dreader"
|
|
2
|
+
|
|
3
|
+
class Reader
|
|
4
|
+
extend Dreader::Engine
|
|
5
|
+
|
|
6
|
+
options do
|
|
7
|
+
first_row 2
|
|
8
|
+
debug true
|
|
9
|
+
end
|
|
10
|
+
|
|
11
|
+
column :name do
|
|
12
|
+
doc "A is the name string"
|
|
13
|
+
colref 'A'
|
|
14
|
+
end
|
|
15
|
+
|
|
16
|
+
column :birthdate do
|
|
17
|
+
doc "Birthdate contains a full date (i.e., including the year)"
|
|
18
|
+
colref 'B'
|
|
19
|
+
|
|
20
|
+
process do |c|
|
|
21
|
+
Date.parse(c)
|
|
22
|
+
end
|
|
23
|
+
end
|
|
24
|
+
|
|
25
|
+
virtual_column :age do
|
|
26
|
+
process do |row|
|
|
27
|
+
birthdate = row[:birthdate][:value]
|
|
28
|
+
birthday = Date.new(Date.today.year, birthdate.month, birthdate.day)
|
|
29
|
+
today = Date.today
|
|
30
|
+
|
|
31
|
+
[0, today.year - birthdate.year - (birthday < today ? 1 : 0)].max
|
|
32
|
+
end
|
|
33
|
+
end
|
|
34
|
+
|
|
35
|
+
mapping do |row|
|
|
36
|
+
r = Dreader::Util.simplify(row)
|
|
37
|
+
puts "#{r[:name]} is #{r[:age]} years old (born on #{r[:birthdate]})"
|
|
38
|
+
end
|
|
39
|
+
end
|
|
40
|
+
|
|
41
|
+
i = Reader
|
|
42
|
+
i.read filename: "Birthdays.csv", mapping: true
|
|
43
|
+
|
|
44
|
+
i.read filename: "Birthdays-TabSeparated.csv", extension: :tsv, mapping: true
|
|
45
|
+
|
|
46
|
+
#
|
|
47
|
+
# Here we can do further processing on the data
|
|
48
|
+
#
|
|
49
|
+
File.open("ages.txt", "w") do |file|
|
|
50
|
+
i.table.each do |row|
|
|
51
|
+
unless row[:row_errors].any?
|
|
52
|
+
file.puts "#{row[:name][:value]} #{row[:age][:value]}"
|
|
53
|
+
end
|
|
54
|
+
end
|
|
55
|
+
end
|
|
Binary file
|
|
Binary file
|
|
Binary file
|
|
@@ -0,0 +1,73 @@
|
|
|
1
|
+
require "dreader"
|
|
2
|
+
|
|
3
|
+
class Reader
|
|
4
|
+
extend Dreader::Engine
|
|
5
|
+
|
|
6
|
+
options do
|
|
7
|
+
first_row 2
|
|
8
|
+
debug true
|
|
9
|
+
extension :ods
|
|
10
|
+
end
|
|
11
|
+
|
|
12
|
+
column :name do
|
|
13
|
+
doc "A is the name string"
|
|
14
|
+
colref 'A'
|
|
15
|
+
end
|
|
16
|
+
|
|
17
|
+
column :birthdate do
|
|
18
|
+
doc "Birthdate contains a full date (i.e., including the year)"
|
|
19
|
+
colref 'B'
|
|
20
|
+
|
|
21
|
+
process do |c|
|
|
22
|
+
Date.parse(c)
|
|
23
|
+
end
|
|
24
|
+
end
|
|
25
|
+
|
|
26
|
+
virtual_column :age do
|
|
27
|
+
process do |row|
|
|
28
|
+
birthdate = row[:birthdate][:value]
|
|
29
|
+
birthday = Date.new(Date.today.year, birthdate.month, birthdate.day)
|
|
30
|
+
today = Date.today
|
|
31
|
+
|
|
32
|
+
[0, today.year - birthdate.year - (birthday < today ? 1 : 0)].max
|
|
33
|
+
end
|
|
34
|
+
end
|
|
35
|
+
|
|
36
|
+
mapping do |row|
|
|
37
|
+
r = Dreader::Util.simplify(row)
|
|
38
|
+
puts "#{r[:name]} is #{r[:age]} years old (born on #{r[:birthdate]})"
|
|
39
|
+
end
|
|
40
|
+
end
|
|
41
|
+
|
|
42
|
+
puts
|
|
43
|
+
puts "*****************************************************************"
|
|
44
|
+
puts "Reading ODS with no extension, using extension set in the options"
|
|
45
|
+
puts "*****************************************************************"
|
|
46
|
+
puts
|
|
47
|
+
|
|
48
|
+
i = Reader
|
|
49
|
+
i.read filename: "Birthdays"
|
|
50
|
+
i.virtual_columns
|
|
51
|
+
i.mappings
|
|
52
|
+
|
|
53
|
+
puts
|
|
54
|
+
puts "*****************************************************************"
|
|
55
|
+
puts "Reading XLSX with wrong extension, overriding existing extension"
|
|
56
|
+
puts "*****************************************************************"
|
|
57
|
+
puts
|
|
58
|
+
|
|
59
|
+
i = Reader
|
|
60
|
+
i.read filename: "Birthdays-xlsx-with-wrong-extension.xls", extension: :xlsx
|
|
61
|
+
i.virtual_columns
|
|
62
|
+
i.mappings
|
|
63
|
+
|
|
64
|
+
puts
|
|
65
|
+
puts "*****************************************************************"
|
|
66
|
+
puts "Reading XLSX with no extension"
|
|
67
|
+
puts "*****************************************************************"
|
|
68
|
+
puts
|
|
69
|
+
|
|
70
|
+
i = Reader
|
|
71
|
+
i.read filename: "Birthdays-xlsx", extension: :xlsx
|
|
72
|
+
i.virtual_columns
|
|
73
|
+
i.mappings
|
|
@@ -0,0 +1,77 @@
|
|
|
1
|
+
require 'dreader'
|
|
2
|
+
|
|
3
|
+
# this is the class which will contain all the data we read from the file
|
|
4
|
+
class City
|
|
5
|
+
[:city, :state, :population, :lat, :lon].each do |var|
|
|
6
|
+
attr_accessor var
|
|
7
|
+
end
|
|
8
|
+
|
|
9
|
+
def initialize(hash)
|
|
10
|
+
hash.each do |k, v|
|
|
11
|
+
self.send("#{k}=", v)
|
|
12
|
+
end
|
|
13
|
+
end
|
|
14
|
+
end
|
|
15
|
+
|
|
16
|
+
class Importer
|
|
17
|
+
extend Dreader::Engine
|
|
18
|
+
|
|
19
|
+
# read from us_cities.tsv, lines from 2 to 10 (included)
|
|
20
|
+
options do
|
|
21
|
+
filename "us_cities.tsv"
|
|
22
|
+
first_row 2
|
|
23
|
+
last_row 307
|
|
24
|
+
end
|
|
25
|
+
|
|
26
|
+
# these are the columns for which we only need to specify column and name
|
|
27
|
+
columns ({city: 2, state: 3, latlon: 11}) do
|
|
28
|
+
process { |val| val.strip }
|
|
29
|
+
end
|
|
30
|
+
|
|
31
|
+
# the population column requires more work
|
|
32
|
+
column :population do |col|
|
|
33
|
+
col.colref 4
|
|
34
|
+
|
|
35
|
+
# make "3,000" into 3000 (int)
|
|
36
|
+
col.process { |value| value.gsub(",", "").to_i }
|
|
37
|
+
|
|
38
|
+
# check population is positive
|
|
39
|
+
col.check { |value| value > 0 }
|
|
40
|
+
end
|
|
41
|
+
|
|
42
|
+
# reject all cities with more than 3M people
|
|
43
|
+
reject do |row|
|
|
44
|
+
row[:population][:value] >= 3_000_000
|
|
45
|
+
end
|
|
46
|
+
|
|
47
|
+
mapping do |row|
|
|
48
|
+
# remove all additional information stored in each cell
|
|
49
|
+
r = Dreader::Util.simplify row
|
|
50
|
+
|
|
51
|
+
# make latlon into the lat, lon fields
|
|
52
|
+
r[:lat], r[:lon] = r[:latlon].split(" ")
|
|
53
|
+
|
|
54
|
+
# now r contains something like
|
|
55
|
+
# {lat: ..., lon: ..., city: ..., state: ..., population: ..., latlon: ...}
|
|
56
|
+
|
|
57
|
+
# remove fields which are not understood by the Cities class and
|
|
58
|
+
# make a new instance
|
|
59
|
+
cleaned = Dreader::Util.clean r, [:latlon]
|
|
60
|
+
|
|
61
|
+
# you must declare an array cities before calling importer.mapping
|
|
62
|
+
City.new(cleaned)
|
|
63
|
+
end
|
|
64
|
+
end
|
|
65
|
+
|
|
66
|
+
# load and process
|
|
67
|
+
importer = Importer
|
|
68
|
+
importer.load mapping: true, debug: true
|
|
69
|
+
|
|
70
|
+
# output everything to see whether it works
|
|
71
|
+
puts "First ten cities in the US with less than 3M (source Wikipedia)"
|
|
72
|
+
importer.table.each do |city|
|
|
73
|
+
[:city, :state, :population, :lat, :lon].each do |var|
|
|
74
|
+
puts "#{var.to_s.capitalize}: #{city.send(var)}"
|
|
75
|
+
end
|
|
76
|
+
puts ""
|
|
77
|
+
end
|
data/lib/dreader/engine.rb
CHANGED
|
@@ -21,6 +21,8 @@ module Dreader
|
|
|
21
21
|
attr_accessor :declared_virtual_columns
|
|
22
22
|
# the mapping rules
|
|
23
23
|
attr_accessor :declared_mapping
|
|
24
|
+
# the declared filter
|
|
25
|
+
attr_accessor :declared_reject
|
|
24
26
|
|
|
25
27
|
# the data we read
|
|
26
28
|
attr_reader :table
|
|
@@ -118,6 +120,11 @@ module Dreader
|
|
|
118
120
|
@declared_virtual_columns << column.to_hash.merge({ name: name })
|
|
119
121
|
end
|
|
120
122
|
|
|
123
|
+
# define a filter, which skips some rows
|
|
124
|
+
def reject(&block)
|
|
125
|
+
@declared_reject = block
|
|
126
|
+
end
|
|
127
|
+
|
|
121
128
|
# define what we do with each line we read
|
|
122
129
|
# - `block` is the code which takes as input a `row` and processes
|
|
123
130
|
# `row` is a hash in which each spreadsheet cell is accessible under
|
|
@@ -187,8 +194,13 @@ module Dreader
|
|
|
187
194
|
# this has side-effects on r
|
|
188
195
|
virtual_columns_on(r) if options[:virtual] || options[:mapping]
|
|
189
196
|
|
|
197
|
+
# check whether the filter would ignore this line
|
|
198
|
+
# notice that we need to invoke compact to avoid nil being added
|
|
199
|
+
# to the table
|
|
200
|
+
next if !options[:ignore_reject] && reject?(r)
|
|
201
|
+
|
|
190
202
|
options[:mapping] ? mappings_on(r) : r
|
|
191
|
-
end
|
|
203
|
+
end.compact
|
|
192
204
|
end
|
|
193
205
|
|
|
194
206
|
# TODO: PASS A ROW (and not row_number and sheet)
|
|
@@ -268,6 +280,7 @@ module Dreader
|
|
|
268
280
|
|
|
269
281
|
# Compute virtual columns for, with side effect on row
|
|
270
282
|
def virtual_columns_on(row)
|
|
283
|
+
@declared_virtual_columns ||= []
|
|
271
284
|
@declared_virtual_columns.each do |virtualcol|
|
|
272
285
|
colname = virtualcol[:name]
|
|
273
286
|
row[colname] = { virtual: true }
|
|
@@ -291,13 +304,36 @@ module Dreader
|
|
|
291
304
|
end
|
|
292
305
|
end
|
|
293
306
|
|
|
294
|
-
#
|
|
295
|
-
|
|
307
|
+
# check whether a line has to be rejected
|
|
308
|
+
def reject?(row)
|
|
309
|
+
rejected = @declared_reject&.call(row)
|
|
310
|
+
if rejected
|
|
311
|
+
@logger.debug "[dreader] row rejected by reject declaration #{row}"
|
|
312
|
+
end
|
|
313
|
+
end
|
|
314
|
+
|
|
315
|
+
# apply the mapping code to the @table. Notice that we do a side effect
|
|
316
|
+
# on @table and, hence, invoking the mapping twice won't work (you need to
|
|
317
|
+
# reload first).
|
|
318
|
+
#
|
|
319
|
+
# the mapping is applied only if it defined and it returns the output of
|
|
320
|
+
# the mapping.
|
|
296
321
|
#
|
|
297
|
-
#
|
|
298
|
-
#
|
|
322
|
+
# notice also that we do a side-effect on @table. This is to make the
|
|
323
|
+
# behavior of
|
|
324
|
+
#
|
|
325
|
+
# i.load mapping: true
|
|
326
|
+
# i.table
|
|
327
|
+
#
|
|
328
|
+
# and
|
|
329
|
+
#
|
|
330
|
+
# i = load;
|
|
331
|
+
# i.mappings
|
|
332
|
+
# i.table
|
|
333
|
+
#
|
|
334
|
+
# the same
|
|
299
335
|
def mappings
|
|
300
|
-
@table.map { |row| mappings_on(row) }
|
|
336
|
+
@table = @table.map { |row| mappings_on(row) }
|
|
301
337
|
end
|
|
302
338
|
|
|
303
339
|
def mappings_on(row)
|
|
@@ -398,28 +434,36 @@ module Dreader
|
|
|
398
434
|
# list of keys we support in options. We remove them when reading
|
|
399
435
|
# the CSV file
|
|
400
436
|
OPTION_KEYS = %i[
|
|
401
|
-
filename sheet first_row last_row
|
|
437
|
+
filename extension sheet first_row last_row
|
|
438
|
+
logger logger_level
|
|
439
|
+
debug
|
|
402
440
|
]
|
|
403
441
|
|
|
404
442
|
def open_spreadsheet(options)
|
|
405
443
|
filename = options[:filename]
|
|
406
|
-
|
|
444
|
+
# use the extension option or make ".CSV" into :csv
|
|
445
|
+
extension = options[:extension] || File.extname(filename).downcase[1..-1]&.to_sym
|
|
446
|
+
|
|
447
|
+
|
|
448
|
+
# TODO: MAKE DEBUG AND LOGGER INTO REAL CLASS VARIABLES OR MAKE LOCAL AND/OR FUNCTIONS
|
|
449
|
+
@debug = @declared_options.merge(options)[:debug] == true
|
|
450
|
+
if @debug
|
|
451
|
+
@logger = options[:logger] || Logger.new($stdout)
|
|
452
|
+
@logger.debug "[dreader open_spreadsheet] filename: #{filename}"
|
|
453
|
+
@logger.debug "[dreader open_spreadsheet] extension: #{extension}"
|
|
454
|
+
end
|
|
407
455
|
|
|
408
|
-
case
|
|
409
|
-
when
|
|
456
|
+
case extension
|
|
457
|
+
when :csv
|
|
410
458
|
csv_options = @declared_options.except(*OPTION_KEYS)
|
|
411
459
|
Roo::CSV.new(filename, csv_options:)
|
|
412
|
-
when
|
|
460
|
+
when :tsv
|
|
413
461
|
csv_options = @declared_options.except(*OPTION_KEYS).merge({ col_sep: "\t" })
|
|
414
462
|
Roo::CSV.new(filename, csv_options:)
|
|
415
|
-
when
|
|
416
|
-
Roo::
|
|
417
|
-
when ".xls"
|
|
418
|
-
Roo::Excel.new(filename)
|
|
419
|
-
when ".xlsx"
|
|
420
|
-
Roo::Excelx.new(filename)
|
|
463
|
+
when :ods, :xls, :xlsx
|
|
464
|
+
Roo::Spreadsheet.open(filename, extension:)
|
|
421
465
|
else
|
|
422
|
-
raise "Unknown extension: #{ext}"
|
|
466
|
+
raise "Unknown extension: #{ext}. Use the :extension option."
|
|
423
467
|
end
|
|
424
468
|
end
|
|
425
469
|
|
data/lib/dreader/version.rb
CHANGED
metadata
CHANGED
|
@@ -1,14 +1,14 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: dreader
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 1.
|
|
4
|
+
version: 1.2.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Adolfo Villafiorita
|
|
8
8
|
autorequire:
|
|
9
9
|
bindir: exe
|
|
10
10
|
cert_chain: []
|
|
11
|
-
date: 2023-
|
|
11
|
+
date: 2023-11-02 00:00:00.000000000 Z
|
|
12
12
|
dependencies:
|
|
13
13
|
- !ruby/object:Gem::Dependency
|
|
14
14
|
name: roo
|
|
@@ -108,6 +108,13 @@ files:
|
|
|
108
108
|
- dreader.gemspec
|
|
109
109
|
- examples/age/Birthdays.ods
|
|
110
110
|
- examples/age/age.rb
|
|
111
|
+
- examples/age_csv/Birthdays-TabSeparated.csv
|
|
112
|
+
- examples/age_csv/Birthdays.csv
|
|
113
|
+
- examples/age_csv/age.rb
|
|
114
|
+
- examples/age_noext/Birthdays
|
|
115
|
+
- examples/age_noext/Birthdays-xlsx
|
|
116
|
+
- examples/age_noext/Birthdays-xlsx-with-wrong-extension.xls
|
|
117
|
+
- examples/age_noext/age.rb
|
|
111
118
|
- examples/age_with_multiple_checks/Birthdays.ods
|
|
112
119
|
- examples/age_with_multiple_checks/age_with_multiple_checks.rb
|
|
113
120
|
- examples/local_vars/local_vars.rb
|
|
@@ -117,6 +124,7 @@ files:
|
|
|
117
124
|
- examples/wikipedia_us_cities/us_cities.rb
|
|
118
125
|
- examples/wikipedia_us_cities/us_cities.tsv
|
|
119
126
|
- examples/wikipedia_us_cities/us_cities_bulk_declare.rb
|
|
127
|
+
- examples/wikipedia_us_cities/us_cities_reject.rb
|
|
120
128
|
- lib/dreader.rb
|
|
121
129
|
- lib/dreader/column.rb
|
|
122
130
|
- lib/dreader/engine.rb
|
|
@@ -142,7 +150,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
|
142
150
|
- !ruby/object:Gem::Version
|
|
143
151
|
version: '0'
|
|
144
152
|
requirements: []
|
|
145
|
-
rubygems_version: 3.4.
|
|
153
|
+
rubygems_version: 3.4.21
|
|
146
154
|
signing_key:
|
|
147
155
|
specification_version: 4
|
|
148
156
|
summary: Process and import data from cvs and spreadsheets
|