dreader 0.2.1 → 0.3.0
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +100 -42
- data/examples/age/Birthdays.ods +0 -0
- data/examples/age/age.rb +39 -0
- data/lib/dreader/version.rb +1 -1
- data/lib/dreader.rb +28 -1
- metadata +4 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b4e929eff1813efc3d2021430773275b213146e00adb01c5ad620bb3a6dfb98b
+  data.tar.gz: 3a9d4ecd7dce0713b29ef54550eb5a026176184798f7ae0bfbe1e634812e912a
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d31972c25fbc073211e9f398133b11d76dd71667cc4f5042a738d37304c9d0be8a88fd61fb446685c5867c9111ad81798d23b31b2017a396082352855a7af16f
+  data.tar.gz: c57d6b9e059c31262f728748c011a2631a91e64365ce72eea0554492bf7512db430dd9600dea80e64dbda50ec2256848cff2733e432b484e272458f6641bad2f
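`checksums.yaml` records SHA256 and SHA512 digests of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). A minimal sketch for checking them locally, assuming the `.gem` has been unpacked with `tar -xf` and its `checksums.yaml.gz` decompressed; `verify_checksums` is a hypothetical helper, not a RubyGems API:

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper (not part of RubyGems): recompute the digests listed
# in a checksums.yaml document for files found in `dir`, and report for
# each "ALGORITHM file" pair whether the recorded and actual values match.
def verify_checksums(checksums_yaml, dir)
  recorded = YAML.safe_load(checksums_yaml) || {}
  algorithms = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }

  results = {}
  algorithms.each do |algo, digest_class|
    (recorded[algo] || {}).each do |file, expected|
      actual = digest_class.file(File.join(dir, file)).hexdigest
      results["#{algo} #{file}"] = (actual == expected)
    end
  end
  results
end
```

Called as `verify_checksums(File.read('checksums.yaml'), '.')` in the unpacked directory, it should map each of the four entries above to `true` when the archives are intact.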
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -46,6 +46,8 @@ Or install it yourself as:
 
 ## Usage
 
+### Declare the file you want to read
+
 Require `dreader` and declare an instance of the `Dreader::Engine` class:
 
 ```ruby
@@ -79,20 +81,28 @@ where:
 * (optional) `sheet` is the sheet name or number to read from. If not
   specified, the first (default) sheet is used
 
-
-
-
+### Declare the columns you want to read
+
+Declare the columns you want to read by assigning them a name and a
+column reference:
+
+```ruby
+# we will access column A in Ruby code using :name
+i.column :name
+  colref 'A'
+end
+```
 
 You can also specify two ruby blocks, `process` and `check` to
 preprocess data and to check for errors.
 
 For instance, given the following file:
 
-| Name
-
-|
-|
-|
+| Name | Date of birth |
+|------------------|-----------------|
+| Forest Whitaker | July 15, 1961 |
+| Daniel Day-Lewis | April 29, 1957 |
+| Sean Penn | August 17, 1960 |
 
 we could use the following declaration to specify the data to read:
 
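The `column` declaration added above references a column by its spreadsheet letter (`colref 'A'`), and the remarks added later in the README state that an integer also works, with the first column numbered one. (The snippet also omits `do` after `i.column :name`; `examples/age/age.rb` uses the block form `i.column :name do ... end`.) A hypothetical helper, not part of dreader, showing how letters map onto those 1-based integers:

```ruby
# Hypothetical helper, not dreader's implementation: map a spreadsheet
# column reference to the 1-based index used by integer colrefs
# ('A' -> 1, 'Z' -> 26, 'AA' -> 27). Integers are returned unchanged.
def column_index(ref)
  return ref if ref.is_a?(Integer)

  letters = ref.to_s.upcase
  unless letters.match?(/\A[A-Z]+\z/)
    raise ArgumentError, "invalid column reference: #{ref.inspect}"
  end

  # bijective base 26: each letter contributes 1..26, not 0..25
  letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - 'A'.ord + 1) }
end

%w[A b Z AA].each { |ref| puts "#{ref} -> #{column_index(ref)}" }
# prints:
# A -> 1
# b -> 2
# Z -> 26
# AA -> 27
```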
@@ -106,19 +116,19 @@ i.column :name do
   end
 end
 
-# we want to access column
-
-
-  colref 3
+# we want to access column 2 (Date of birth) using :birthdate
+i.column :birthdate do
+  colref 2
 
   # make sure the column is transformed into an integer
   process do |x|
-    x
+    Date.parse(x)
   end
 
-  # check age is
+  # check age is a date (check is invoked on the value returned
+  # by process)
   check do |x|
-    x
+    x.class == Date
   end
 end
 
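The `:birthdate` declaration passes each cell through `Date.parse` and then checks the result. `Date.parse` raises `ArgumentError` (or `Date::Error`, a subclass, on newer Rubies) on text it cannot parse, and this diff does not show how dreader handles an exception raised inside `process`. A plain-Ruby sketch of a more forgiving pair, written as ordinary methods rather than dreader blocks; the method names are hypothetical, and the pass-through for cells that are already `Date` objects assumes the spreadsheet reader may return typed date cells:

```ruby
require 'date'

# Hypothetical process step: turn a cell into a Date, returning nil
# instead of raising when the text cannot be parsed.
def parse_birthdate(cell)
  return cell if cell.is_a?(Date) # cell may already be typed (assumption)

  Date.parse(cell.to_s)
rescue ArgumentError # Date::Error is a subclass on newer Rubies
  nil
end

# Hypothetical check step: it receives the value returned by the process
# step, not the raw cell.
def birthdate_ok?(value)
  value.is_a?(Date)
end

['July 15, 1961', 'April 29, 1957', ''].each do |cell|
  value = parse_birthdate(cell)
  puts "#{cell.inspect} -> #{value || 'nil'} (check passed: #{birthdate_ok?(value)})"
end
```

`is_a?(Date)` is used instead of `x.class == Date`; unlike the README's check, it also accepts `DateTime`, which is a `Date` subclass.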
@@ -126,13 +136,51 @@ end
 # we are done with our declarations)
 ```
 
-
-
+**Remarks:**
+
+1. `colref` can be a string (e.g., `'A'`) or an integer, in which case
+   the first column is one
+
+2. you need to declare only the columns you want to import. For
+   instance, we could skip the declaration for column 1, if 'Date of
+   Birth' is the only data we want to import
+
+3. If `process` and `check` are specified, then `check` will receive
+   the result of invoking `process` on the cell value. This makes
+   sense if process is used to make the cell value more accessible to
+   ruby code (e.g., transforming a string into an integer).
+
+
+### Add virtual columns, if you want
+
+Sometimes it is convenient to aggregate or otherwise manipulate the
+data read from each row before doing the actual processing.
+
+For instance, we might have a table with dates of birth, while we are
+really interested in the age of people.
+
+In such cases, we can use virtual column. A **virtual column** allows
+one to add a column to the data read. The value of the column for
+each row is computed using the values of other cells.
+
+Virtual columns are declared similar to columns. Thus, for instance,
+the following declaration adds an `age` column to each row of the data
+we read from the previous example:
+
+```ruby
+i.virtual_column :age do
+  process do |row|
+    # `compute_birthday` has to be defined
+    compute_birthday(row[:birthdate])
+  end
+end
+```
+
+Virtual columns are, of course, available to the `mapping` directive
+(see below).
 
-
-
-makes sense if process is used to make the cell value more accessible
-to ruby code (e.g., transforming a string into an integer).
+
+### Specify how to process data
 
 Finally we can specify how we process lines, using the `mapping`
 directive. Mapping takes an arbitrary piece of ruby code, which can
@@ -146,22 +194,26 @@ i.mapping do |row|
 end
 ```
 
-Notice that the data read from
-
+Notice that the data read from each row of our input data is stored in
+a hash. The hash uses column names as the primary key and stores
+the values in the `:value` key.
+
+### Start working with the data
 
-
+We are now all set and we can start working with the data.
 
 First use `read` or `load` (synonyms), to read all data and put it
-into a `@table` instance variable.
-declarations to read data and executes the `process` and `check`
-functions for each cell read.
+into a `@table` instance variable.
 
 ```ruby
 i.read
 ```
 
-
-
+Read applies all the `column` and `virtual_column` declarations and
+buils a hash with the data read.
+
+After reading the file we can use `errors` to see whether any of the
+`check` functions failed:
 
 ```ruby
 array_of_strings = i.errors
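Per the README paragraph added in this hunk, each row is a hash keyed by column name whose entries keep the cell value under `:value`. `examples/age/age.rb` flattens rows with `Dreader::Util.simplify`, whose implementation is not part of this diff; the sketch below is a hypothetical stand-in that assumes plain value extraction is all that is needed:

```ruby
# Hypothetical stand-in, not Dreader::Util.simplify: keep each column name
# but replace its cell hash with the value stored under :value.
def flatten_row(row)
  row.transform_values { |cell| cell[:value] }
end

row = {
  name:      { value: 'Sean Penn' },
  birthdate: { value: 'August 17, 1960' }
}

flat = flatten_row(row)
puts "#{flat[:name]} was born on #{flat[:birthdate]}"
# prints: Sean Penn was born on August 17, 1960
```

Because virtual cells also keep their result under `:value`, the same flattening applies to them.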
@@ -170,29 +222,31 @@ array_of_strings ech do |error_line|
 end
 ```
 
-
-
+Finally we can use the `process` function to execute the `mapping`
+directive to each line read from the file.
 
 ```ruby
 i.process
 ```
 
-
-
-returns an array of hashes (see next section for the details).
+Look in the examples directory for further details and a couple of
+working examples.
 
-```ruby
-i.table
-```
 
-
-examples directory.
+## Digging deeper
 
+If you need to perform more elaborations on the data which cannot be
+captured with `process` (that is, by processing the data row by row),
+you can also directly access all data read, using the `table` method:
 
-
+```ruby
+i.read
+i.table
+# an array of hashes (one hash per row)
+```
 
-
-hashes. Each hash represents a line of the file.
+More in details, the `read` method fills a `@table` instance variable
+with an array of hashes. Each hash represents a line of the file.
 
 Each hash contains one key per column, following your specification.
 Its value is, in turn, a hash with the following structure:
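As an illustration of an elaboration that spans rows rather than one row at a time, the following plain-Ruby sketch works on a hand-built array shaped like the one `table` is described as returning (one hash per row, values under `:value`), filled with the three rows of the README's example file. It does not call dreader:

```ruby
require 'date'

# Hand-built stand-in for the array returned by `i.table`.
table = [
  { name: { value: 'Forest Whitaker' },  birthdate: { value: Date.new(1961, 7, 15) } },
  { name: { value: 'Daniel Day-Lewis' }, birthdate: { value: Date.new(1957, 4, 29) } },
  { name: { value: 'Sean Penn' },        birthdate: { value: Date.new(1960, 8, 17) } }
]

# A cross-row question a per-row mapping cannot answer on its own:
# who is the oldest person in the file? (earliest birthdate)
oldest = table.min_by { |row| row[:birthdate][:value] }
puts oldest[:name][:value] # prints Daniel Day-Lewis

# Group names by birth decade, keeping the order in which decades first appear.
by_decade = table.group_by { |row| row[:birthdate][:value].year / 10 * 10 }
                 .transform_values { |rows| rows.map { |row| row[:name][:value] } }
by_decade.each { |decade, names| puts "#{decade}s: #{names.join(', ')}" }
# prints:
# 1960s: Forest Whitaker, Sean Penn
# 1950s: Daniel Day-Lewis
```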
@@ -206,6 +260,9 @@ Its value is, in turn, a hash with the following structure:
 }
 ```
 
+(Note that virtual columns only store `value` and a Boolean `virtual`,
+which is always `true`.)
+
 Thus, for instance, given the example above:
 
 ```ruby
@@ -274,6 +331,7 @@ i.debug 40, filename # like above, but read from filename
 Another possibility is getting the value of the `@table` variable,
 which contains all the data read.
 
+
 ## Known Limitations
 
 At the moment:
data/examples/age/Birthdays.ods
ADDED
Binary file
|
data/examples/age/age.rb
ADDED
@@ -0,0 +1,39 @@
+require 'dreader'
+
+i = Dreader::Engine.new
+
+i.options do
+  first_row 2
+end
+
+i.column :name do
+  colref 'A'
+end
+
+i.column :birthdate do
+  colref 'B'
+
+  process do |c|
+    Date.parse(c)
+  end
+end
+
+i.virtual_column :age do
+  process do |row|
+    birthdate = row[:birthdate][:value]
+    birthday = Date.new(Date.today.year, birthdate.month, birthdate.day)
+    today = Date.today
+
+    [0, today.year - birthdate.year - (birthday < today ? 1 : 0)].max
+  end
+end
+
+i.mapping do |row|
+  r = Dreader::Util.simplify(row)
+  puts "#{r[:name]} is #{r[:age]} years old (born on #{r[:birthdate]})"
+end
+
+i.read "Birthdays.ods"
+i.virtual_columns
+i.process
+
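In the `:age` virtual column above, one year is subtracted when this year's birthday is *earlier* than today, which appears to be the reverse of the usual rule: someone born on July 15, 1961 would be reported as 57 on March 26, 2018, although their 57th birthday had not yet arrived. `Date.new(Date.today.year, birthdate.month, birthdate.day)` would also raise for a February 29 birthdate in a non-leap year. A standalone sketch of the conventional computation; `age_on` is a hypothetical name, and the explicit reference date only makes the results reproducible:

```ruby
require 'date'

# Hypothetical helper: age in completed years on `today`.
# Compares (month, day) pairs instead of building this year's birthday
# with Date.new, which would raise for Feb 29 in a non-leap year.
def age_on(birthdate, today = Date.today)
  years = today.year - birthdate.year
  # subtract one if this year's birthday has not happened yet
  years -= 1 if ([today.month, today.day] <=> [birthdate.month, birthdate.day]) < 0
  [0, years].max
end

puts age_on(Date.new(1961, 7, 15), Date.new(2018, 3, 26)) # prints 56
puts age_on(Date.new(1961, 7, 15), Date.new(2018, 7, 15)) # prints 57
puts age_on(Date.new(2000, 2, 29), Date.new(2018, 3, 1))  # prints 18
```

Inside the virtual column, the call would be `age_on(row[:birthdate][:value])`.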
data/lib/dreader/version.rb
CHANGED
data/lib/dreader.rb
CHANGED
@@ -103,11 +103,12 @@ module Dreader
     # the specification of the columns to process
     attr_reader :colspec
     # the data we read
-    attr_reader :
+    attr_reader :table
 
     def initialize
       @options = {}
       @colspec = []
+      @virtualcols = []
     end
 
     # define a DSL for options
@@ -132,6 +133,19 @@ module Dreader
       @colspec << column.to_hash.merge({name: name})
     end
 
+    # virtual columns define derived attributes
+    # the code specified in the virtual column is executed after reading
+    # a row and before applying the mapping function
+    #
+    # virtual colum declarations are executed in the order in which
+    # they are defined
+    def virtual_column name, &block
+      column = Column.new
+      column.instance_eval &block
+
+      @virtualcols << column.to_hash.merge({name: name})
+    end
+
     # define what we do with each line we read
     # - `block` is the code which takes as input a `row` and processes
     #   `row` is a hash in which each spreadsheet cell is accessible under
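`virtual_column` reuses the pattern of `column`: a fresh `Column` object evaluates the declaration block with `instance_eval`, so bare calls such as `process do ... end` inside the block become method calls on that object, and the collected settings are stored as a hash together with the name. The `Column` class itself is not part of this diff; the sketch below uses a hypothetical, much-reduced stand-in (`SketchColumn`) to show the mechanism:

```ruby
# Hypothetical stand-in for a column-spec DSL, not dreader's Column class.
# Each DSL method records its argument or block in @spec.
class SketchColumn
  def initialize
    @spec = {}
  end

  def colref(ref)
    @spec[:colref] = ref
  end

  def process(&block)
    @spec[:process] = block
  end

  def check(&block)
    @spec[:check] = block
  end

  def to_hash
    @spec.dup
  end
end

# Mirrors the shape of Engine#virtual_column: evaluate the block with self
# set to a fresh spec object, then tag the collected hash with the name.
def declare_virtual_column(name, &block)
  column = SketchColumn.new
  column.instance_eval(&block)
  column.to_hash.merge(name: name)
end

spec = declare_virtual_column(:double) do
  process { |row| row[:n][:value] * 2 }
end

puts spec[:name]                               # prints double
puts spec[:process].call({ n: { value: 21 } }) # prints 42
```

One consequence of `instance_eval`: inside the declaration block `self` is the spec object, so instance variables of the surrounding code are not visible there, while local variables still are, because blocks are closures.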
@@ -218,6 +232,19 @@ module Dreader
       @errors
     end
 
+    def virtual_columns
+      # execute the virtual column specification
+      @virtualcols.each do |virtualcol|
+        @table.each do |r|
+          # add the cell to the table
+          r[virtualcol[:name]] = {
+            value: virtualcol[:process].call(r),
+            virtual: true,
+          }
+        end
+      end
+    end
+
     # apply the mapping code to the array
     # it makes sense to invoke it only
     def process
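`virtual_columns` loops over the virtual columns in declaration order and, for each one, over every row of `@table`, storing `{ value: ..., virtual: true }` under the column's name. Because the outer loop runs over columns, a virtual column can read any virtual column declared before it. In `examples/age/age.rb` the method is called explicitly between `read` and `process`. A standalone sketch of the same loop, with plain lambdas in place of DSL blocks (`apply_virtual_columns` and the sample columns are hypothetical):

```ruby
# Sketch of the loop in Engine#virtual_columns: columns outer, rows inner,
# each result stored in the row as a cell flagged virtual.
def apply_virtual_columns(table, virtualcols)
  virtualcols.each do |vc|
    table.each do |row|
      row[vc[:name]] = { value: vc[:process].call(row), virtual: true }
    end
  end
  table
end

table = [
  { first: { value: 'Sean' }, last: { value: 'Penn' } }
]

virtualcols = [
  { name: :full_name, process: ->(r) { "#{r[:first][:value]} #{r[:last][:value]}" } },
  # declared second, so it can read the :full_name cell computed above
  { name: :initials, process: ->(r) { r[:full_name][:value].split.map { |w| w[0] }.join } }
]

apply_virtual_columns(table, virtualcols)
row = table.first
puts row[:full_name][:value] # prints Sean Penn
puts row[:initials][:value]  # prints SP
puts row[:initials][:virtual] # prints true
```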
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: dreader
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.0
 platform: ruby
 authors:
 - Adolfo Villafiorita
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2018-03-
+date: 2018-03-26 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -77,6 +77,8 @@ files:
 - bin/console
 - bin/setup
 - dreader.gemspec
+- examples/age/Birthdays.ods
+- examples/age/age.rb
 - examples/wikipedia_big_us_cities/big_us_cities.rb
 - examples/wikipedia_big_us_cities/cities_by_state.ods
 - examples/wikipedia_us_cities/us_cities.rb