dreader 0.2.1 → 0.3.0
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +100 -42
- data/examples/age/Birthdays.ods +0 -0
- data/examples/age/age.rb +39 -0
- data/lib/dreader/version.rb +1 -1
- data/lib/dreader.rb +28 -1
- metadata +4 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: b4e929eff1813efc3d2021430773275b213146e00adb01c5ad620bb3a6dfb98b
|
4
|
+
data.tar.gz: 3a9d4ecd7dce0713b29ef54550eb5a026176184798f7ae0bfbe1e634812e912a
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: d31972c25fbc073211e9f398133b11d76dd71667cc4f5042a738d37304c9d0be8a88fd61fb446685c5867c9111ad81798d23b31b2017a396082352855a7af16f
|
7
|
+
data.tar.gz: c57d6b9e059c31262f728748c011a2631a91e64365ce72eea0554492bf7512db430dd9600dea80e64dbda50ec2256848cff2733e432b484e272458f6641bad2f
|
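The SHA256 and SHA512 entries above are digests of the two archives packed inside the gem (`metadata.gz` and `data.tar.gz`). As a minimal sketch, assuming the archive bytes are already in memory, a SHA256 entry could be checked with Ruby's standard `digest` library. The helper name `sha256_matches?` is hypothetical and is not part of RubyGems or dreader.

```ruby
require 'digest'

# Hypothetical helper: compare the SHA256 digest of some bytes with an
# expected hex string such as the ones listed in checksums.yaml.
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.strip.downcase
end

# With a real archive one would pass File.binread(path) as `bytes`
# (the path is up to the caller and is not taken from this diff).
# Self-check against the well-known SHA256 digest of the empty string:
empty_sha256 = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
puts sha256_matches?('', empty_sha256)
```

The same pattern works for the SHA512 entries by swapping in `Digest::SHA512`.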
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -46,6 +46,8 @@ Or install it yourself as:
|
|
46
46
|
|
47
47
|
## Usage
|
48
48
|
|
49
|
+
### Declare the file you want to read
|
50
|
+
|
49
51
|
Require `dreader` and declare an instance of the `Dreader::Engine` class:
|
50
52
|
|
51
53
|
```ruby
|
@@ -79,20 +81,28 @@ where:
|
|
79
81
|
* (optional) `sheet` is the sheet name or number to read from. If not
|
80
82
|
specified, the first (default) sheet is used
|
81
83
|
|
82
|
-
|
83
|
-
|
84
|
-
|
84
|
+
### Declare the columns you want to read
|
85
|
+
|
86
|
+
Declare the columns you want to read by assigning them a name and a
|
87
|
+
column reference:
|
88
|
+
|
89
|
+
```ruby
|
90
|
+
# we will access column A in Ruby code using :name
|
91
|
+
i.column :name
|
92
|
+
colref 'A'
|
93
|
+
end
|
94
|
+
```
|
85
95
|
|
86
96
|
You can also specify two ruby blocks, `process` and `check` to
|
87
97
|
preprocess data and to check for errors.
|
88
98
|
|
89
99
|
For instance, given the following file:
|
90
100
|
|
91
|
-
| Name
|
92
|
-
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
101
|
+
| Name | Date of birth |
|
102
|
+
|------------------|-----------------|
|
103
|
+
| Forest Whitaker | July 15, 1961 |
|
104
|
+
| Daniel Day-Lewis | April 29, 1957 |
|
105
|
+
| Sean Penn | August 17, 1960 |
|
96
106
|
|
97
107
|
we could use the following declaration to specify the data to read:
|
98
108
|
|
@@ -106,19 +116,19 @@ i.column :name do
|
|
106
116
|
end
|
107
117
|
end
|
108
118
|
|
109
|
-
# we want to access column
|
110
|
-
|
111
|
-
|
112
|
-
colref 3
|
119
|
+
# we want to access column 2 (Date of birth) using :birthdate
|
120
|
+
i.column :birthdate do
|
121
|
+
colref 2
|
113
122
|
|
114
123
|
# make sure the column is transformed into a date
|
115
124
|
process do |x|
|
116
|
-
x
|
125
|
+
Date.parse(x)
|
117
126
|
end
|
118
127
|
|
119
|
-
# check age is
|
128
|
+
# check birthdate is a date (check is invoked on the value returned
|
129
|
+
# by process)
|
120
130
|
check do |x|
|
121
|
-
x
|
131
|
+
x.class == Date
|
122
132
|
end
|
123
133
|
end
|
124
134
|
|
@@ -126,13 +136,51 @@ end
|
|
126
136
|
# we are done with our declarations)
|
127
137
|
```
|
128
138
|
|
129
|
-
|
130
|
-
|
139
|
+
**Remarks:**
|
140
|
+
|
141
|
+
1. `colref` can be a string (e.g., `'A'`) or an integer, in which case
|
142
|
+
the first column is one
|
143
|
+
|
144
|
+
2. you need to declare only the columns you want to import. For
|
145
|
+
instance, we could skip the declaration for column 1, if 'Date of
|
146
|
+
Birth' is the only data we want to import
|
147
|
+
|
148
|
+
3. If `process` and `check` are specified, then `check` will receive
|
149
|
+
the result of invoking `process` on the cell value. This makes
|
150
|
+
sense if process is used to make the cell value more accessible to
|
151
|
+
ruby code (e.g., transforming a string into an integer).
|
152
|
+
|
153
|
+
|
154
|
+
### Add virtual columns, if you want
|
155
|
+
|
156
|
+
Sometimes it is convenient to aggregate or otherwise manipulate the
|
157
|
+
data read from each row before doing the actual processing.
|
158
|
+
|
159
|
+
For instance, we might have a table with dates of birth, while we are
|
160
|
+
really interested in the age of people.
|
161
|
+
|
162
|
+
In such cases, we can use a virtual column. A **virtual column** allows
|
163
|
+
one to add a column to the data read. The value of the column for
|
164
|
+
each row is computed using the values of other cells.
|
165
|
+
|
166
|
+
Virtual columns are declared similarly to columns. Thus, for instance,
|
167
|
+
the following declaration adds an `age` column to each row of the data
|
168
|
+
we read from the previous example:
|
169
|
+
|
170
|
+
```ruby
|
171
|
+
i.virtual_column :age do
|
172
|
+
process do |row|
|
173
|
+
# `compute_birthday` has to be defined
|
174
|
+
compute_birthday(row[:birthdate])
|
175
|
+
end
|
176
|
+
end
|
177
|
+
```
|
178
|
+
|
179
|
+
Virtual columns are, of course, available to the `mapping` directive
|
180
|
+
(see below).
|
131
181
|
|
132
|
-
|
133
|
-
|
134
|
-
makes sense if process is used to make the cell value more accessible
|
135
|
-
to ruby code (e.g., transforming a string into an integer).
|
182
|
+
|
183
|
+
### Specify how to process data
|
136
184
|
|
137
185
|
Finally we can specify how we process lines, using the `mapping`
|
138
186
|
directive. Mapping takes an arbitrary piece of ruby code, which can
|
@@ -146,22 +194,26 @@ i.mapping do |row|
|
|
146
194
|
end
|
147
195
|
```
|
148
196
|
|
149
|
-
Notice that the data read from
|
150
|
-
|
197
|
+
Notice that the data read from each row of our input data is stored in
|
198
|
+
a hash. The hash uses column names as the primary key and stores
|
199
|
+
the values in the `:value` key.
|
200
|
+
|
201
|
+
### Start working with the data
|
151
202
|
|
152
|
-
|
203
|
+
We are now all set and we can start working with the data.
|
153
204
|
|
154
205
|
First use `read` or `load` (synonyms), to read all data and put it
|
155
|
-
into a `@table` instance variable.
|
156
|
-
declarations to read data and executes the `process` and `check`
|
157
|
-
functions for each cell read.
|
206
|
+
into a `@table` instance variable.
|
158
207
|
|
159
208
|
```ruby
|
160
209
|
i.read
|
161
210
|
```
|
162
211
|
|
163
|
-
|
164
|
-
|
212
|
+
Read applies all the `column` and `virtual_column` declarations and
|
213
|
+
builds a hash with the data read.
|
214
|
+
|
215
|
+
After reading the file we can use `errors` to see whether any of the
|
216
|
+
`check` functions failed:
|
165
217
|
|
166
218
|
```ruby
|
167
219
|
array_of_strings = i.errors
|
@@ -170,29 +222,31 @@ array_of_strings.each do |error_line|
|
|
170
222
|
end
|
171
223
|
```
|
172
224
|
|
173
|
-
|
174
|
-
|
225
|
+
Finally we can use the `process` function to apply the `mapping`
|
226
|
+
directive to each line read from the file.
|
175
227
|
|
176
228
|
```ruby
|
177
229
|
i.process
|
178
230
|
```
|
179
231
|
|
180
|
-
|
181
|
-
|
182
|
-
returns an array of hashes (see next section for the details).
|
232
|
+
Look in the examples directory for further details and a couple of
|
233
|
+
working examples.
|
183
234
|
|
184
|
-
```ruby
|
185
|
-
i.table
|
186
|
-
```
|
187
235
|
|
188
|
-
|
189
|
-
examples directory.
|
236
|
+
## Digging deeper
|
190
237
|
|
238
|
+
If you need to perform further processing on the data which cannot be
|
239
|
+
captured with `process` (that is, by processing the data row by row),
|
240
|
+
you can also directly access all data read, using the `table` method:
|
191
241
|
|
192
|
-
|
242
|
+
```ruby
|
243
|
+
i.read
|
244
|
+
i.table
|
245
|
+
# an array of hashes (one hash per row)
|
246
|
+
```
|
193
247
|
|
194
|
-
|
195
|
-
hashes. Each hash represents a line of the file.
|
248
|
+
In more detail, the `read` method fills a `@table` instance variable
|
249
|
+
with an array of hashes. Each hash represents a line of the file.
|
196
250
|
|
197
251
|
Each hash contains one key per column, following your specification.
|
198
252
|
Its value is, in turn, a hash with the following structure:
|
@@ -206,6 +260,9 @@ Its value is, in turn, a hash with the following structure:
|
|
206
260
|
}
|
207
261
|
```
|
208
262
|
|
263
|
+
(Note that virtual columns only store `value` and a Boolean `virtual`,
|
264
|
+
which is always `true`.)
|
265
|
+
|
209
266
|
Thus, for instance, given the example above:
|
210
267
|
|
211
268
|
```ruby
|
@@ -274,6 +331,7 @@ i.debug 40, filename # like above, but read from filename
|
|
274
331
|
Another possibility is getting the value of the `@table` variable,
|
275
332
|
which contains all the data read.
|
276
333
|
|
334
|
+
|
277
335
|
## Known Limitations
|
278
336
|
|
279
337
|
At the moment:
|
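The README changes above state that `check` receives the value returned by `process`, and that each row is a hash keyed by column name with the data stored under `:value`. A minimal sketch of that flow in plain Ruby, without the gem: the `column` hash and its lambdas are hypothetical stand-ins for a `column` declaration, not dreader's internal representation, and only the `:value` key is taken from the README.

```ruby
require 'date'

# Hypothetical stand-in for a column declaration: a name plus two lambdas.
column = {
  name: :birthdate,
  process: ->(cell) { Date.parse(cell) },   # raw cell -> Ruby object
  check: ->(value) { value.is_a?(Date) }    # validates the processed value
}

raw_cell = 'July 15, 1961'

value = column[:process].call(raw_cell)  # process runs first
valid = column[:check].call(value)       # check sees the result of process

# One row: column name => hash holding the processed value under :value.
row = { column[:name] => { value: value } }

puts row[:birthdate][:value].year
puts valid
```

Running `check` on the processed value is what lets it test for a `Date` rather than re-parse the original string.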
data/examples/age/Birthdays.ods
ADDED
Binary file
|
data/examples/age/age.rb
ADDED
@@ -0,0 +1,39 @@
|
|
1
|
+
require 'dreader'
|
2
|
+
|
3
|
+
i = Dreader::Engine.new
|
4
|
+
|
5
|
+
i.options do
|
6
|
+
first_row 2
|
7
|
+
end
|
8
|
+
|
9
|
+
i.column :name do
|
10
|
+
colref 'A'
|
11
|
+
end
|
12
|
+
|
13
|
+
i.column :birthdate do
|
14
|
+
colref 'B'
|
15
|
+
|
16
|
+
process do |c|
|
17
|
+
Date.parse(c)
|
18
|
+
end
|
19
|
+
end
|
20
|
+
|
21
|
+
i.virtual_column :age do
|
22
|
+
process do |row|
|
23
|
+
birthdate = row[:birthdate][:value]
|
24
|
+
birthday = Date.new(Date.today.year, birthdate.month, birthdate.day)
|
25
|
+
today = Date.today
|
26
|
+
|
27
|
+
[0, today.year - birthdate.year - (birthday < today ? 1 : 0)].max
|
28
|
+
end
|
29
|
+
end
|
30
|
+
|
31
|
+
i.mapping do |row|
|
32
|
+
r = Dreader::Util.simplify(row)
|
33
|
+
puts "#{r[:name]} is #{r[:age]} years old (born on #{r[:birthdate]})"
|
34
|
+
end
|
35
|
+
|
36
|
+
i.read "Birthdays.ods"
|
37
|
+
i.virtual_columns
|
38
|
+
i.process
|
39
|
+
|
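The age computation inside the `:age` virtual column can be tried in isolation. The sketch below is a standalone variant, not the code shipped in the gem: the method name `age_on` and its explicit `today` parameter are assumptions introduced so that the result is reproducible. Unlike the expression in `age.rb`, which subtracts one when this year's birthday is earlier than today, this sketch subtracts one year only when this year's birthday is still ahead of the reference date. It also compares month and day directly, which avoids building a February 29 date in a non-leap year.

```ruby
require 'date'

# Age in whole years on `today`; both arguments are Date objects.
# `age_on` is a hypothetical helper, not part of dreader.
def age_on(birthdate, today = Date.today)
  years = today.year - birthdate.year
  # Negative when `today` falls before this year's birthday.
  before_birthday = ([today.month, today.day] <=> [birthdate.month, birthdate.day]) < 0
  years -= 1 if before_birthday
  [0, years].max
end

reference = Date.new(2018, 3, 26)  # release date shown in the gem metadata
puts age_on(Date.new(1961, 7, 15), reference)
puts age_on(Date.new(1957, 4, 29), reference)
```

Passing the reference date explicitly keeps the function deterministic, which makes it easier to test than a version that always calls `Date.today`.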
data/lib/dreader/version.rb
CHANGED
data/lib/dreader.rb
CHANGED
@@ -103,11 +103,12 @@ module Dreader
|
|
103
103
|
# the specification of the columns to process
|
104
104
|
attr_reader :colspec
|
105
105
|
# the data we read
|
106
|
-
attr_reader :
|
106
|
+
attr_reader :table
|
107
107
|
|
108
108
|
def initialize
|
109
109
|
@options = {}
|
110
110
|
@colspec = []
|
111
|
+
@virtualcols = []
|
111
112
|
end
|
112
113
|
|
113
114
|
# define a DSL for options
|
@@ -132,6 +133,19 @@ module Dreader
|
|
132
133
|
@colspec << column.to_hash.merge({name: name})
|
133
134
|
end
|
134
135
|
|
136
|
+
# virtual columns define derived attributes
|
137
|
+
# the code specified in the virtual column is executed after reading
|
138
|
+
# a row and before applying the mapping function
|
139
|
+
#
|
140
|
+
# virtual column declarations are executed in the order in which
|
141
|
+
# they are defined
|
142
|
+
def virtual_column name, &block
|
143
|
+
column = Column.new
|
144
|
+
column.instance_eval &block
|
145
|
+
|
146
|
+
@virtualcols << column.to_hash.merge({name: name})
|
147
|
+
end
|
148
|
+
|
135
149
|
# define what we do with each line we read
|
136
150
|
# - `block` is the code which takes as input a `row` and processes
|
137
151
|
# `row` is a hash in which each spreadsheet cell is accessible under
|
@@ -218,6 +232,19 @@ module Dreader
|
|
218
232
|
@errors
|
219
233
|
end
|
220
234
|
|
235
|
+
def virtual_columns
|
236
|
+
# execute the virtual column specification
|
237
|
+
@virtualcols.each do |virtualcol|
|
238
|
+
@table.each do |r|
|
239
|
+
# add the cell to the table
|
240
|
+
r[virtualcol[:name]] = {
|
241
|
+
value: virtualcol[:process].call(r),
|
242
|
+
virtual: true,
|
243
|
+
}
|
244
|
+
end
|
245
|
+
end
|
246
|
+
end
|
247
|
+
|
221
248
|
# apply the mapping code to the array
|
222
249
|
# it makes sense to invoke it only
|
223
250
|
def process
|
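The `virtual_columns` method added above walks the declared virtual columns in order and, for each row, stores the computed value together with `virtual: true`. The following sketch reproduces only that step in a self-contained class so the ordering can be observed. `MiniEngine` is hypothetical, and unlike the gem it takes the block directly instead of evaluating a `process do |row| ... end` declaration through `Column`.

```ruby
# Hypothetical miniature of the engine: only the virtual-column step.
class MiniEngine
  attr_reader :table

  def initialize(table)
    @table = table
    @virtualcols = []
  end

  # Simplification: the block itself plays the role of `process`.
  def virtual_column(name, &block)
    @virtualcols << { name: name, process: block }
  end

  # Same loop shape as in the diff: columns outside, rows inside.
  def virtual_columns
    @virtualcols.each do |virtualcol|
      @table.each do |row|
        row[virtualcol[:name]] = {
          value: virtualcol[:process].call(row),
          virtual: true
        }
      end
    end
  end
end

engine = MiniEngine.new([
  { birth_year: { value: 1961 } },
  { birth_year: { value: 1957 } }
])
engine.virtual_column(:age_in_2018) { |row| 2018 - row[:birth_year][:value] }
# Declared second, so it can read the virtual column declared first.
engine.virtual_column(:sixty_or_over) { |row| row[:age_in_2018][:value] >= 60 }
engine.virtual_columns

p engine.table.first[:age_in_2018]
```

Because columns are processed in declaration order, a later virtual column can depend on an earlier one, but not the other way round.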
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: dreader
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.3.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Adolfo Villafiorita
|
8
8
|
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date: 2018-03-
|
11
|
+
date: 2018-03-26 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bundler
|
@@ -77,6 +77,8 @@ files:
|
|
77
77
|
- bin/console
|
78
78
|
- bin/setup
|
79
79
|
- dreader.gemspec
|
80
|
+
- examples/age/Birthdays.ods
|
81
|
+
- examples/age/age.rb
|
80
82
|
- examples/wikipedia_big_us_cities/big_us_cities.rb
|
81
83
|
- examples/wikipedia_big_us_cities/cities_by_state.ods
|
82
84
|
- examples/wikipedia_us_cities/us_cities.rb
|