dreader 0.2.1 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: eff0a237ea9c0a4162696790f1689a868fc8a3efb3075eb0c890ddcc4651b5d0
4
- data.tar.gz: f3ef74d2063aa07e5f42e3569563b37837ee3fa8d83e243a546d49d6699ef0ae
3
+ metadata.gz: b4e929eff1813efc3d2021430773275b213146e00adb01c5ad620bb3a6dfb98b
4
+ data.tar.gz: 3a9d4ecd7dce0713b29ef54550eb5a026176184798f7ae0bfbe1e634812e912a
5
5
  SHA512:
6
- metadata.gz: 491a031d343211d988d7687601cacaeecac49770510940c56c46a770eddc26d703d8c1eb6e370deb79e0b536ecf2ebc8fbd5eac751f0354067c2242cb90f6268
7
- data.tar.gz: 2ef1c5b00dadeba9f522f874022ec1e88e2409368144e056268155183f2cd5245f64b9e1445e44210cebbba01a24e409cc988651eddc8284080f85d1f216c394
6
+ metadata.gz: d31972c25fbc073211e9f398133b11d76dd71667cc4f5042a738d37304c9d0be8a88fd61fb446685c5867c9111ad81798d23b31b2017a396082352855a7af16f
7
+ data.tar.gz: c57d6b9e059c31262f728748c011a2631a91e64365ce72eea0554492bf7512db430dd9600dea80e64dbda50ec2256848cff2733e432b484e272458f6641bad2f
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- dreader (0.2.0)
4
+ dreader (0.2.1)
5
5
  roo
6
6
 
7
7
  GEM
data/README.md CHANGED
@@ -46,6 +46,8 @@ Or install it yourself as:
46
46
 
47
47
  ## Usage
48
48
 
49
+ ### Declare the file you want to read
50
+
49
51
  Require `dreader` and declare an instance of the `Dreader::Engine` class:
50
52
 
51
53
  ```ruby
@@ -79,20 +81,28 @@ where:
79
81
  * (optional) `sheet` is the sheet name or number to read from. If not
80
82
  specified, the first (default) sheet is used
81
83
 
82
- Specify the structure of your file, for the columns you are interested
83
- to process. You have to specify a name (used to access data) and a
84
- column reference (used to read data from the file).
84
+ ### Declare the columns you want to read
85
+
86
+ Declare the columns you want to read by assigning them a name and a
87
+ column reference:
88
+
89
+ ```ruby
90
+ # we will access column A in Ruby code using :name
91
+ i.column :name do
92
+ colref 'A'
93
+ end
94
+ ```
85
95
 
86
96
  You can also specify two ruby blocks, `process` and `check` to
87
97
  preprocess data and to check for errors.
88
98
 
89
99
  For instance, given the following file:
90
100
 
91
- | Name | Surname | Age |
92
- |------|---------|-----|
93
- | John | Doe | 30 |
94
- | Jane | Doe | 31 |
95
- | ... | ... | ... |
101
+ | Name | Date of birth |
102
+ |------------------|-----------------|
103
+ | Forest Whitaker | July 15, 1961 |
104
+ | Daniel Day-Lewis | April 29, 1957 |
105
+ | Sean Penn | August 17, 1960 |
96
106
 
97
107
  we could use the following declaration to specify the data to read:
98
108
 
@@ -106,19 +116,19 @@ i.column :name do
106
116
  end
107
117
  end
108
118
 
109
- # we want to access column 3 (Age) using :age
110
- # :age should be non nil and of length greater than 0
111
- i.column :age do
112
- colref 3
119
+ # we want to access column 2 (Date of birth) using :birthdate
120
+ i.column :birthdate do
121
+ colref 2
113
122
 
114
123
  # make sure the column is transformed into an integer
115
124
  process do |x|
116
- x.to_i
125
+ Date.parse(x)
117
126
  end
118
127
 
119
- # check age is greater than zero
128
+ # check birthdate is a date (check is invoked on the value returned
129
+ # by process)
120
130
  check do |x|
121
- x > 0
131
+ x.class == Date
122
132
  end
123
133
  end
124
134
 
@@ -126,13 +136,51 @@ end
126
136
  # we are done with our declarations)
127
137
  ```
128
138
 
129
- Notice that `colref` can be a string (e.g., `'A'`) or an integer
130
- (first column is one).
139
+ **Remarks:**
140
+
141
+ 1. `colref` can be a string (e.g., `'A'`) or an integer, in which case
142
+ the first column is one
143
+
144
+ 2. you need to declare only the columns you want to import. For
145
+ instance, we could skip the declaration for column 1, if 'Date of
146
+ Birth' is the only data we want to import
147
+
148
+ 3. If `process` and `check` are specified, then `check` will receive
149
+ the result of invoking `process` on the cell value. This makes
150
+ sense if process is used to make the cell value more accessible to
151
+ ruby code (e.g., transforming a string into an integer).
152
+
153
+
154
+ ### Add virtual columns, if you want
155
+
156
+ Sometimes it is convenient to aggregate or otherwise manipulate the
157
+ data read from each row before doing the actual processing.
158
+
159
+ For instance, we might have a table with dates of birth, while we are
160
+ really interested in the age of people.
161
+
162
+ In such cases, we can use a virtual column. A **virtual column** allows
163
+ one to add a column to the data read. The value of the column for
164
+ each row is computed using the values of other cells.
165
+
166
+ Virtual columns are declared similarly to columns. Thus, for instance,
167
+ the following declaration adds an `age` column to each row of the data
168
+ we read from the previous example:
169
+
170
+ ```ruby
171
+ i.virtual_column :age do
172
+ process do |row|
173
+ # `compute_birthday` has to be defined
174
+ compute_birthday(row[:birthdate])
175
+ end
176
+ end
177
+ ```
178
+
179
+ Virtual columns are, of course, available to the `mapping` directive
180
+ (see below).
131
181
 
132
- Notice also that if `process` and `check` are specified, then `check`
133
- will receive the result of invoking `process` on the cell value. This
134
- makes sense if process is used to make the cell value more accessible
135
- to ruby code (e.g., transforming a string into an integer).
182
+
183
+ ### Specify how to process data
136
184
 
137
185
  Finally we can specify how we process lines, using the `mapping`
138
186
  directive. Mapping takes an arbitrary piece of ruby code, which can
@@ -146,22 +194,26 @@ i.mapping do |row|
146
194
  end
147
195
  ```
148
196
 
149
- Notice that the data read from a line is stored in a hash which uses
150
- the column names and stores names in the `:value` key.
197
+ Notice that the data read from each row of our input data is stored in
198
+ a hash. The hash uses column names as the primary key and stores
199
+ the values in the `:value` key.
200
+
201
+ ### Start working with the data
151
202
 
152
- Now we are all set and we can start working with the data.
203
+ We are now all set and we can start working with the data.
153
204
 
154
205
  First use `read` or `load` (synonyms), to read all data and put it
155
- into a `@table` instance variable. This function uses the `column`
156
- declarations to read data and executes the `process` and `check`
157
- functions for each cell read.
206
+ into a `@table` instance variable.
158
207
 
159
208
  ```ruby
160
209
  i.read
161
210
  ```
162
211
 
163
- We can now use `errors` to see whether any of the `check` functions
164
- failed:
212
+ Read applies all the `column` and `virtual_column` declarations and
213
+ builds a hash with the data read.
214
+
215
+ After reading the file we can use `errors` to see whether any of the
216
+ `check` functions failed:
165
217
 
166
218
  ```ruby
167
219
  array_of_strings = i.errors
@@ -170,29 +222,31 @@ array_of_strings ech do |error_line|
170
222
  end
171
223
  ```
172
224
 
173
- Now we can process the file with the `process` function, which
174
- executes the `mapping` directive for each line read from the file.
225
+ Finally we can use the `process` function to execute the `mapping`
226
+ directive to each line read from the file.
175
227
 
176
228
  ```ruby
177
229
  i.process
178
230
  ```
179
231
 
180
- If you need to perform more complex elaborations on the data, you can
181
- also directly access all data read, using the `table` method, which
182
- returns an array of hashes (see next section for the details).
232
+ Look in the examples directory for further details and a couple of
233
+ working examples.
183
234
 
184
- ```ruby
185
- i.table
186
- ```
187
235
 
188
- For further details and a couple of working examples, look in the
189
- examples directory.
236
+ ## Digging deeper
190
237
 
238
+ If you need to perform more elaborations on the data which cannot be
239
+ captured with `process` (that is, by processing the data row by row),
240
+ you can also directly access all data read, using the `table` method:
191
241
 
192
- ## Digging deeper
242
+ ```ruby
243
+ i.read
244
+ i.table
245
+ # an array of hashes (one hash per row)
246
+ ```
193
247
 
194
- The `read` method fills a `@table` instance variable with an array of
195
- hashes. Each hash represents a line of the file.
248
+ In more detail, the `read` method fills a `@table` instance variable
249
+ with an array of hashes. Each hash represents a line of the file.
196
250
 
197
251
  Each hash contains one key per column, following your specification.
198
252
  Its value is, in turn, a hash with the following structure:
@@ -206,6 +260,9 @@ Its value is, in turn, a hash with the following structure:
206
260
  }
207
261
  ```
208
262
 
263
+ (Note that virtual columns only store `value` and a Boolean `virtual`,
264
+ which is always `true`.)
265
+
209
266
  Thus, for instance, given the example above:
210
267
 
211
268
  ```ruby
@@ -274,6 +331,7 @@ i.debug 40, filename # like above, but read from filename
274
331
  Another possibility is getting the value of the `@table` variable,
275
332
  which contains all the data read.
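As a rough illustration of the row structure described in this README, the following plain-Ruby sketch builds one row by hand and flattens it to column/value pairs. It does not call dreader at all: `flatten_row` is a hypothetical helper, and only the `:value` and `:virtual` keys mentioned in the README are assumed (real rows may carry further keys).

```ruby
require 'date'

# One row, shaped as the README describes: one key per column, each
# pointing to a hash which stores the cell content under :value.
# Virtual columns additionally carry virtual: true.
row = {
  name:      { value: "Forest Whitaker" },
  birthdate: { value: Date.new(1961, 7, 15) },
  age:       { value: 56, virtual: true }
}

# Hypothetical helper (not part of dreader): keep only the value
# stored in each cell, dropping any other per-cell information.
def flatten_row(row)
  row.transform_values { |cell| cell[:value] }
end

flat = flatten_row(row)
puts "#{flat[:name]} was born on #{flat[:birthdate]}"
```

A flattened hash of this kind is convenient inside a `mapping` block, where only the values are usually of interest.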
276
333
 
334
+
277
335
  ## Known Limitations
278
336
 
279
337
  At the moment:
data/examples/age/Birthdays.ods ADDED
Binary file
data/examples/age/age.rb ADDED
@@ -0,0 +1,39 @@
1
+ require 'dreader'
2
+
3
+ i = Dreader::Engine.new
4
+
5
+ i.options do
6
+ first_row 2
7
+ end
8
+
9
+ i.column :name do
10
+ colref 'A'
11
+ end
12
+
13
+ i.column :birthdate do
14
+ colref 'B'
15
+
16
+ process do |c|
17
+ Date.parse(c)
18
+ end
19
+ end
20
+
21
+ i.virtual_column :age do
22
+ process do |row|
23
+ birthdate = row[:birthdate][:value]
24
+ birthday = Date.new(Date.today.year, birthdate.month, birthdate.day)
25
+ today = Date.today
26
+
27
+ [0, today.year - birthdate.year - (birthday < today ? 1 : 0)].max
28
+ end
29
+ end
30
+
31
+ i.mapping do |row|
32
+ r = Dreader::Util.simplify(row)
33
+ puts "#{r[:name]} is #{r[:age]} years old (born on #{r[:birthdate]})"
34
+ end
35
+
36
+ i.read "Birthdays.ods"
37
+ i.virtual_columns
38
+ i.process
39
+
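The age computation used in the virtual column can also be sketched as a standalone function, which makes it easier to try out with fixed dates. This is only one possible variant, not the code shipped with the gem: `age_on` is a hypothetical name, and the sketch subtracts one year only when the birthday has not yet occurred in the reference year. It compares (month, day) pairs, so it never has to build a February 29 date in a non-leap year.

```ruby
require 'date'

# Age in whole years at `today` (hypothetical helper, not part of dreader).
def age_on(birthdate, today = Date.today)
  years = today.year - birthdate.year

  # The birthday has already occurred this year if today's (month, day)
  # is not before the (month, day) of birth.
  had_birthday =
    ([today.month, today.day] <=> [birthdate.month, birthdate.day]) >= 0

  # Never return a negative age for dates of birth in the future.
  [0, had_birthday ? years : years - 1].max
end

puts age_on(Date.new(1961, 7, 15), Date.new(2018, 3, 26))
```

Passing the reference date explicitly, instead of always calling `Date.today`, keeps the function deterministic and easy to check.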
@@ -1,3 +1,3 @@
1
1
  module Dreader
2
- VERSION = "0.2.1"
2
+ VERSION = "0.3.0"
3
3
  end
data/lib/dreader.rb CHANGED
@@ -103,11 +103,12 @@ module Dreader
103
103
  # the specification of the columns to process
104
104
  attr_reader :colspec
105
105
  # the data we read
106
- attr_reader :array
106
+ attr_reader :table
107
107
 
108
108
  def initialize
109
109
  @options = {}
110
110
  @colspec = []
111
+ @virtualcols = []
111
112
  end
112
113
 
113
114
  # define a DSL for options
@@ -132,6 +133,19 @@ module Dreader
132
133
  @colspec << column.to_hash.merge({name: name})
133
134
  end
134
135
 
136
+ # virtual columns define derived attributes
137
+ # the code specified in the virtual column is executed after reading
138
+ # a row and before applying the mapping function
139
+ #
140
+ # virtual column declarations are executed in the order in which
141
+ # they are defined
142
+ def virtual_column name, &block
143
+ column = Column.new
144
+ column.instance_eval &block
145
+
146
+ @virtualcols << column.to_hash.merge({name: name})
147
+ end
148
+
135
149
  # define what we do with each line we read
136
150
  # - `block` is the code which takes as input a `row` and processes
137
151
  # `row` is a hash in which each spreadsheet cell is accessible under
@@ -218,6 +232,19 @@ module Dreader
218
232
  @errors
219
233
  end
220
234
 
235
+ def virtual_columns
236
+ # execute the virtual column specification
237
+ @virtualcols.each do |virtualcol|
238
+ @table.each do |r|
239
+ # add the cell to the table
240
+ r[virtualcol[:name]] = {
241
+ value: virtualcol[:process].call(r),
242
+ virtual: true,
243
+ }
244
+ end
245
+ end
246
+ end
247
+
221
248
  # apply the mapping code to the array
222
249
  # it makes sense to invoke it only
223
250
  def process
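To see why the order of virtual column declarations matters, the loop in `virtual_columns` can be reproduced on plain Ruby data. The sketch below does not use dreader: the table, the `birthyear` column and the two virtual columns are made up for illustration, and only the loop structure follows the method shown in this diff.

```ruby
# Table as a read step might leave it: one hash per row, one cell
# hash per column (the :birthyear column is hypothetical).
table = [
  { name: { value: "Forest Whitaker" },  birthyear: { value: 1961 } },
  { name: { value: "Daniel Day-Lewis" }, birthyear: { value: 1957 } }
]

# Virtual column specifications, in declaration order. The second one
# reads the cell produced by the first one.
virtualcols = [
  { name: :age_in_2018, process: ->(row) { 2018 - row[:birthyear][:value] } },
  { name: :senior,      process: ->(row) { row[:age_in_2018][:value] >= 60 } }
]

# Same nesting as in the method above: for each virtual column, visit
# every row and add a new cell marked as virtual.
virtualcols.each do |virtualcol|
  table.each do |row|
    row[virtualcol[:name]] = {
      value: virtualcol[:process].call(row),
      virtual: true
    }
  end
end

table.each do |row|
  puts "#{row[:name][:value]}: #{row[:age_in_2018][:value]}, senior: #{row[:senior][:value]}"
end
```

Because the outer loop runs over the virtual columns, every row already has its `:age_in_2018` cell by the time `:senior` is computed; swapping the two declarations would make the second lambda fail on a missing cell.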
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: dreader
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.1
4
+ version: 0.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Adolfo Villafiorita
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2018-03-23 00:00:00.000000000 Z
11
+ date: 2018-03-26 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -77,6 +77,8 @@ files:
77
77
  - bin/console
78
78
  - bin/setup
79
79
  - dreader.gemspec
80
+ - examples/age/Birthdays.ods
81
+ - examples/age/age.rb
80
82
  - examples/wikipedia_big_us_cities/big_us_cities.rb
81
83
  - examples/wikipedia_big_us_cities/cities_by_state.ods
82
84
  - examples/wikipedia_us_cities/us_cities.rb