dreader 0.2.1 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: eff0a237ea9c0a4162696790f1689a868fc8a3efb3075eb0c890ddcc4651b5d0
4
- data.tar.gz: f3ef74d2063aa07e5f42e3569563b37837ee3fa8d83e243a546d49d6699ef0ae
3
+ metadata.gz: b4e929eff1813efc3d2021430773275b213146e00adb01c5ad620bb3a6dfb98b
4
+ data.tar.gz: 3a9d4ecd7dce0713b29ef54550eb5a026176184798f7ae0bfbe1e634812e912a
5
5
  SHA512:
6
- metadata.gz: 491a031d343211d988d7687601cacaeecac49770510940c56c46a770eddc26d703d8c1eb6e370deb79e0b536ecf2ebc8fbd5eac751f0354067c2242cb90f6268
7
- data.tar.gz: 2ef1c5b00dadeba9f522f874022ec1e88e2409368144e056268155183f2cd5245f64b9e1445e44210cebbba01a24e409cc988651eddc8284080f85d1f216c394
6
+ metadata.gz: d31972c25fbc073211e9f398133b11d76dd71667cc4f5042a738d37304c9d0be8a88fd61fb446685c5867c9111ad81798d23b31b2017a396082352855a7af16f
7
+ data.tar.gz: c57d6b9e059c31262f728748c011a2631a91e64365ce72eea0554492bf7512db430dd9600dea80e64dbda50ec2256848cff2733e432b484e272458f6641bad2f
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- dreader (0.2.0)
4
+ dreader (0.2.1)
5
5
  roo
6
6
 
7
7
  GEM
data/README.md CHANGED
@@ -46,6 +46,8 @@ Or install it yourself as:
46
46
 
47
47
  ## Usage
48
48
 
49
+ ### Declare the file you want to read
50
+
49
51
  Require `dreader` and declare an instance of the `Dreader::Engine` class:
50
52
 
51
53
  ```ruby
@@ -79,20 +81,28 @@ where:
79
81
  * (optional) `sheet` is the sheet name or number to read from. If not
80
82
  specified, the first (default) sheet is used
81
83
 
82
- Specify the structure of your file, for the columns you are interested
83
- to process. You have to specify a name (used to access data) and a
84
- column reference (used to read data from the file).
84
+ ### Declare the columns you want to read
85
+
86
+ Declare the columns you want to read by assigning them a name and a
87
+ column reference:
88
+
89
+ ```ruby
90
+ # we will access column A in Ruby code using :name
91
+ i.column :name
92
+ colref 'A'
93
+ end
94
+ ```
85
95
 
86
96
  You can also specify two ruby blocks, `process` and `check` to
87
97
  preprocess data and to check for errors.
88
98
 
89
99
  For instance, given the following file:
90
100
 
91
- | Name | Surname | Age |
92
- |------|---------|-----|
93
- | John | Doe | 30 |
94
- | Jane | Doe | 31 |
95
- | ... | ... | ... |
101
+ | Name | Date of birth |
102
+ |------------------|-----------------|
103
+ | Forest Whitaker | July 15, 1961 |
104
+ | Daniel Day-Lewis | April 29, 1957 |
105
+ | Sean Penn | August 17, 1960 |
96
106
 
97
107
  we could use the following declaration to specify the data to read:
98
108
 
@@ -106,19 +116,19 @@ i.column :name do
106
116
  end
107
117
  end
108
118
 
109
- # we want to access column 3 (Age) using :age
110
- # :age should be non nil and of length greater than 0
111
- i.column :age do
112
- colref 3
119
+ # we want to access column 2 (Date of birth) using :birthdate
120
+ i.column :birthdate do
121
+ colref 2
113
122
 
114
123
  # make sure the column is transformed into a date
115
124
  process do |x|
116
- x.to_i
125
+ Date.parse(x)
117
126
  end
118
127
 
119
- # check age is greater than zero
128
+ # check the birthdate is a date (check is invoked on the value returned
129
+ # by process)
120
130
  check do |x|
121
- x > 0
131
+ x.class == Date
122
132
  end
123
133
  end
124
134
 
@@ -126,13 +136,51 @@ end
126
136
  # we are done with our declarations)
127
137
  ```
128
138
 
129
- Notice that `colref` can be a string (e.g., `'A'`) or an integer
130
- (first column is one).
139
+ **Remarks:**
140
+
141
+ 1. `colref` can be a string (e.g., `'A'`) or an integer, in which case
142
+ the first column is one
143
+
144
+ 2. you need to declare only the columns you want to import. For
145
+ instance, we could skip the declaration for column 1, if 'Date of
146
+ Birth' is the only data we want to import
147
+
148
+ 3. If `process` and `check` are specified, then `check` will receive
149
+ the result of invoking `process` on the cell value. This makes
150
+ sense if process is used to make the cell value more accessible to
151
+ ruby code (e.g., transforming a string into an integer).
152
+
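The ordering described in remark 3 can be illustrated with a minimal sketch in plain Ruby. It does not use dreader itself: `read_cell` and the `:ok` key are hypothetical names chosen for this illustration, and the two lambdas stand in for the blocks given to `process` and `check`.

```ruby
require 'date'

# Stand-ins for the blocks passed to `process` and `check`
# (assumption: plain lambdas, not dreader's internal representation).
process = ->(cell)  { Date.parse(cell) }
check   = ->(value) { value.is_a?(Date) }

# Hypothetical helper: run `process` first, then give its result to `check`.
def read_cell(raw, process, check)
  value = process.call(raw)
  { value: value, ok: check.call(value) }
end

cell = read_cell("July 15, 1961", process, check)
puts cell[:value]  # the parsed Date
puts cell[:ok]     # whether the check passed
```

Because `check` sees the processed value, it can test for a `Date` rather than re-parsing the original string.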
153
+
154
+ ### Add virtual columns, if you want
155
+
156
+ Sometimes it is convenient to aggregate or otherwise manipulate the
157
+ data read from each row before doing the actual processing.
158
+
159
+ For instance, we might have a table with dates of birth, while we are
160
+ really interested in the age of people.
161
+
162
+ In such cases, we can use a virtual column. A **virtual column** allows
163
+ one to add a column to the data read. The value of the column for
164
+ each row is computed using the values of other cells.
165
+
166
+ Virtual columns are declared similarly to columns. Thus, for instance,
167
+ the following declaration adds an `age` column to each row of the data
168
+ we read from the previous example:
169
+
170
+ ```ruby
171
+ i.virtual_column :age do
172
+ process do |row|
173
+ # `compute_birthday` has to be defined
174
+ compute_birthday(row[:birthdate])
175
+ end
176
+ end
177
+ ```
178
+
179
+ Virtual columns are, of course, available to the `mapping` directive
180
+ (see below).
131
181
 
132
- Notice also that if `process` and `check` are specified, then `check`
133
- will receive the result of invoking `process` on the cell value. This
134
- makes sense if process is used to make the cell value more accessible
135
- to ruby code (e.g., transforming a string into an integer).
182
+
183
+ ### Specify how to process data
136
184
 
137
185
  Finally we can specify how we process lines, using the `mapping`
138
186
  directive. Mapping takes an arbitrary piece of ruby code, which can
@@ -146,22 +194,26 @@ i.mapping do |row|
146
194
  end
147
195
  ```
148
196
 
149
- Notice that the data read from a line is stored in a hash which uses
150
- the column names and stores names in the `:value` key.
197
+ Notice that the data read from each row of our input data is stored in
198
+ a hash. The hash uses column names as the primary key and stores
199
+ the values in the `:value` key.
200
+
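To make the row layout described above concrete, here is a small sketch in plain Ruby. The row literal mirrors the description (column name as key, cell data under `:value`). `cell_values` is a hypothetical helper written for this illustration and is not part of dreader.

```ruby
# A row shaped as described: column name => hash holding the cell's :value.
row = {
  name:      { value: "Sean Penn" },
  birthdate: { value: "August 17, 1960" }
}

# Hypothetical helper: keep only the :value of each cell.
def cell_values(row)
  row.transform_values { |cell| cell[:value] }
end

flat = cell_values(row)
puts flat[:name]
puts flat[:birthdate]
```

Flattening a row this way can make `mapping` blocks shorter when only the values are needed.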
201
+ ### Start working with the data
151
202
 
152
- Now we are all set and we can start working with the data.
203
+ We are now all set and we can start working with the data.
153
204
 
154
205
  First use `read` or `load` (synonyms), to read all data and put it
155
- into a `@table` instance variable. This function uses the `column`
156
- declarations to read data and executes the `process` and `check`
157
- functions for each cell read.
206
+ into a `@table` instance variable.
158
207
 
159
208
  ```ruby
160
209
  i.read
161
210
  ```
162
211
 
163
- We can now use `errors` to see whether any of the `check` functions
164
- failed:
212
+ Read applies all the `column` and `virtual_column` declarations and
213
+ builds a hash with the data read.
214
+
215
+ After reading the file we can use `errors` to see whether any of the
216
+ `check` functions failed:
165
217
 
166
218
  ```ruby
167
219
  array_of_strings = i.errors
@@ -170,29 +222,31 @@ array_of_strings.each do |error_line|
170
222
  end
171
223
  ```
172
224
 
173
- Now we can process the file with the `process` function, which
174
- executes the `mapping` directive for each line read from the file.
225
+ Finally we can use the `process` function to apply the `mapping`
226
+ directive to each line read from the file.
175
227
 
176
228
  ```ruby
177
229
  i.process
178
230
  ```
179
231
 
180
- If you need to perform more complex elaborations on the data, you can
181
- also directly access all data read, using the `table` method, which
182
- returns an array of hashes (see next section for the details).
232
+ Look in the examples directory for further details and a couple of
233
+ working examples.
183
234
 
184
- ```ruby
185
- i.table
186
- ```
187
235
 
188
- For further details and a couple of working examples, look in the
189
- examples directory.
236
+ ## Digging deeper
190
237
 
238
+ If you need to perform elaborations on the data which cannot be
239
+ captured with `process` (that is, by processing the data row by row),
240
+ you can also directly access all data read, using the `table` method:
191
241
 
192
- ## Digging deeper
242
+ ```ruby
243
+ i.read
244
+ i.table
245
+ # an array of hashes (one hash per row)
246
+ ```
193
247
 
194
- The `read` method fills a `@table` instance variable with an array of
195
- hashes. Each hash represents a line of the file.
248
+ In more detail, the `read` method fills a `@table` instance variable
249
+ with an array of hashes. Each hash represents a line of the file.
196
250
 
197
251
  Each hash contains one key per column, following your specification.
198
252
  Its value is, in turn, a hash with the following structure:
@@ -206,6 +260,9 @@ Its value is, in turn, a hash with the following structure:
206
260
  }
207
261
  ```
208
262
 
263
+ (Note that virtual columns only store `value` and a Boolean `virtual`,
264
+ which is always `true`.)
265
+
209
266
  Thus, for instance, given the example above:
210
267
 
211
268
  ```ruby
@@ -274,6 +331,7 @@ i.debug 40, filename # like above, but read from filename
274
331
  Another possibility is getting the value of the `@table` variable,
275
332
  which contains all the data read.
276
333
 
334
+
277
335
  ## Known Limitations
278
336
 
279
337
  At the moment:
data/examples/age/Birthdays.ods ADDED
Binary file
data/examples/age/age.rb ADDED
@@ -0,0 +1,39 @@
1
+ require 'dreader'
2
+
3
+ i = Dreader::Engine.new
4
+
5
+ i.options do
6
+ first_row 2
7
+ end
8
+
9
+ i.column :name do
10
+ colref 'A'
11
+ end
12
+
13
+ i.column :birthdate do
14
+ colref 'B'
15
+
16
+ process do |c|
17
+ Date.parse(c)
18
+ end
19
+ end
20
+
21
+ i.virtual_column :age do
22
+ process do |row|
23
+ birthdate = row[:birthdate][:value]
24
+ birthday = Date.new(Date.today.year, birthdate.month, birthdate.day)
25
+ today = Date.today
26
+
27
+ [0, today.year - birthdate.year - (birthday < today ? 1 : 0)].max
28
+ end
29
+ end
30
+
31
+ i.mapping do |row|
32
+ r = Dreader::Util.simplify(row)
33
+ puts "#{r[:name]} is #{r[:age]} years old (born on #{r[:birthdate]})"
34
+ end
35
+
36
+ i.read "Birthdays.ods"
37
+ i.virtual_columns
38
+ i.process
39
+
@@ -1,3 +1,3 @@
1
1
  module Dreader
2
- VERSION = "0.2.1"
2
+ VERSION = "0.3.0"
3
3
  end
data/lib/dreader.rb CHANGED
@@ -103,11 +103,12 @@ module Dreader
103
103
  # the specification of the columns to process
104
104
  attr_reader :colspec
105
105
  # the data we read
106
- attr_reader :array
106
+ attr_reader :table
107
107
 
108
108
  def initialize
109
109
  @options = {}
110
110
  @colspec = []
111
+ @virtualcols = []
111
112
  end
112
113
 
113
114
  # define a DSL for options
@@ -132,6 +133,19 @@ module Dreader
132
133
  @colspec << column.to_hash.merge({name: name})
133
134
  end
134
135
 
136
+ # virtual columns define derived attributes
137
+ # the code specified in the virtual column is executed after reading
138
+ # a row and before applying the mapping function
139
+ #
140
+ # virtual column declarations are executed in the order in which
141
+ # they are defined
142
+ def virtual_column name, &block
143
+ column = Column.new
144
+ column.instance_eval &block
145
+
146
+ @virtualcols << column.to_hash.merge({name: name})
147
+ end
148
+
135
149
  # define what we do with each line we read
136
150
  # - `block` is the code which takes as input a `row` and processes
137
151
  # `row` is a hash in which each spreadsheet cell is accessible under
@@ -218,6 +232,19 @@ module Dreader
218
232
  @errors
219
233
  end
220
234
 
235
+ def virtual_columns
236
+ # execute the virtual column specification
237
+ @virtualcols.each do |virtualcol|
238
+ @table.each do |r|
239
+ # add the cell to the table
240
+ r[virtualcol[:name]] = {
241
+ value: virtualcol[:process].call(r),
242
+ virtual: true,
243
+ }
244
+ end
245
+ end
246
+ end
247
+
221
248
  # apply the mapping code to the array
222
249
  # it makes sense to invoke it only
223
250
  def process
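The `virtual_columns` method added in this file can be illustrated on stand-in data. The sketch below uses plain Ruby and mimics the shape of the loop in the diff (declarations outermost, rows innermost). The `table` and `virtualcols` literals are assumptions made for this illustration, not data produced by dreader.

```ruby
# Stand-in table: an array of row hashes (column name => cell hash).
table = [
  { first: { value: "Jane" }, last: { value: "Doe" } },
  { first: { value: "John" }, last: { value: "Roe" } }
]

# Stand-in declarations, in definition order; the second reads the
# cell produced by the first.
virtualcols = [
  { name: :full,     process: ->(row) { "#{row[:first][:value]} #{row[:last][:value]}" } },
  { name: :initials, process: ->(row) { row[:full][:value].split.map { |w| w[0] }.join } }
]

# Same shape as the loop in the diff: each declaration is applied to every row.
virtualcols.each do |vc|
  table.each do |row|
    row[vc[:name]] = { value: vc[:process].call(row), virtual: true }
  end
end

puts table[0][:full][:value]
puts table[1][:initials][:value]
```

Since declarations run in the order they were defined, a later virtual column can rely on the value of an earlier one, as `:initials` does here.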
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: dreader
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.1
4
+ version: 0.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Adolfo Villafiorita
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2018-03-23 00:00:00.000000000 Z
11
+ date: 2018-03-26 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -77,6 +77,8 @@ files:
77
77
  - bin/console
78
78
  - bin/setup
79
79
  - dreader.gemspec
80
+ - examples/age/Birthdays.ods
81
+ - examples/age/age.rb
80
82
  - examples/wikipedia_big_us_cities/big_us_cities.rb
81
83
  - examples/wikipedia_big_us_cities/cities_by_state.ods
82
84
  - examples/wikipedia_us_cities/us_cities.rb