csv_party 0.0.1.pre9 → 1.0.0.rc4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: b70149f2976c65072cd5ba2e20b1187f250ed7cf
4
- data.tar.gz: 64c20377be56bc6b01249b092d01e41f8caf756e
3
+ metadata.gz: 67d5895445a9fe397df95275260491ed6d9f6ce5
4
+ data.tar.gz: 542a1442466867afa33cf0ef883a32443777cfd5
5
5
  SHA512:
6
- metadata.gz: fa5d31d3102e420b6f5f4d63bea3b13c8a9752a77047d28e32804abdc3ba07e2e21a68e9cf5ec909dd474537345bd4785750cdfaf0fb7112138fbae9d5082ec8
7
- data.tar.gz: fd46166bafee072f6308212cd5c3d028d560078e7800f61b003d7c0a050afc6919ce14cf680465e4beb70860baa3630c44377c283e723c5440033fc1e8e28f48
6
+ metadata.gz: 1035dac76f5ec71d97015a5d9c2874074c28ad4b0ff7994ab565735fd6d49e539e4292a42ee15f44d8d2de2fb356e81a06f61685019fa3ad250d12ab9c160ba8
7
+ data.tar.gz: 93c748026adc2fa907f3e08825c0fef8b41fed5b2945e3066aff70978a86ed4c9d47df3a98abc30e25b2fb44a597cddb91efbb519699d2a9e3b4c580b4ef1100
data/LICENSE.md ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2018 Richard A. Jones
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,218 @@
1
+ [![Gem Version](https://badge.fury.io/rb/csv_party.svg)](https://badge.fury.io/rb/csv_party)
2
+ [![Build Status](https://travis-ci.org/toasterlovin/csv_party.svg?branch=master)](https://travis-ci.org/toasterlovin/csv_party)
3
+ [![Code Climate Maintainability](https://api.codeclimate.com/v1/badges/946d0dec172fda05d631/maintainability)](https://codeclimate.com/github/toasterlovin/csv_party/maintainability)
4
+ [![Code Climate Test Coverage](https://api.codeclimate.com/v1/badges/946d0dec172fda05d631/test_coverage)](https://codeclimate.com/github/toasterlovin/csv_party/test_coverage)
5
+
6
+ # Make importing CSV files a party
7
+
8
+ The point of this gem is to make it easier to focus on the business
9
+ logic of your CSV imports. You start by defining which columns you
10
+ will be importing, as well as how they will be parsed. Then, you
11
+ specify what you want to do with each row after it has been parsed.
12
+ That's it; CSVParty takes care of all the tedious stuff for you.
13
+
14
+ ## Defining Columns
15
+
16
+ This is what defining your import columns looks like:
17
+
18
+ class MyImporter < CSVParty
19
+ column :price, header: 'Nonsensical Column Name', as: :decimal
20
+ end
21
+
22
+ This will take the value in the 'Nonsensical Column Name' column,
23
+ parse it as a decimal, then make it available to your import logic
24
+ as a nice, sane variable named `price`.
25
+
26
+ The available built-in parsers are:
27
+
28
+ - `:raw` returns the value from the CSV file, unchanged
29
+ - `:string` strips whitespace and returns the resulting string
30
+ - `:integer` strips whitespace, then calls `to_i` on the resulting string
31
+ - `:decimal` strips all characters except `0-9` and `.`, then passes the
32
+ resulting string to `BigDecimal.new`
33
+ - `:boolean` strips whitespace, downcases, then returns `true` if the
34
+ resulting string is `'1'`, `'t'`, or `'true'`, otherwise it returns `false`
35
+
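The parser descriptions above can be read as small string transformations. The following is a minimal sketch in plain Ruby, assuming nothing about the gem's internals: the `PARSERS` hash and its lambdas are hypothetical stand-ins that only mirror the documented behavior, and `BigDecimal(...)` is used in place of `BigDecimal.new` so the sketch also runs on newer Rubies.

```ruby
require 'bigdecimal'

# Hypothetical stand-ins for the documented built-in parsers.
# These are NOT taken from the gem's source.
PARSERS = {
  raw:     ->(value) { value },                                   # unchanged
  string:  ->(value) { value.strip },                             # trim whitespace
  integer: ->(value) { value.strip.to_i },                        # trim, then to_i
  decimal: ->(value) { BigDecimal(value.gsub(/[^0-9.]/, '')) },   # keep only 0-9 and .
  boolean: ->(value) { %w[1 t true].include?(value.strip.downcase) }
}.freeze

PARSERS[:string].call('  USB Cable ')  # => "USB Cable"
PARSERS[:integer].call(' 42 ')         # => 42
PARSERS[:decimal].call('$9.00')        # equal to BigDecimal('9')
PARSERS[:boolean].call(' True ')       # => true
PARSERS[:boolean].call('yes')          # => false
```

Taken literally, the `:decimal` description also drops a leading minus sign, so under this reading `'-5.00'` would come out as a positive 5; whether the gem itself behaves that way is not confirmed here.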
36
+ When defining a column, you can also pass a block if you need custom
37
+ parsing logic:
38
+
39
+ class MyImporter < CSVParty
40
+ column :product, header: 'Product' do |value|
41
+ Product.find_by(name: value)
42
+ end
43
+ end
44
+
45
+ Or, if you want to re-use a custom parser for multiple columns, just
46
+ define a method on your class with a name that ends in `_parser` and
47
+ you can use it the same way you use the built-in parsers:
48
+
49
+ class MyImporter < CSVParty
50
+ def dollars_to_cents_parser(value)
51
+ (BigDecimal.new(value) * 100).to_i
52
+ end
53
+
54
+ column :price_in_cents, header: 'Price in $', as: :dollars_to_cents
55
+ column :cost_in_cents, header: 'Cost in $', as: :dollars_to_cents
56
+ end
57
+
58
+ #### NOTE: Parsing nil and blank values
59
+
60
+ By default, CSVParty will intercept any values that are `nil` or which contain
61
+ only whitespace and coerce them to `nil` _without invoking the parser for that
62
+ column_. This applies to all parsers, including custom parsers which you
63
+ define, with one exception: the `:raw` parser. This is done as a convenience to
64
+ avoid pesky `NoMethodError`s that arise when a parser tries to do its thing
65
+ to a `nil` value that it wasn't expecting. You can turn this behavior off on a
66
+ given column by setting `intercept_blanks` to `false` in the options hash:
67
+
68
+ class MyImporter < CSVParty
69
+ column :price, header: 'Price', intercept_blanks: false do |value|
70
+ if value.nil?
71
+ 'n/a'
72
+ else
73
+ BigDecimal.new(value)
74
+ end
75
+ end
76
+ end
77
+
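To make the interception rule concrete, here is a rough sketch of the idea in plain Ruby. The helper name `parse_with_blank_interception` and its keyword argument are hypothetical, chosen to echo the `intercept_blanks` option; this is not the gem's implementation.

```ruby
# Hypothetical helper: returns nil for nil or whitespace-only values
# without ever calling the parser, unless interception is switched off.
def parse_with_blank_interception(value, intercept_blanks: true, &parser)
  return nil if intercept_blanks && (value.nil? || value.strip.empty?)

  parser.call(value)
end

upcase = ->(value) { value.upcase }

parse_with_blank_interception('   ', &upcase)  # => nil (parser never runs)
parse_with_blank_interception(nil, &upcase)    # => nil (no NoMethodError)
parse_with_blank_interception('abc', &upcase)  # => "ABC"

# With interception off, the parser sees nil and must handle it itself.
parse_with_blank_interception(nil, intercept_blanks: false) do |value|
  value.nil? ? 'n/a' : value
end
# => "n/a"
```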
78
+ #### NOTE: Parsers cannot reference each other
79
+
80
+ When using a custom parser to parse a column, the block or method that you
81
+ define has no way to reference the values from any other columns. So, this won't
82
+ work:
83
+
84
+ class MyImporter < CSVParty
85
+ column :product, header: 'Product' do |value|
86
+ Product.find_by(name: value)
87
+ end
88
+
89
+ column :price, header: 'Price' do |value|
90
+ # product is not defined...
91
+ product.price = BigDecimal.new(value)
92
+ end
93
+ end
94
+
95
+ Instead, you would do this in your row import logic. Which brings us to:
96
+
97
+ ## Importing Rows
98
+
99
+ Once you've defined all of your columns, you specify your logic for importing
100
+ rows by passing a block to the `rows` DSL method. That block will have access
101
+ to a `row` variable which contains all of the parsed values for your columns.
102
+ Here's what that looks like:
103
+
104
+ class MyImporter < CSVParty
105
+ rows do |row|
106
+ product = row.product
107
+ product.price = row.price
108
+ product.save
109
+ end
110
+ end
111
+
112
+ The `row` variable also provides access to two other things:
113
+
114
+ - The unparsed values for your columns
115
+ - The raw CSV string for that row
116
+
117
+ Here's how you access those:
118
+
119
+ class MyImporter < CSVParty
120
+ rows do |row|
121
+ row.price # parsed value: #<BigDecimal:7f88d92cb820,'0.9E1',9(18)>
122
+ row.unparsed.price # unparsed value: '$9.00'
123
+ row.string # raw CSV string: 'USB Cable,$9.00,Box,Blue'
124
+ end
125
+ end
126
+
127
+ ## Importing
128
+
129
+ Once your importer class is defined, you use it like this:
130
+
131
+ importer = MyImporter.new('path/to/file.csv')
132
+ importer.import!
133
+
134
+ You can also specify what should happen before and after your import by passing
135
+ a block to `import`, like so:
136
+
137
+ class MyImporter < CSVParty
138
+ # column definitions
139
+ # row import logic
140
+
141
+ import do
142
+ puts 'Starting import'
143
+ import_rows!
144
+ puts 'Import finished!'
145
+ end
146
+ end
147
+
148
+ You can do whatever you want inside of the `import` block, just make sure to
149
+ call `import_rows!` somewhere in there.
150
+
151
+ ## Handling Errors
152
+
153
+ One of the hallmarks of importing data from CSV files is that there are
154
+ inevitably rows with errors of some kind. You can handle error rows by
155
+ specifying an `errors` block:
156
+
157
+ class MyImporter < CSVParty
158
+ # column definitions
159
+ # row import logic
160
+
161
+ errors do |error, line_number|
162
+ # log error
163
+ end
164
+ end
165
+
166
+ Any row in your CSV file which results in an exception will be passed to this
167
+ block. Which means you can specify that there is an error with a given row by
168
+ raising an exception:
169
+
170
+ rows do |row|
171
+ # rows with price less than 0 will be treated as errors
172
+ raise if row.price < 0
173
+ end
174
+
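The error flow described above amounts to rescuing per row and handing the exception to a callback. A minimal sketch in plain Ruby follows; the method `import_rows_with_errors`, the `on_error:` keyword, and the line numbering (data starts on line 2, after a header) are all assumptions made for illustration rather than the gem's API.

```ruby
# Hypothetical sketch: run the row logic for each row and pass any
# exception, plus an assumed line number, to the error callback.
def import_rows_with_errors(rows, on_error:)
  rows.each_with_index do |row, index|
    begin
      yield row
    rescue StandardError => error
      on_error.call(error, index + 2) # assumes line 1 is the header row
    end
  end
end

errors = []
imported = []

import_rows_with_errors(
  [{ price: 5 }, { price: -1 }, { price: 7 }],
  on_error: ->(error, line_number) { errors << [line_number, error.message] }
) do |row|
  raise ArgumentError, 'negative price' if row[:price] < 0

  imported << row[:price]
end

imported # => [5, 7]
errors   # => [[3, "negative price"]]
```

Note that one failing row does not stop the rows after it from being processed in this sketch.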
175
+ ## External Dependencies
176
+
177
+ Sometimes you need access to external objects in your importer's logic. You can specify
178
+ what external objects your importer depends on with `depends_on`. Dependencies declared
179
+ this way will then be available in your parsers and your `rows`, `import`, and `errors`
180
+ blocks:
181
+
182
+ class MyImporter < CSVParty
183
+ # column definitions...
184
+
185
+ depends_on :product_import
186
+
187
+ rows do |row|
188
+ # do some stuff
189
+
190
+ # product_import is not provided by the class,
191
+ # but is passed in at runtime instead!
192
+ product_import.log_success(product)
193
+ end
194
+ end
195
+
196
+ Then, to pass the dependency in at runtime, you just add an option to `.new` with
197
+ the name and value of the dependency:
198
+
199
+ MyImporter.new(
200
+ 'path/to/csv',
201
+ product_import: @product_import
202
+ )
203
+
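One way to picture how a `depends_on` declaration could connect to the options passed to `.new` is sketched below. Everything in it is hypothetical (the `SketchImporter` base class, the `dependencies` registry, and the choice to raise when a dependency is missing); it illustrates the pattern only and is not how CSVParty is actually implemented.

```ruby
# Hypothetical base class illustrating runtime dependency injection.
class SketchImporter
  # Record the declared names and define accessors for them.
  def self.depends_on(*names)
    names.each { |name| attr_accessor name }
    dependencies.concat(names)
  end

  def self.dependencies
    @dependencies ||= []
  end

  def initialize(path, options = {})
    @path = path
    self.class.dependencies.each do |name|
      # Raising here is an assumption about desirable behavior.
      raise ArgumentError, "missing dependency: #{name}" unless options.key?(name)

      public_send("#{name}=", options[name])
    end
  end
end

class MySketchImporter < SketchImporter
  depends_on :product_import
end

importer = MySketchImporter.new('path/to/csv', product_import: :stub_logger)
importer.product_import # => :stub_logger
```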
204
+ ## Tested Rubies
205
+
206
+ CSVParty has been tested against the following Rubies:
207
+
208
+ MRI
209
+ - 2.5
210
+ - 2.4
211
+ - 2.3
212
+ - 2.2
213
+ - 2.1
214
+ - 2.0
215
+
216
+ ## License
217
+
218
+ This project uses the MIT License. See LICENSE.md for details.
data/ROADMAP.md ADDED
@@ -0,0 +1,271 @@
1
+ Roadmap
2
+ -
3
+
4
+ - [1.1 Early Return While Parsing](#11-early-return-while-parsing)
5
+ - [1.2 Rows to Hash](#12-rows-to-hash)
6
+ - [1.3 Generate Unimported Rows CSV](#13-generate-unimported-rows-csv)
7
+ - [1.4 Batch API](#14-batch-api)
8
+ - [1.5 Runtime Configuration](#15-runtime-configuration)
9
+ - [1.6 CSV Parse Error Handling](#16-csv-parse-error-handling)
10
+ - [Someday Features](#someday-features)
11
+ - [Column Numbers](#column-numbers)
12
+ - [Multi-column Parsing](#multi-column-parsing)
13
+ - [Parse Dependencies](#parse-dependencies)
14
+
15
+ #### 1.1 Early Return While Parsing
16
+
17
+ Currently, CSVParty is pretty well thought out about what should happen when
18
+ either 1) one of the built-in flow control methods (`next_row`, `skip_row`,
19
+ `abort_row`, and `abort_import`) is used, or 2) an error is raised while
20
+ the row importer block is being executed. However, all of these things can also
21
+ happen when the columns for a row are being parsed. When they do, most of the
22
+ flow control and error handling kind of assumes that the row has been fully
23
+ parsed. So some design work should go into deciding what should happen in these
24
+ cases. And then tests should be written for all of the various scenarios.
25
+
26
+ #### 1.2 Rows to Hash
27
+
28
+ One of the primary use cases for importing CSV files is to insert their contents
29
+ into a database. Apparently this is common enough that the
30
+ [csv-importer](https://github.com/pcreux/csv-importer) gem, which almost
31
+ completely automates this process without much room for customization, is very
32
+ popular. So, in the case where there is a pretty simple correspondence between
33
+ the contents of a CSV file and ActiveRecord models, it should be dead simple to
34
+ get the job done.
35
+
36
+ What I have in mind is something like:
37
+
38
+ class MyImporter < CSVParty::Importer
39
+ column :product_id
40
+ column :quantity
41
+ column :price
42
+
43
+ rows do |row|
44
+ LineItem.create(row.attributes)
45
+ end
46
+ end
47
+
48
+ Where `row.attributes` returns a hash with all of the column names as keys and
49
+ all of the parsed values as values. So, with an importer like the one above,
50
+ `row.attributes` would return a hash like so:
51
+
52
+ { product_id: 42, quantity: 3, price: 9.99 }
53
+
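The proposed `row.attributes` is essentially a "parsed values as a hash" view of the row. A small sketch using Ruby's `Struct` shows the shape of the idea; the `SketchRow` struct is hypothetical and stands in for whatever row object the gem would provide.

```ruby
# Hypothetical row object: a Struct whose members are the column names.
SketchRow = Struct.new(:product_id, :quantity, :price) do
  # Column names become keys, parsed values become values.
  def attributes
    to_h
  end
end

row = SketchRow.new(42, 3, 9.99)
row.attributes # => { product_id: 42, quantity: 3, price: 9.99 }
```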
54
+ #### 1.3 Generate Unimported Rows CSV
55
+
56
+ Most user inputs to an application are relatively constrained. CSV files, on the
57
+ other hand, are not. Users can, and will, put all kinds of erroneous data into
58
+ their CSV files. So, it is useful to be able to provide a user with a list of
59
+ the rows in their file that could not be imported, so that they can re-import
60
+ these rows after they have resolved whatever issues existed. And CSV is a
61
+ natural format for this, since the user can open the file in Excel and make
62
+ edits.
63
+
64
+ A motivated user of CSVParty can already achieve this by accessing the
65
+ `skipped_rows`, `aborted_rows`, and `error_rows` arrays and constructing one or
66
+ more CSV files from these, but it would be nice to provide a default
67
+ implementation that is only a method call away. What I have in mind is for the
68
+ CSV file that is created to have the exact same column structure as the original
69
+ file, but with three additional columns:
70
+
71
+ - The original row number
72
+ - The status (skipped, aborted, errored)
73
+ - A message explaining the reason for the status
74
+
75
+ Conveniently, all of these pieces of data are available for skipped, aborted,
76
+ and errored rows. Then, the file would be generated with a method, like so:
77
+
78
+ # all three combined
79
+ importer.unimported_rows_as_csv
80
+ # or separate
81
+ importer.skipped_rows_as_csv
82
+ importer.aborted_rows_as_csv
83
+ importer.error_rows_as_csv
84
+
85
+ #### 1.4 Batch API
86
+
87
+ It can be way more performant to batch imports so that expensive operations,
88
+ like persisting data, are only done every so often. This would add an API to
89
+ accumulate data, execute some logic every X number of rows, reset the
90
+ accumulators, then repeat. Here's a rough sketch of what that API might look
91
+ like:
92
+
93
+ rows do |row|
94
+ customers[row.customer_id] = { name: row.customer_name, phone: row.phone }
95
+ orders[row.order_id] = { customer_id: row.customer_id, invoice_number: row.invoice_number }
96
+ end
97
+
98
+ batch 50, customers: {}, orders: {} do
99
+ # insert customers into database
100
+ # insert orders into database
101
+ end
102
+
103
+ The first argument is how often the batch logic should be executed. In this
104
+ case, every 50 rows. Then there is a hash of accumulators, where the keys are
105
+ the names of the accumulators and the values are the initial values. Declaring
106
+ the accumulators accomplishes two things:
107
+
108
+ 1. It provides accessor methods so that the accumulators can be accessed from
109
+ within the row import block.
110
+ 2. It automatically resets the accumulators to their initial values each time
111
+ the batch block is executed.
112
+
113
+ So, it is essentially functionally identical to doing the following:
114
+
115
+ class MyImporter < CSVParty::Importer
116
+ attr_accessor :customers, :orders
117
+
118
+ def customers
119
+ @customers ||= {}
120
+ end
121
+
122
+ def orders
123
+ @orders ||= {}
124
+ end
125
+
126
+ rows do |row|
127
+ # add customer to customers accumulator
128
+ # add order to orders accumulator
129
+ end
130
+
131
+ batch 50 do
132
+ # insert customers into database
133
+ # insert orders into database
134
+ self.customers = {} # explicit receiver; a bare `customers = {}` would only create a local variable
135
+ self.orders = {} # explicit receiver; a bare `orders = {}` would only create a local variable
136
+ end
137
+ end
138
+
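The accumulate, flush, reset cycle described above can be tried out in plain Ruby without the gem. The sketch below is only an illustration of the proposal: `import_in_batches` and its keyword arguments are hypothetical names, and flushing a trailing partial batch at the end is an assumption the roadmap does not spell out.

```ruby
# Hypothetical sketch: call on_row for every row and on_batch every
# `every` rows, resetting the accumulators to copies of their initial values.
def import_in_batches(rows, every:, accumulators:, on_row:, on_batch:)
  fresh = -> { accumulators.map { |name, initial| [name, initial.dup] }.to_h }
  current = fresh.call
  pending = 0

  rows.each do |row|
    on_row.call(row, current)
    pending += 1
    next if pending < every

    on_batch.call(current)
    current = fresh.call
    pending = 0
  end

  # Assumption: rows left over after the last full batch are flushed too.
  on_batch.call(current) if pending > 0
end

flushed = []

import_in_batches(
  (1..5).map { |id| { order_id: id } },
  every: 2,
  accumulators: { orders: {} },
  on_row: ->(row, acc) { acc[:orders][row[:order_id]] = row },
  on_batch: ->(acc) { flushed << acc[:orders].keys }
)

flushed # => [[1, 2], [3, 4], [5]]
```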
139
+ _Note:_ The following is a rough sketch of an API that would handle a use case
140
+ that has come up. However, some research should be done first to figure out if
141
+ the use case it addresses is common.
142
+
143
+ One use case that has been mentioned is when rows are grouped by their
144
+ relationship to a parent record and those rows need to be acted on as a group.
145
+ So, imagine a CSV file like so:
146
+
147
+ Customer,Address,Product,Quantity,Price
148
+ Joe Smith,123 Main St.,Birkenstocks,1,74.99
149
+ Joe Smith,123 Main St.,Air Jordans,1,129.99
150
+ Joe Smith,123 Main St.,Tevas,3,59.99
151
+ Jane Doe,713 Broadway,Converse All-Star,1,39.99
152
+ Jane Doe,713 Broadway,Toms,1,59.99
153
+
154
+ It might be useful to be able to specify the batch interval in terms of one of
155
+ the columns in the CSV file, rather than as a number of rows. So, you would be
156
+ able to do:
157
+
158
+ class MyImporter < CSVParty::Importer
159
+ column :customer
160
+ column :address
161
+ column :product
162
+ column :quantity, as: :integer
163
+ column :price, as: :decimal
164
+
165
+ rows do |row|
166
+ line_items << { product: row.product, quantity: row.quantity, price: row.price }
167
+ end
168
+
169
+ batch :customer, line_items: [] do |current_row|
170
+ Customer.create(name: current_row.customer, address: current_row.address)
171
+ line_items.each do |li|
172
+ LineItem.create(li)
173
+ end
174
+ end
175
+ end
176
+
177
+ In this case, the batch logic gets executed every time there is a change in the
178
+ `:customer` column from one row to the next, rather than every X number of rows.
179
+ The accumulator works the same way: accessors are made available for adding
180
+ records to the accumulator and then the accumulator is automatically reset to
181
+ its initial value each time the batch logic is executed.
182
+
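Grouping by a change in one column is something Ruby's standard `Enumerable#slice_when` (Ruby 2.2 and later) already does, which gives a quick way to see what the proposed column-based batching would produce. The sketch below uses plain hashes in place of parsed rows; it says nothing about how the gem would implement the feature.

```ruby
rows = [
  { customer: 'Joe Smith', product: 'Birkenstocks' },
  { customer: 'Joe Smith', product: 'Air Jordans' },
  { customer: 'Joe Smith', product: 'Tevas' },
  { customer: 'Jane Doe',  product: 'Converse All-Star' },
  { customer: 'Jane Doe',  product: 'Toms' }
]

# Start a new group whenever the customer differs from the previous row.
groups = rows.slice_when do |previous, current|
  previous[:customer] != current[:customer]
end

summary = groups.map do |group|
  [group.first[:customer], group.map { |row| row[:product] }]
end

summary
# => [["Joe Smith", ["Birkenstocks", "Air Jordans", "Tevas"]],
#     ["Jane Doe", ["Converse All-Star", "Toms"]]]
```

This only groups consecutive rows, so a file in which the same customer reappears later would yield a second group for that customer, matching the "change from one row to the next" wording above.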
183
+ #### 1.5 Runtime Configuration
184
+
185
+ Sometimes it is useful to be able to configure an importer at runtime, rather than
186
+ at code writing time. An obvious example of when this would be useful is in the
187
+ case of user-defined column header names. So, imagine a UI in which the user
188
+ uploads their CSV file, then specifies which column is, for example, the product
189
+ column, which is the quantity column, and which is the price column. In a case
190
+ like this, there is no way to specify the column definitions ahead of time; we
191
+ have to wait for the header names from the user.
192
+
193
+ Here is a sketch of what the API for runtime configuration would look like:
194
+
195
+ class MyImporter < CSVParty::Importer
196
+ rows do |row|
197
+ # persist data
198
+ end
199
+ end
200
+
201
+ # then:
202
+
203
+ my_importer = MyImporter.new
204
+ my_importer.configure do
205
+ column :product, header: user_product_header
206
+ column :quantity, header: user_quantity_header, as: :integer
207
+ column :price, header: user_price_header, as: :decimal
208
+ end
209
+
210
+ An open question is whether all DSL methods should be configurable at runtime.
211
+
212
+ #### 1.6 CSV Parse Error Handling
213
+
214
+ Sometimes it is useful to be able to completely ignore parsing and encoding
215
+ errors raised by the `CSV` class. To be clear, doing so is dangerous, since the
216
+ parsing logic in the `CSV` class is not designed to continue operating after it
217
+ encounters an error and raises. But sometimes you don't want to let a single
218
+ improperly encoded character prevent you from importing an entire CSV file. So,
219
+ this feature would be an optional way to either ignore those errors or respond
220
+ to them, and then continue importing. The API would probably be similar to the
221
+ error handling API for non-parse errors. So:
222
+
223
+ parse_errors :ignore # silently continue importing the next row
224
+
225
+ parse_errors do |line_number|
226
+ # handle parse error
227
+ end
228
+
229
+ my_import.parse_error_rows # returns array of parse error rows
230
+
231
+ ## Someday Features
232
+
233
+ #### Column Numbers
234
+
235
+ CSVParty is entirely oriented around a CSV file having a header. This is not
236
+ always the case, though. This would add the ability to specify columns using a
237
+ column number, rather than a header. A rough sketch of the API might look like:
238
+
239
+ class MyImporter < CSVParty::Importer
240
+ column :product, number: 7
241
+ column :quantity, number: 8, as: :integer
242
+ column :price, number: 9, as: :decimal
243
+ end
244
+
245
+ #### Multi-column Parsing
246
+
247
+ The whole idea behind custom parsers is that it makes for much cleaner code to
248
+ get all the logic related to parsing a raw value into a useful intermediate
249
+ object in one place, away from the larger logic of what needs to happen to each
250
+ row. Sometimes, though, you need access to multiple column values to create a
251
+ useful parsed value. Here is what an API for that might look like:
252
+
253
+ column :total, header: ['Price', 'Quantity'] do |price, quantity|
254
+ BigDecimal.new(price) * BigDecimal.new(quantity)
255
+ end
256
+
257
+ #### Parse Dependencies
258
+
259
+ Sometimes, while parsing a column, it would be useful to have access to the
260
+ parsed value from another column. This would make that possible. Here is what
261
+ that might look like:
262
+
263
+ class MyImporter < CSVParty::Importer
264
+ column :customer do |customer_id|
265
+ Customer.find(customer_id)
266
+ end
267
+
268
+ column :order, depends_on: :customer do |order_id, customer|
269
+ customer.orders.find(order_id)
270
+ end
271
+ end