csv_party 0.0.1.pre9 → 1.0.0.rc4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: b70149f2976c65072cd5ba2e20b1187f250ed7cf
4
- data.tar.gz: 64c20377be56bc6b01249b092d01e41f8caf756e
3
+ metadata.gz: 67d5895445a9fe397df95275260491ed6d9f6ce5
4
+ data.tar.gz: 542a1442466867afa33cf0ef883a32443777cfd5
5
5
  SHA512:
6
- metadata.gz: fa5d31d3102e420b6f5f4d63bea3b13c8a9752a77047d28e32804abdc3ba07e2e21a68e9cf5ec909dd474537345bd4785750cdfaf0fb7112138fbae9d5082ec8
7
- data.tar.gz: fd46166bafee072f6308212cd5c3d028d560078e7800f61b003d7c0a050afc6919ce14cf680465e4beb70860baa3630c44377c283e723c5440033fc1e8e28f48
6
+ metadata.gz: 1035dac76f5ec71d97015a5d9c2874074c28ad4b0ff7994ab565735fd6d49e539e4292a42ee15f44d8d2de2fb356e81a06f61685019fa3ad250d12ab9c160ba8
7
+ data.tar.gz: 93c748026adc2fa907f3e08825c0fef8b41fed5b2945e3066aff70978a86ed4c9d47df3a98abc30e25b2fb44a597cddb91efbb519699d2a9e3b4c580b4ef1100
data/LICENSE.md ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2018 Richard A. Jones
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,218 @@
1
+ [![Gem Version](https://badge.fury.io/rb/csv_party.svg)](https://badge.fury.io/rb/csv_party)
2
+ [![Build Status](https://travis-ci.org/toasterlovin/csv_party.svg?branch=master)](https://travis-ci.org/toasterlovin/csv_party)
3
+ [![Code Climate Maintainability](https://api.codeclimate.com/v1/badges/946d0dec172fda05d631/maintainability)](https://codeclimate.com/github/toasterlovin/csv_party/maintainability)
4
+ [![Code Climate Test Coverage](https://api.codeclimate.com/v1/badges/946d0dec172fda05d631/test_coverage)](https://codeclimate.com/github/toasterlovin/csv_party/test_coverage)
5
+
6
+ # Make importing CSV files a party
7
+
8
+ The point of this gem is to make it easier to focus on the business
9
+ logic of your CSV imports. You start by defining which columns you
10
+ will be importing, as well as how they will be parsed. Then, you
11
+ specify what you want to do with each row after it has been parsed.
12
+ That's it; CSVParty takes care of all the tedious stuff for you.
13
+
14
+ ## Defining Columns
15
+
16
+ This is what defining your import columns looks like:
17
+
18
+ class MyImporter < CSVParty
19
+ column :price, header: 'Nonsensical Column Name', as: :decimal
20
+ end
21
+
22
+ This will take the value in the 'Nonsensical Column Name' column,
23
+ parse it as a decimal, then make it available to your import logic
24
+ as a nice, sane variable named `price`.
25
+
26
+ The available built-in parsers are:
27
+
28
+ - `:raw` returns the value from the CSV file, unchanged
29
+ - `:string` strips whitespace and returns the resulting string
30
+ - `:integer` strips whitespace, then calls `to_i` on the resulting string
31
+ - `:decimal` strips all characters except `0-9` and `.`, then passes the
32
+ resulting string to `BigDecimal.new`
33
+ - `:boolean` strips whitespace, downcases, then returns `true` if the
34
+ resulting string is `'1'`, `'t'`, or `'true'`, otherwise it returns `false`
35
+
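As a rough illustration of the parser descriptions above, the built-in behaviors could be approximated with plain lambdas. This is a minimal sketch, not CSVParty's actual implementation: the `PARSERS` constant is a hypothetical name, and `BigDecimal(...)` is used in place of `BigDecimal.new` because the latter was removed in later Ruby versions.

```ruby
require 'bigdecimal'

# Hypothetical stand-ins for the built-in parsers described above.
# These only mirror the documented behavior; they are not the gem's code.
PARSERS = {
  # returns the value unchanged
  raw:     ->(value) { value },
  # strips surrounding whitespace
  string:  ->(value) { value.strip },
  # strips whitespace, then calls to_i
  integer: ->(value) { value.strip.to_i },
  # drops every character except 0-9 and '.', then builds a BigDecimal
  decimal: ->(value) { BigDecimal(value.gsub(/[^0-9.]/, '')) },
  # true only for '1', 't', or 'true' after stripping and downcasing
  boolean: ->(value) { %w[1 t true].include?(value.strip.downcase) }
}.freeze

price = PARSERS[:decimal].call(' $1,234.50 ')
active = PARSERS[:boolean].call(' T ')
```

Note that, as described, the `:decimal` rule also strips a leading minus sign, so a sketch like this cannot produce negative values.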
36
+ When defining a column, you can also pass a block if you need custom
37
+ parsing logic:
38
+
39
+ class MyImporter < CSVParty
40
+ column :product, header: 'Product' do |value|
41
+ Product.find_by(name: value)
42
+ end
43
+ end
44
+
45
+ Or, if you want to re-use a custom parser for multiple columns, just
46
+ define a method on your class with a name that ends in `_parser` and
47
+ you can use it the same way you use the built-in parsers:
48
+
49
+ class MyImporter < CSVParty
50
+ def dollars_to_cents_parser(value)
51
+ (BigDecimal.new(value) * 100).to_i
52
+ end
53
+
54
+ column :price_in_cents, header: 'Price in $', as: :dollars_to_cents
55
+ column :cost_in_cents, header: 'Cost in $', as: :dollars_to_cents
56
+ end
57
+
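One way a `_parser` naming convention like this could be resolved is by building the method name from the `as:` option and dispatching with `send`. The sketch below is an assumption about the general technique, not the gem's internals; `SketchImporter` and `parse` are hypothetical names.

```ruby
require 'bigdecimal'

# Hypothetical class showing name-based parser dispatch.
class SketchImporter
  def dollars_to_cents_parser(value)
    (BigDecimal(value) * 100).to_i
  end

  # Looks up "<as>_parser" on the instance and calls it with the raw value.
  def parse(value, as:)
    parser = "#{as}_parser"
    raise ArgumentError, "unknown parser: #{as}" unless respond_to?(parser, true)

    send(parser, value)
  end
end

cents = SketchImporter.new.parse('9.99', as: :dollars_to_cents)
```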
58
+ #### NOTE: Parsing nil and blank values
59
+
60
+ By default, CSVParty will intercept any values that are `nil` or which contain
61
+ only whitespace and coerce them to `nil` _without invoking the parser for that
62
+ column_. This applies to all parsers, including custom parsers which you
63
+ define, with one exception: the `:raw` parser. This is done as a convenience to
64
+ avoid pesky `NoMethodError`s that arise when a parser tries to do its thing
65
+ to a `nil` value that it wasn't expecting. You can turn this behavior off on a
66
+ given column by setting `intercept_blanks` to `false` in the options hash:
67
+
68
+ class MyImporter < CSVParty
69
+ column :price, header: 'Price', intercept_blanks: false do |value|
70
+ if value.nil?
71
+ 'n/a'
72
+ else
73
+ BigDecimal.new(value)
74
+ end
75
+ end
76
+ end
77
+
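The interception rule described above can be pictured as a guard that runs before the parser. This is a minimal sketch under the assumption that blank means `nil` or whitespace-only; `parse_value` is a hypothetical helper, not a CSVParty method.

```ruby
# Hypothetical helper: returns nil for blank input without calling the
# parser block, unless intercept_blanks is false.
def parse_value(value, intercept_blanks: true)
  return nil if intercept_blanks && (value.nil? || value.strip.empty?)

  yield value
end

skipped = parse_value('   ') { |v| Integer(v) }           # parser never runs
handled = parse_value(nil, intercept_blanks: false) do |v|
  v.nil? ? 'n/a' : v                                      # parser sees the nil
end
```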
78
+ #### NOTE: Parsers cannot reference each other
79
+
80
+ When using a custom parser to parse a column, the block or method that you
81
+ define has no way to reference the values from any other columns. So, this won't
82
+ work:
83
+
84
+ class MyImporter < CSVParty
85
+ column :product, header: 'Product' do |value|
86
+ Product.find_by(name: value)
87
+ end
88
+
89
+ column :price, header: 'Price' do |value|
90
+ # product is not defined...
91
+ product.price = BigDecimal.new(value)
92
+ end
93
+ end
94
+
95
+ Instead, you would do this in your row import logic. Which brings us to:
96
+
97
+ ## Importing Rows
98
+
99
+ Once you've defined all of your columns, you specify your logic for importing
100
+ rows by passing a block to the `rows` DSL method. That block will have access
101
+ to a `row` variable which contains all of the parsed values for your columns.
102
+ Here's what that looks like:
103
+
104
+ class MyImporter < CSVParty
105
+ rows do |row|
106
+ product = row.product
107
+ product.price = row.price
108
+ product.save
109
+ end
110
+ end
111
+
112
+ The `row` variable also provides access to two other things:
113
+
114
+ - The unparsed values for your columns
115
+ - The raw CSV string for that row
116
+
117
+ Here's how you access those:
118
+
119
+ class MyImporter < CSVParty
120
+ rows do |row|
121
+ row.price # parsed value: #<BigDecimal:7f88d92cb820,'0.9E1',9(18)>
122
+ row.unparsed.price # unparsed value: '$9.00'
123
+ row.string # raw CSV string: 'USB Cable,$9.00,Box,Blue'
124
+ end
125
+ end
126
+
127
+ ## Importing
128
+
129
+ Once your importer class is defined, you use it like this:
130
+
131
+ importer = MyImporter.new('path/to/file.csv')
132
+ importer.import!
133
+
134
+ You can also specify what should happen before and after your import by passing
135
+ a block to `import`, like so:
136
+
137
+ class MyImporter < CSVParty
138
+ # column definitions
139
+ # row import logic
140
+
141
+ import do
142
+ puts 'Starting import'
143
+ import_rows!
144
+ puts 'Import finished!'
145
+ end
146
+ end
147
+
148
+ You can do whatever you want inside of the `import` block, just make sure to
149
+ call `import_rows!` somewhere in there.
150
+
151
+ ## Handling Errors
152
+
153
+ One of the hallmarks of importing data from CSV files is that there are
154
+ inevitably rows with errors of some kind. You can handle error rows by
155
+ specifying an `errors` block:
156
+
157
+ class MyImporter < CSVParty
158
+ # column definitions
159
+ # row import logic
160
+
161
+ errors do |error, line_number|
162
+ # log error
163
+ end
164
+ end
165
+
166
+ Any row in your CSV file which results in an exception will be passed to this
167
+ block. Which means you can specify that there is an error with a given row by
168
+ raising an exception:
169
+
170
+ rows do |row|
171
+ # rows with price less than 0 will be treated as errors
172
+ raise if row.price < 0
173
+ end
174
+
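The idea of routing any row that raises to an error handler can be sketched with Ruby's standard `CSV` library and a per-row `rescue`. This is an illustration of the pattern only, not how CSVParty implements it; `csv_text` and `error_rows` are hypothetical names, and line numbers are assumed to count the header as line 1.

```ruby
require 'csv'

csv_text = "Product,Price\nUSB Cable,9.00\nBad Row,-1\n"
error_rows = []

CSV.parse(csv_text, headers: true).each_with_index do |row, index|
  line_number = index + 2 # header occupies line 1
  begin
    price = Float(row['Price'])
    # raising marks this row as an error, as in the rows block above
    raise ArgumentError, 'price less than 0' if price < 0
  rescue StandardError => error
    error_rows << [error, line_number]
  end
end
```

Rescuing inside the loop means one bad row is recorded and the remaining rows are still processed.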
175
+ ## External Dependencies
176
+
177
+ Sometimes you need access to external objects in your importer's logic. You can specify
178
+ what external objects your importer depends on with `depends_on`. Dependencies declared
179
+ this way will then be available in your parsers and your `rows`, `import`, and `errors`
180
+ blocks:
181
+
182
+ class MyImporter < CSVParty
183
+ # column definitions...
184
+
185
+ depends_on :product_import
186
+
187
+ rows do |row|
188
+ # do some stuff
189
+
190
+ # product_import is not provided by the class,
191
+ # but is passed in at runtime instead!
192
+ product_import.log_success(product)
193
+ end
194
+ end
195
+
196
+ Then, to pass the dependency in at runtime, you just add an option to `.new` with
197
+ the name and value of the dependency:
198
+
199
+ MyImporter.new(
200
+ 'path/to/csv',
201
+ product_import: @product_import
202
+ )
203
+
204
+ ## Tested Rubies
205
+
206
+ CSVParty has been tested against the following Rubies:
207
+
208
+ MRI
209
+ - 2.5
210
+ - 2.4
211
+ - 2.3
212
+ - 2.2
213
+ - 2.1
214
+ - 2.0
215
+
216
+ ## License
217
+
218
+ This project uses the MIT License. See LICENSE.md for details.
data/ROADMAP.md ADDED
@@ -0,0 +1,271 @@
1
+ Roadmap
2
+ -
3
+
4
+ - [1.1 Early Return While Parsing](#11-early-return-while-parsing)
5
+ - [1.2 Rows to Hash](#12-rows-to-hash)
6
+ - [1.3 Generate Unimported Rows CSV](#13-generate-unimported-rows-csv)
7
+ - [1.4 Batch API](#14-batch-api)
8
+ - [1.5 Runtime Configuration](#15-runtime-configuration)
9
+ - [1.6 CSV Parse Error Handling](#16-csv-parse-error-handling)
10
+ - [Someday Features](#someday-features)
11
+ - [Column Numbers](#column-numbers)
12
+ - [Multi-column Parsing](#multi-column-parsing)
13
+ - [Parse Dependencies](#parse-dependencies)
14
+
15
+ #### 1.1 Early Return While Parsing
16
+
17
+ Currently, CSVParty is pretty well thought out about what should happen when
18
+ either 1) one of the built in flow control methods (`next_row`, `skip_row`,
19
+ `abort_row`, and `abort_import`) is used, or 2) an error is raised while
20
+ the row importer block is being executed. However, all of these things can also
21
+ happen when the columns for a row are being parsed. When they do, most of the
22
+ flow control and error handling kind of assumes that the row has been fully
23
+ parsed. So some design work should go into deciding what should happen in these
24
+ cases. And then tests should be written for all of the various scenarios.
25
+
26
+ #### 1.2 Rows to Hash
27
+
28
+ One of the primary use cases for importing CSV files is to insert their contents
29
+ into a database. Apparently this is common enough that the
30
+ [csv-importer](https://github.com/pcreux/csv-importer) gem, which almost
31
+ completely automates this process without much room for customization, is very
32
+ popular. So, in the case where there is a pretty simple correspondence between
33
+ the contents of a CSV file and ActiveRecord models, it should be dead simple to
34
+ get the job done.
35
+
36
+ What I have in mind is something like:
37
+
38
+ class MyImporter < CSVParty::Importer
39
+ column :product_id
40
+ column :quantity
41
+ column :price
42
+
43
+ rows do |row|
44
+ LineItem.create(row.attributes)
45
+ end
46
+ end
47
+
48
+ Where `row.attributes` returns a hash with all of the column names as keys and
49
+ all of the parsed values as values. So, with an importer like the one above,
50
+ `row.attributes` would return a hash like so:
51
+
52
+ { product_id: 42, quantity: 3, price: 9.99 }
53
+
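A minimal sketch of what such a row object could look like, assuming a `Struct` is an acceptable stand-in for the parsed row. `ParsedRow` is a hypothetical name and not part of CSVParty.

```ruby
# Hypothetical row type: column names become members, and
# attributes exposes them as a symbol-keyed hash.
ParsedRow = Struct.new(:product_id, :quantity, :price) do
  def attributes
    to_h
  end
end

row = ParsedRow.new(42, 3, 9.99)
attributes = row.attributes
```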
54
+ #### 1.3 Generate Unimported Rows CSV
55
+
56
+ Most user inputs to an application are relatively constrained. CSV files, on the
57
+ other hand, are not. Users can, and will, put all kinds of erroneous data into
58
+ their CSV files. So, it is useful to be able to provide a user with a list of
59
+ the rows in their file that could not be imported, so that they can re-import
60
+ these rows after they have resolved whatever issues existed. And CSV is a
61
+ natural format for this, since the user can open the file in Excel and make
62
+ edits.
63
+
64
+ A motivated user of CSVParty can already achieve this by accessing the
65
+ `skipped_rows`, `aborted_rows`, and `error_rows` arrays and constructing one or
66
+ more CSV files from these, but it would be nice to provide a default
67
+ implementation that is only a method call away. What I have in mind is for the
68
+ CSV file that is created to have the exact same column structure as the original
69
+ file, but with three additional columns:
70
+
71
+ - The original row number
72
+ - The status (skipped, aborted, errored)
73
+ - A message explaining the reason for the status
74
+
75
+ Conveniently, all of these pieces of data are available for skipped, aborted,
76
+ and errored rows. Then, the file would be generated with a method, like so:
77
+
78
+ # all three combined
79
+ importer.unimported_rows_as_csv
80
+ # or separate
81
+ importer.skipped_rows_as_csv
82
+ importer.aborted_rows_as_csv
83
+ importer.error_rows_as_csv
84
+
85
+ #### 1.4 Batch API
86
+
87
+ It can be way more performant to batch imports so that expensive operations,
88
+ like persisting data, are only done every so often. This would add an API to
89
+ accumulate data, execute some logic every X number of rows, reset the
90
+ accumulators, then repeat. Here's a rough sketch of what that API might look
91
+ like:
92
+
93
+ rows do |row|
94
+ customers[row.customer_id] = { name: row.customer_name, phone: row.phone }
95
+ orders[row.order_id] = { customer_id: row.customer_id, invoice_number: row.invoice_number }
96
+ end
97
+
98
+ batch 50, customers: {}, orders: {} do
99
+ # insert customers into database
100
+ # insert orders into database
101
+ end
102
+
103
+ The first argument is how often the batch logic should be executed. In this
104
+ case, every 50 rows. Then there is a hash of accumulators, where the keys are
105
+ the names of the accumulators and the values are the initial values. Declaring
106
+ the accumulators accomplishes two things:
107
+
108
+ 1. It provides accessor methods so that the accumulators can be accessed from
109
+ within the row import block.
110
+ 2. It automatically resets the accumulators to their initial values each time
111
+ the batch block is executed.
112
+
113
+ So, it is essentially functionally identical to doing the following:
114
+
115
+ class MyImporter < CSVParty::Importer
116
+ attr_accessor :customers, :orders
117
+
118
+ def customers
119
+ @customers ||= {}
120
+ end
121
+
122
+ def orders
123
+ @orders ||= {}
124
+ end
125
+
126
+ rows do |row|
127
+ # add customer to customers accumulator
128
+ # add order to orders accumulator
129
+ end
130
+
131
+ batch 50 do
132
+ # insert customers into database
133
+ # insert orders into database
134
+ self.customers = {}
135
+ self.orders = {}
136
+ end
137
+ end
138
+
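The accumulate, flush, and reset cycle described above can be sketched independently of the proposed DSL. `BatchAccumulator` below is a hypothetical helper written only to show the mechanics; the final partial batch is flushed explicitly, which is one possible design and not something the roadmap specifies.

```ruby
# Hypothetical helper: runs the flush block every `size` rows and then
# resets each accumulator to a copy of its initial value.
class BatchAccumulator
  attr_reader :accumulators

  def initialize(size, initial_values, &on_flush)
    @size = size
    @initial_values = initial_values
    @on_flush = on_flush
    @count = 0
    reset
  end

  # Yields the accumulators for one row, then flushes on every size-th row.
  def add
    yield accumulators
    @count += 1
    flush if (@count % @size).zero?
  end

  def flush
    @on_flush.call(accumulators)
    reset
  end

  private

  # dup the initial values so batches never share state
  def reset
    @accumulators = @initial_values.each_with_object({}) do |(name, value), fresh|
      fresh[name] = value.dup
    end
  end
end

batches = []
batch = BatchAccumulator.new(2, customers: {}) { |acc| batches << acc[:customers].keys }
[1, 2, 3].each do |id|
  batch.add { |acc| acc[:customers][id] = { name: "Customer #{id}" } }
end
batch.flush # flush the final partial batch
```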
139
+ _Note:_ The following is a rough sketch of an API that would handle a use case
140
+ that has come up. However, some research should be done first to figure out if
141
+ the use case it addresses is common.
142
+
143
+ One use case that has been mentioned is when rows are grouped by their
144
+ relationship to a parent record and those rows need to be acted on as a group.
145
+ So, imagine a CSV file like so:
146
+
147
+ Customer,Address,Product,Quantity,Price
148
+ Joe Smith,123 Main St.,Birkenstocks,1,74.99
149
+ Joe Smith,123 Main St.,Air Jordans,1,129.99
150
+ Joe Smith,123 Main St.,Tevas,3,59.99
151
+ Jane Doe,713 Broadway,Converse All-Star,1,39.99
152
+ Jane Doe,713 Broadway,Toms,1,59.99
153
+
154
+ It might be useful to be able to specify the batch interval in terms of one of
155
+ the columns in the CSV file, rather than as a number of rows. So, you would be
156
+ able to do:
157
+
158
+ class MyImporter < CSVParty::Importer
159
+ column :customer
160
+ column :address
161
+ column :product
162
+ column :quantity, as: :integer
163
+ column :price, as: :decimal
164
+
165
+ rows do |row|
166
+ line_items << { product: row.product, quantity: row.quantity, price: row.price }
167
+ end
168
+
169
+ batch :customer, line_items: [] do |current_row|
170
+ Customer.create(name: current_row.customer, address: current_row.address)
171
+ line_items.each do |li|
172
+ LineItem.create(li)
173
+ end
174
+ end
175
+ end
176
+
177
+ In this case, the batch logic gets executed every time there is a change in the
178
+ `:customer` column from one row to the next, rather than every X number of rows.
179
+ The accumulator works the same way: accessors are made available for adding
180
+ records to the accumulator and then the accumulator is automatically reset to
181
+ its initial value each time the batch logic is executed.
182
+
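Splitting rows whenever a column value changes from one row to the next is what `Enumerable#slice_when` (Ruby 2.2+) does, so the grouping itself can be sketched in plain Ruby. The `rows` array of hashes below is a hypothetical stand-in for parsed rows taken from the sample file above.

```ruby
rows = [
  { customer: 'Joe Smith', product: 'Birkenstocks' },
  { customer: 'Joe Smith', product: 'Air Jordans' },
  { customer: 'Joe Smith', product: 'Tevas' },
  { customer: 'Jane Doe',  product: 'Converse All-Star' },
  { customer: 'Jane Doe',  product: 'Toms' }
]

# Start a new batch whenever the customer differs from the previous row.
batches = rows.slice_when do |previous, current|
  previous[:customer] != current[:customer]
end

batches.each do |line_items|
  customer = line_items.first[:customer]
  puts "#{customer}: #{line_items.map { |li| li[:product] }.join(', ')}"
end
```

Like the proposed API, this only groups adjacent rows; a customer that reappears later in the file would start a separate batch.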
183
+ #### 1.5 Runtime Configuration
184
+
185
+ Sometimes it is useful to be able to configure an importer at runtime, rather than
186
+ at code writing time. An obvious example of when this would be useful is in the
187
+ case of user defined column header names. So, imagine a UI in which the user
188
+ uploads their CSV file, then specifies which column is, for example, the product
189
+ column, which is the quantity column, and which is the price column. In a case
190
+ like this, there is no way to specify the column definitions ahead of time; we
191
+ have to wait for the header names from the user.
192
+
193
+ Here is a sketch of what the API for runtime configuration would look like:
194
+
195
+ class MyImporter < CSVParty::Importer
196
+ rows do |row|
197
+ # persist data
198
+ end
199
+ end
200
+
201
+ # then:
202
+
203
+ my_importer = MyImporter.new
204
+ my_importer.configure do
205
+ column :product, header: user_product_header
206
+ column :quantity, header: user_quantity_header, as: :integer
207
+ column :price, header: user_price_header, as: :decimal
208
+ end
209
+
210
+ An open question is whether all DSL methods should be configurable at runtime.
211
+
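One common way to support a `configure` block like the one above is to evaluate it against the instance with `instance_eval`, so DSL calls such as `column` resolve to instance methods. The sketch below assumes that approach; `ConfigurableSketch` and its `columns` hash are hypothetical and not part of CSVParty.

```ruby
# Hypothetical class showing runtime configuration via instance_eval.
class ConfigurableSketch
  attr_reader :columns

  def initialize
    @columns = {}
  end

  # Records a column definition; options mirror the DSL shown above.
  def column(name, options = {})
    @columns[name] = options
  end

  # Evaluates the block with self set to this instance.
  def configure(&block)
    instance_eval(&block)
    self
  end
end

user_price_header = 'Cost' # e.g. chosen by the user in a UI
importer = ConfigurableSketch.new
importer.configure do
  column :price, header: user_price_header, as: :decimal
end
```

Local variables from the calling scope remain visible inside the block, but methods on the caller's `self` do not, which is a trade-off of `instance_eval` worth weighing when designing this API.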
212
+ #### 1.6 CSV Parse Error Handling
213
+
214
+ Sometimes it is useful to be able to completely ignore parsing and encoding
215
+ errors raised by the `CSV` class. To be clear, doing so is dangerous, since the
216
+ parsing logic in the `CSV` class is not designed to continue operating after it
217
+ encounters an error and raises. But sometimes you don't want to let a single
218
+ improperly encoded character prevent you from importing an entire CSV file. So,
219
+ this feature would be an optional way to either ignore those errors or respond
220
+ to them, and then continue importing. The API would probably be similar to the
221
+ error handling API for non-parse errors. So:
222
+
223
+ parse_errors :ignore # silently continue importing the next row
224
+
225
+ parse_errors do |line_number|
226
+ # handle parse error
227
+ end
228
+
229
+ my_import.parse_error_rows # returns array of parse error rows
230
+
231
+ ## Someday Features
232
+
233
+ #### Column Numbers
234
+
235
+ CSVParty is entirely oriented around a CSV file having a header. This is not
236
+ always the case, though. This would add the ability to specify columns using a
237
+ column number, rather than a header. A rough sketch of the API might look like:
238
+
239
+ class MyImporter < CSVParty::Importer
240
+ column :product, number: 7
241
+ column :quantity, number: 8, as: :integer
242
+ column :price, number: 9, as: :decimal
243
+ end
244
+
245
+ #### Multi-column Parsing
246
+
247
+ The whole idea behind custom parsers is that it makes for much cleaner code to
248
+ get all the logic related to parsing a raw value into a useful intermediate
249
+ object in one place, away from the larger logic of what needs to happen to each
250
+ row. Sometimes, though, you need access to multiple column values to create a
251
+ useful parsed value. Here is what an API for that might look like:
252
+
253
+ column :total, header: ['Price', 'Quantity'] do |price, quantity|
254
+ BigDecimal.new(price) * BigDecimal.new(quantity)
255
+ end
256
+
257
+ #### Parse Dependencies
258
+
259
+ Sometimes, while parsing a column, it would be useful to have access to the
260
+ parsed value from another column. This would make that possible. Here is what
261
+ that might look like:
262
+
263
+ class MyImporter < CSVParty::Importer
264
+ column :customer do |customer_id|
265
+ Customer.find(customer_id)
266
+ end
267
+
268
+ column :order, depends_on: :customer do |order_id, customer|
269
+ customer.orders.find(order_id)
270
+ end
271
+ end